fix: prevent 'Lost lock mid-transfer' error with multiple transfers

ianshade commented 1 year ago

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...) bug fix
What is the current behavior? (You can also link to an open issue here) Uploading multiple stills at once or one after the other sometimes leads to the Lost lock mid-transfer error https://github.com/nrkno/sofie-atem-connection/issues/143
What is the new behavior (if this is a feature change)? The idea is to make the locking more robust by:
- unlocking only when the queue is empty,
- aborting transfers when an unexpected unlock occurs, but not when a transfer gets queued between sending the unlocking LockStateCommand and receiving LockStateUpdateCommand back
Other information:

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 :tada:

Comparison is base (1951a9c) 86.26% compared to head (048a16a) 86.27%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #147 +/- ## ========================================== + Coverage 86.26% 86.27% +0.01% ========================================== Files 174 174 Lines 5591 5596 +5 Branches 928 928 ========================================== + Hits 4823 4828 +5 Misses 748 748 Partials 20 20 ``` | [Impacted Files](https://app.codecov.io/gh/nrkno/sofie-atem-connection/pull/147?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nrkno) | Coverage Δ | | |---|---|---| | [src/dataTransfer/dataTransferQueue.ts](https://app.codecov.io/gh/nrkno/sofie-atem-connection/pull/147?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nrkno#diff-c3JjL2RhdGFUcmFuc2Zlci9kYXRhVHJhbnNmZXJRdWV1ZS50cw==) | `86.46% <100.00%> (+0.10%)` | :arrow_up: | ... and [2 files with indirect coverage changes](https://app.codecov.io/gh/nrkno/sofie-atem-connection/pull/147/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nrkno)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

Julusian commented 1 year ago

Yeah this is definitely behaving a lot better than before. It appears that previously it would sometimes get stuck for some reason..

There is still an easy to reproduce case of uploads getting stuck, but I expect that is not really related to this change.

Using:

const { Atem } = require('../dist')
const fs = require('fs')

const file = fs.readFileSync('./testframe.rgba')

if (process.argv.length < 3) {
    console.error('Expected ip address as parameter')
    process.exit(1)
}
const IP = process.argv[2]

const conn = new Atem({})

function uploadBatch() {
    const count = 8
    Promise.resolve().then(async () => {
        console.log('starting batch')

        const ps2 = []
        for (let i = 0; i < count; i++) {
            ps2.push(conn.clearMediaPoolStill(i))
        }
        await Promise.allSettled(ps2)
        console.log('batch cleaned')

        const ps = []
        for (let i = 0; i < count; i++) {
            ps.push(uploadNext(i))
        }

        await Promise.allSettled(ps)
        console.log('batch done')

        setTimeout(() => uploadBatch(), 500)
    })
}

async function uploadNext(myNumber) {
    const stuckTimeout = setTimeout(() => {
        console.log('')
        console.log('UPLOAD GOT STUCK')
        console.log('')
    }, 20000)

    const uploadStarted = Date.now()
    // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
    await conn.uploadStill(myNumber % conn.state.media.stillPool.length, file, 'contemplation..', '').then(
        (_res) => {
            console.log(`Uploaded still #${myNumber} in ${Date.now() - uploadStarted}ms at 1080p`)

            clearTimeout(stuckTimeout)
        },
        (e) => {
            console.log(`fail from #${myNumber}`, e)

            clearTimeout(stuckTimeout)
        }
    )
}

conn.on('error', console.log)
conn.on('disconnected', () => {
    console.log('Connection lost')
    process.exit(0)
})
conn.connect(IP).catch((e) => {
    console.error(e)
    process.exit(0)
})
conn.once('connected', () => {
    console.log(`connected`)
    uploadBatch()
})

You might need to increase the count to get it to hit the scenario. For this it is important to also have the atem software open (for me on another machine, should be fine on the same) It will upload the first batch fine, with the atem software waiting for the lock to be released (orange circle shows over the thumbnails)
The batch completes uploading, the atem software starts downloading, then the upload of the second batch begins, and fails due to not having the lock (Error: Unknown error 5) (if the download completes first this does not happen, tune the count to get them to overlap)

Trying the same script before this change, it fails to even upload the first batch, with the second image failing due to the lock lost error.

@ianshade I am happy to merge this as is, unless you want to look into this likely unrelated other locking issue first?

nrkno / sofie-atem-connection

fix: prevent 'Lost lock mid-transfer' error with multiple transfers #147

Codecov Report