nrkno / sofie-atem-connection

Sofie ATEM Connection: A Part of the Sofie TV Studio Automation System
https://github.com/nrkno/Sofie-TV-automation/
MIT License
129 stars 36 forks source link

fix: prevent 'Lost lock mid-transfer' error with multiple transfers #147

Closed ianshade closed 1 year ago

ianshade commented 1 year ago
codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 :tada:

Comparison is base (1951a9c) 86.26% compared to head (048a16a) 86.27%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #147 +/- ## ========================================== + Coverage 86.26% 86.27% +0.01% ========================================== Files 174 174 Lines 5591 5596 +5 Branches 928 928 ========================================== + Hits 4823 4828 +5 Misses 748 748 Partials 20 20 ``` | [Impacted Files](https://app.codecov.io/gh/nrkno/sofie-atem-connection/pull/147?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nrkno) | Coverage Δ | | |---|---|---| | [src/dataTransfer/dataTransferQueue.ts](https://app.codecov.io/gh/nrkno/sofie-atem-connection/pull/147?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nrkno#diff-c3JjL2RhdGFUcmFuc2Zlci9kYXRhVHJhbnNmZXJRdWV1ZS50cw==) | `86.46% <100.00%> (+0.10%)` | :arrow_up: | ... and [2 files with indirect coverage changes](https://app.codecov.io/gh/nrkno/sofie-atem-connection/pull/147/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nrkno)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

Julusian commented 1 year ago

Yeah this is definitely behaving a lot better than before. It appears that previously it would sometimes get stuck for some reason..

There is still an easy to reproduce case of uploads getting stuck, but I expect that is not really related to this change.

Using:

const { Atem } = require('../dist')
const fs = require('fs')

const file = fs.readFileSync('./testframe.rgba')

if (process.argv.length < 3) {
    console.error('Expected ip address as parameter')
    process.exit(1)
}
const IP = process.argv[2]

const conn = new Atem({})

function uploadBatch() {
    const count = 8
    Promise.resolve().then(async () => {
        console.log('starting batch')

        const ps2 = []
        for (let i = 0; i < count; i++) {
            ps2.push(conn.clearMediaPoolStill(i))
        }
        await Promise.allSettled(ps2)
        console.log('batch cleaned')

        const ps = []
        for (let i = 0; i < count; i++) {
            ps.push(uploadNext(i))
        }

        await Promise.allSettled(ps)
        console.log('batch done')

        setTimeout(() => uploadBatch(), 500)
    })
}

async function uploadNext(myNumber) {
    const stuckTimeout = setTimeout(() => {
        console.log('')
        console.log('UPLOAD GOT STUCK')
        console.log('')
    }, 20000)

    const uploadStarted = Date.now()
    // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
    await conn.uploadStill(myNumber % conn.state.media.stillPool.length, file, 'contemplation..', '').then(
        (_res) => {
            console.log(`Uploaded still #${myNumber} in ${Date.now() - uploadStarted}ms at 1080p`)

            clearTimeout(stuckTimeout)
        },
        (e) => {
            console.log(`fail from #${myNumber}`, e)

            clearTimeout(stuckTimeout)
        }
    )
}

conn.on('error', console.log)
conn.on('disconnected', () => {
    console.log('Connection lost')
    process.exit(0)
})
conn.connect(IP).catch((e) => {
    console.error(e)
    process.exit(0)
})
conn.once('connected', () => {
    console.log(`connected`)
    uploadBatch()
})

You might need to increase the count to get it to hit the scenario. For this it is important to also have the atem software open (for me on another machine, should be fine on the same) It will upload the first batch fine, with the atem software waiting for the lock to be released (orange circle shows over the thumbnails)
The batch completes uploading, the atem software starts downloading, then the upload of the second batch begins, and fails due to not having the lock (Error: Unknown error 5) (if the download completes first this does not happen, tune the count to get them to overlap)

Trying the same script before this change, it fails to even upload the first batch, with the second image failing due to the lock lost error.

@ianshade I am happy to merge this as is, unless you want to look into this likely unrelated other locking issue first?