nrkno / sofie-atem-connection

Sofie ATEM Connection: A Part of the Sofie TV Studio Automation System
https://github.com/nrkno/Sofie-TV-automation/
MIT License
129 stars 36 forks source link

Investigate: Child thread as 'native' code #136

Open Julusian opened 1 year ago

Julusian commented 1 year ago

There is a child thread being used (worker thread via threadedclass), as the atem protocol is sensitive to timeouts, and needs to respond to messages every few milliseconds to ensure that the connection doesnt get dropped. We previously found it was easy to trigger drops doing simple things such as reading files, or some complex computations.

There are some bugs that have been open a long time for issues with how the child thread is implemented, I think it is worth re-evaluating how this child-thread is implemented and seeing if there is a better way of achieving it.

125 #106 #74 and more

The code in question is a single 400 line file https://github.com/nrkno/sofie-atem-connection/blob/master/src/lib/atemSocketChild.ts, and it doesn't get many changes as it only handles the 'transport level' of the protocol.

Ideas

1) Rewrite the child in wasm. I'm not sure if it is possible to create a udp socket in wasm, so this may not be achievable currently. This needs some research into whether that is possible, or if it is in a roadmap (eg planned for node 20 means it could be worth holding out for) What does this mean for threading? Can we achieve what we need? The requirement for buffers to be defined in the memory space of wasm should not be an issue for us here, but we should investigate how to handle this in a safe and performant way. Harder to maintain, and could hit limitations in the wasm runtime.

2) Rewrite the child as native code. (c++/rust) This could introduce issues for users if they need to build the native portion themselves. Harder to maintain. Easy to use threading Minimal overhead of type translation

ianshade commented 1 year ago

Rewrite the child in wasm.

Correct me if I'm wrong, but aren't threads in WASM also built on top of the platform features (Web Workers / Worker Threads)? Rewriting in WASM could maybe reduce some overhead of Node's event loop and all, but for a task as simple as listening for packets and replying, would there be any noticeable gain? I think what could help more is being in control of thread priority.

the atem protocol is sensitive to timeouts, and needs to respond to messages every few milliseconds to ensure that the connection doesnt get dropped

If that's a timeout on ATEM's side, did you consider asking Blackmagic Design to make it more forgiving?

Julusian commented 1 year ago

Correct me if I'm wrong, but aren't threads in WASM also built on top of the platform features (Web Workers / Worker Threads)?

It does sound like it, but it also looks like that will be handled inside the WASM, so should avoid some of the issues of using worker threads. Making worker_threads work cleanly for webpack, electron or the browser is not simple (threadedclass solves some of these, but webpack requires marking the whole of atem-connection as an external)
https://web.dev/webassembly-threads/

If that's a timeout on ATEM's side, did you consider asking Blackmagic Design to make it more forgiving?

I have not, but I don't think that will be entirely negate the need to use a thread. The timeout for resends is currently set at 60ms. If threading was disabled in this library, for sofie that timeout is most likely fine (assuming threading is enabled inside there). But if threading is disabled inside playout-gateway, then chances are that random timeouts will be hit. And I don't think there is a single number which would resolve this, it will depend on how many TSR devices there are, complexity of the timeline and performance of the machine