incidentist / the_tuul

Make a decent karaoke video from any song in about 10 minutes.
https://the-tuul.com

Separation in the browser #39

Open incidentist opened 2 weeks ago

incidentist commented 2 weeks ago

Okay, we've got video creation working in the browser. Can we also do audio separation in the browser, by loading the models with onnxruntime-wasm? That would let us remove the costliest part of the server and make the project more sustainable, at the cost of (probably) longer video creation times.
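ONNX Runtime's JavaScript API takes model inputs as a flat typed array plus a dims array (for example `new ort.Tensor("float32", data, dims)` in the onnxruntime-web package), so decoded audio has to be packed into a single buffer before it can be fed to a model. Below is a minimal sketch under some assumptions. It expects one `Float32Array` per channel, such as those returned by Web Audio's `AudioBuffer.getChannelData()`. It also assumes a model that takes raw waveform input shaped `[1, channels, samples]`. `packPlanar` is a hypothetical helper name. Some separation models take a spectrogram rather than raw samples, so the real input shape has to be checked against the model itself.

```typescript
// Hypothetical helper: pack per-channel samples into one planar Float32Array
// (all of channel 0, then all of channel 1, ...) plus a dims array.
// ASSUMPTION: the model wants waveform input shaped [1, channels, samples].
function packPlanar(channels: Float32Array[]): { data: Float32Array; dims: number[] } {
  if (channels.length === 0) {
    throw new Error("need at least one channel");
  }
  const length = channels[0].length;
  for (const ch of channels) {
    if (ch.length !== length) {
      throw new Error("all channels must have the same length");
    }
  }
  const data = new Float32Array(channels.length * length);
  // Copy each channel into its own contiguous block of the output buffer.
  channels.forEach((ch, i) => data.set(ch, i * length));
  return { data, dims: [1, channels.length, length] };
}
```

In the browser, the packed result would then be wrapped in a tensor and passed to `session.run()` under the model's input name, which `session.inputNames` exposes.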

One option is to get the https://github.com/karaokenerds/python-audio-separator project running in the browser using Pyodide. However, Pyodide currently doesn't seem to work well with onnxruntime: https://github.com/pyodide/pyodide/issues/4220 (not surprising, given how complicated ML libraries usually are).

beveradb commented 2 weeks ago

FYI, there are now better models that use fewer resources and don't require ONNX Runtime, e.g. the bs-roformer model or the MDXC models. Check out this post where I suggest some of the best ones I currently use: https://github.com/karaokenerds/python-audio-separator/discussions/82#discussioncomment-9874800

incidentist commented 1 week ago

@beveradb I was actually going to pick your brain about this. I'm less worried about resource usage right now and more worried about which model is easiest to get running in the browser. ONNX models seem to be the most portable and the easiest to run in the browser, so ONNX is actually an advantage here. ONNX Runtime code looks very similar in JS and Python, so I would crib a lot from python-audio-separator. Most of the complication in running these models seems to be in preparing the audio as model input and turning the model output back into normal audio files. But I'm really new to this.
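For the "turning model output into normal files" step, one generic option in the browser is to encode the separated stems as 16-bit PCM WAV. The sketch below is not taken from python-audio-separator. It assumes the output has already been converted back to one `Float32Array` per channel with samples nominally in [-1, 1]. `encodeWav` is a hypothetical name.

```typescript
// Hypothetical helper: encode planar float channels as a 16-bit PCM WAV file.
// ASSUMPTION: samples are nominally in [-1, 1]; out-of-range values are clipped.
function encodeWav(channels: Float32Array[], sampleRate: number): Uint8Array {
  const numChannels = channels.length;
  if (numChannels === 0) {
    throw new Error("need at least one channel");
  }
  const numFrames = channels[0].length;
  for (const ch of channels) {
    if (ch.length !== numFrames) {
      throw new Error("all channels must have the same length");
    }
  }
  const bytesPerSample = 2;
  const blockAlign = numChannels * bytesPerSample;
  const dataSize = numFrames * blockAlign;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF header (all multi-byte fields are little-endian).
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);
  writeAscii(8, "WAVE");
  // "fmt " chunk describing uncompressed PCM.
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  // "data" chunk: samples interleaved frame by frame.
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  let offset = 44;
  for (let frame = 0; frame < numFrames; frame++) {
    for (let ch = 0; ch < numChannels; ch++) {
      const s = Math.max(-1, Math.min(1, channels[ch][frame])); // clip to [-1, 1]
      // Scale to the int16 range; DataView truncates the fractional part.
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
      offset += bytesPerSample;
    }
  }
  return new Uint8Array(buffer);
}
```

In a page, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and offered as a download via `URL.createObjectURL`. Sixteen-bit PCM is chosen here only because nearly every player and editor reads it.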