Open DeepanshCUHK opened 6 months ago
By default, AUSoundIsolation will isolate voice from background noise using a neural network that ships with macOS (specifically aufx-vois-appl-nnet-vi-v0.plist within AudioDSP.component)! If you comment out the audio unit properties that set the paths to the custom music-oriented model (plist/base/dereverb), the built-in model should be used by default. You may want to tinker with the denoising/tuning properties, or remove them entirely.
(However, in my experience, it's not good at fully removing loud background music. This same audio unit (or something very similar?) appears to be used for the "Voice Isolation" microphone setting, and that itself isn't always the best :)
Got it, thanks for your reply! The quality might not be the best, but I love that it works with zero latency. I have tried other vocal-remover solutions such as librosa (a Python library) and several deep learning solutions, where a 3-minute song can take up to 30 seconds to process depending on the CPU/GPU. (But maybe there are better solutions I don't know of?)
While testing, it crashes on my iPhone 11 because it cannot find the file "aufx-nnet-appl.plist", but it works on my iPhone 12 Pro (both iPhones are on iOS 17+). I don't know why that is happening, because Apple says that Voice Isolation should work on iPhone XR and later devices (https://support.apple.com/en-in/101993#:~:text=You%20can%20use%20Voice%20Isolation,iPad%20Pro%2011%2Dinch%20models). Maybe they are using a different audio unit for this.
Glad to hear this tool has helped :) Unfortunately I'm not familiar with other solutions, but wishing you the best of luck with the search!
Both your iPhone 11 and 12 Pro should work! I've pushed some commits recently regarding model discovery under iOS 17 as the model location changed (especially 72a8734bb4464a63a64aa8b5d22dd6b04719c839 and 09839ccc295f904ce90600e57761ad4be7258b24) - with these pulled, are you able to run as expected? If not, could you please send the logging of what paths it attempted to search for the model?
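For anyone else hitting the missing-model crash, the kind of logging asked for above could look roughly like this. It is only a minimal Python sketch of "try each candidate location and record what was tried", not the project's actual discovery code; `find_model` and the directory names in the usage lines are hypothetical placeholders, not real system paths.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def find_model(filename: str, candidate_dirs: Iterable[str]) -> Tuple[Optional[Path], List[Path]]:
    """Return the first existing path for `filename`, plus every path that was tried."""
    attempted: List[Path] = []
    for directory in candidate_dirs:
        path = Path(directory) / filename
        attempted.append(path)
        if path.is_file():
            return path, attempted
    return None, attempted


if __name__ == "__main__":
    # Hypothetical search locations, for illustration only.
    found, attempted = find_model(
        "aufx-nnet-appl.plist",
        ["/hypothetical/location-a", "/hypothetical/location-b"],
    )
    if found is None:
        print("Model not found. Paths searched:")
        for path in attempted:
            print(f"  {path}")
    else:
        print(f"Using model at {found}")
```

Returning the attempted paths along with the result makes it easy to paste the full search list into a bug report when the lookup fails.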
Great! 72a8734 fixed it.
After testing it more, it works for most files, but I encountered an error where the plist file threw an exception for a file in .m4a format (a karaoke recording with background music + vocals). Do you have any idea why this could happen? Maybe because the recording quality is bad, and the background or vocals are perceived as noise?
Assertion failed: ((numInternalIOChannels == 1 || (numInternalIOChannels == numIOChannels && internalBatchSize == 1)) && "internal format must be one channel or the same number of IO channels (when internal batch size = 1)"), function CreateProcessingGraph, file SoundIsolationGraphAdapter.cpp, line 335
Apologies for taking a little while to get back! I'm able to reproduce this with any audio that has a single channel, but unfortunately am not familiar enough to determine why.
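Since the assertion is about channel counts and the crash reproduces with single-channel audio, one thing that might be worth trying is converting mono input to stereo before processing. Whether this actually avoids the assertion is untested and purely an assumption. The Python sketch below only illustrates the idea on interleaved float samples; `ensure_stereo` is a hypothetical helper, and in the real project the conversion would be done with the platform's audio APIs.

```python
from typing import List, Sequence


def ensure_stereo(interleaved: Sequence[float], channels: int) -> List[float]:
    """Return interleaved stereo samples; mono input is duplicated into left and right."""
    if channels == 2:
        return list(interleaved)
    if channels != 1:
        raise ValueError(f"unsupported channel count: {channels}")
    stereo: List[float] = []
    for sample in interleaved:
        # Same sample on both channels, so the result sounds identical to the mono source.
        stereo.extend((sample, sample))
    return stereo
```

For example, `ensure_stereo([0.1, -0.2], channels=1)` yields `[0.1, 0.1, -0.2, -0.2]`, while stereo input is passed through unchanged.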
Ok np. Thanks for all your help :) I am really enjoying this project!!
Thank you for your time and effort. I would like to ask about something I don't understand. What exactly do the file and AudioDSP in the following sentence refer to? I've been doing some work on that lately.
specifically aufx-vois-appl-nnet-vi-v0.plist within AudioDSP.component
Does anyone know if this can be used to isolate the background instrumental music (instead of vocals)? I tried multiplying attenuation by -1 but that doesn't work.
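One generic approach that may be worth trying for the instrumental question is to subtract the isolated vocal track from the original mix, sample by sample. The Python sketch below shows only that arithmetic. It assumes the isolated output is sample-aligned with the original, has the same length and channel layout, and has not been rescaled; none of that is confirmed for AUSoundIsolation in this thread. `subtract_vocals` is a hypothetical helper name.

```python
from typing import List, Sequence


def subtract_vocals(mix: Sequence[float], vocals: Sequence[float]) -> List[float]:
    """Estimate the instrumental as (mix - vocals), clamped to the [-1.0, 1.0] range."""
    if len(mix) != len(vocals):
        raise ValueError("mix and vocals must have the same number of samples")
    residual: List[float] = []
    for m, v in zip(mix, vocals):
        # Clamp so the result stays a valid normalized float sample.
        residual.append(max(-1.0, min(1.0, m - v)))
    return residual
```

For instance, `subtract_vocals([0.5, -0.25, 1.0], [0.25, -0.25, 0.5])` gives `[0.25, 0.0, 0.5]`. Any misalignment between the two signals would leave vocal residue in the result, so the alignment assumption matters a lot here.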