henrymaas / AudioSlicer

Audio Slicer that uses silence detection to split .wav audio files into multiple .wav samples.
291 stars 59 forks

How to set the Silence_Threshold ? #13

Open alvar036 opened 10 months ago

alvar036 commented 10 months ago

Could someone explain how to set the right threshold for the silence?

It says "silence_threshold = 1e-4" in the script, but I have no clue what that stands for.

Some sort of guide would be really helpful, for example:

1e-4 = -10db, 1e-3 = -20db, 1e-2 = -30db

etc. Thanks!

henrymaas commented 10 months ago

Could someone explain how to set the right threshold for the silence?

It depends on your audio track. For instance, if you're recording your voice in a quiet room with minimal environmental noise, you'll determine an optimal silence_threshold for that recording. However, the same setting won't suit an audio interview conducted on a noisy street. It's essential to analyze your audio content to discern what constitutes "silence."

The silence_threshold is the energy level below which an audio window is categorized as silent. You can inspect the audio visually (for example, in a spectrogram) and then experiment iteratively, trying different parameter values until the slices match your audio.
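To make the iterative experimentation concrete, here is a minimal sketch, not the AudioSlicer implementation. It assumes "energy" means the mean squared amplitude of a window of samples normalized to [-1, 1]; the helper names `window_energies` and `count_silent`, the window size, and the synthetic signal are all hypothetical stand-ins for your own recording.

```python
import math

def window_energies(samples, window_size):
    """Mean squared amplitude of each non-overlapping window (assumed definition of 'energy')."""
    energies = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        energies.append(sum(x * x for x in window) / window_size)
    return energies

def count_silent(energies, threshold):
    """Number of windows whose energy falls below the threshold."""
    return sum(1 for e in energies if e < threshold)

# Synthetic stand-in for a recording (hypothetical data):
# 0.5 s faint hum, 0.5 s loud 440 Hz tone, 0.5 s faint hum.
sample_rate = 8000
hum = [0.001 * math.sin(2 * math.pi * 50 * n / sample_rate) for n in range(sample_rate // 2)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate // 2)]
samples = hum + tone + hum

energies = window_energies(samples, window_size=400)  # 50 ms windows

# Try several thresholds and see how many windows each one treats as silent.
for threshold in (1.0, 1e-4, 1e-8):
    silent = count_silent(energies, threshold)
    print(f"threshold={threshold:g}: {silent}/{len(energies)} windows silent")
```

In this toy signal a threshold of 1.0 marks every window as silent, 1e-8 marks none, and 1e-4 separates the hum from the tone. On real audio the useful range will differ, which is why trying a few values against your own material is the practical approach.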

A brief explanation of slicing can be found in this thread without delving too deeply into the theoretical aspects: https://github.com/henrymaas/AudioSlicer/issues/7

I intend to elucidate this subject with illustrations and include it in the readme. I've observed numerous people attempting to train models using this code; a brief explanation might help.

alvar036 commented 10 months ago

Thanks for explaining, but I still don't really understand it. And since the input is 1e-2, are we supposed to change the 1e AND the 2? Or how does that work...

Wish we could make a visual interface for it where you can preview the audio waveform and set a threshold, just like a Gate setting would work inside a DAW lol.

henrymaas commented 10 months ago

Certainly! The expression 1e-2 represents 0.01, while 1e-3 is equal to 0.001. This is scientific (E) notation: the number after the e is the power of ten, so 1e-2 means 1 × 10^-2. You change the value as a whole, and if you prefer, you can write it in decimal notation instead. For instance: silence_threshold = 0.01, or silence_threshold = 0.0963 (whichever value better suits your audio).
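As a quick check of the notation, and as a rough answer to the decibel question at the top of the thread, here is a small sketch. The dB figures rest on assumptions that the thread does not confirm: that samples are normalized to [-1, 1], and that the threshold is compared against either an energy (mean squared amplitude, converted with 10·log10) or a raw amplitude (converted with 20·log10). The helper names `energy_to_db` and `amplitude_to_db` are hypothetical.

```python
import math

# E notation and decimal notation produce the same float in Python.
assert 1e-2 == 0.01
assert 1e-3 == 0.001
assert 1e-4 == 0.0001

def energy_to_db(energy):
    """dB for a power-like value (assumes the threshold is a mean squared amplitude)."""
    return 10 * math.log10(energy)

def amplitude_to_db(amplitude):
    """dB for an amplitude-like value (assumes the threshold is a raw sample level)."""
    return 20 * math.log10(amplitude)

for value in (1e-2, 1e-3, 1e-4):
    print(
        f"{value:g}: {energy_to_db(value):.0f} dB if treated as energy, "
        f"{amplitude_to_db(value):.0f} dB if treated as amplitude"
    )
```

Under the energy assumption, 1e-2, 1e-3 and 1e-4 correspond to roughly -20, -30 and -40 dB relative to full scale; under the amplitude assumption they correspond to -40, -60 and -80 dB. Either way, a smaller threshold means a quieter signal is required before a window counts as silent.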

Notice that you don't need an exact number, just an approximation that matches your audio track. For example, if you record just your voice in your room with any microphone and say some phrases, the default parameters should slice the recording wherever they find periods of silence.

In simpler terms, the lower the amplitude on the Y axis of a waveform plot (or the fainter the intensity in a spectrogram), the lower the energy present. The combination of the silence window and the silence threshold determines what counts as a period of "silence" and how long it lasts.
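The interaction between the window and the threshold can be sketched as follows. This is an illustrative sketch under assumptions, not the code from the repository: `find_silences` and its `min_silent_windows` parameter are hypothetical names, and the per-window energies are made-up values.

```python
def find_silences(energies, threshold, min_silent_windows):
    """Return (start, end) window indices, end exclusive, for every run of at
    least `min_silent_windows` consecutive windows with energy below `threshold`."""
    silences = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            # A loud window ends the current run; keep it only if long enough.
            if run_start is not None and i - run_start >= min_silent_windows:
                silences.append((run_start, i))
            run_start = None
    # Handle a quiet run that lasts until the end of the audio.
    if run_start is not None and len(energies) - run_start >= min_silent_windows:
        silences.append((run_start, len(energies)))
    return silences

# Hypothetical per-window energies: loud, loud, a one-window dip, loud,
# three quiet windows, loud, two quiet windows.
energies = [0.2, 0.3, 1e-6, 0.25, 1e-6, 1e-7, 1e-6, 0.4, 1e-6, 1e-6]

print(find_silences(energies, threshold=1e-4, min_silent_windows=2))
print(find_silences(energies, threshold=1e-4, min_silent_windows=3))
```

With a minimum of two windows, the one-window dip is ignored and two silences are reported; raising the minimum to three keeps only the longest one. The threshold decides which windows are quiet, and the window length decides how long the quiet has to last before it is treated as a place to slice.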

I plan to elucidate this topic through a Jupyter Notebook with visual aids, likely by this weekend, and I'll share the details with you then.

alvar036 commented 10 months ago

That's awesome! Thank you so much for the further explanation, and I will wait for your notebook :)