Open peifang-lee opened 9 months ago
Yes, our code randomly selects 2-second audio segments for training purposes, while during validation, it retains the complete audio.
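For illustration, the training-time behavior described above could be sketched as follows. This is a minimal sketch, not the repository's actual `dataset.py`: the helper name `select_segment`, the list-based audio, and the zero-padding of short clips are all assumptions. A `segment_size` of 32000 samples equals two seconds only if the audio is sampled at 16 kHz.

```python
import random

def select_segment(audio, segment_size=32000, split=True, rng=random):
    """Return a random fixed-length crop when split is True, else the full audio.

    Hypothetical helper: `audio` is a plain list of samples.
    """
    if not split:
        # Validation-style behavior: keep the complete audio.
        return audio
    if len(audio) >= segment_size:
        # Pick a random start so the crop fits entirely inside the clip.
        start = rng.randint(0, len(audio) - segment_size)
        return audio[start:start + segment_size]
    # Assumption: clips shorter than one segment are zero-padded at the end.
    return list(audio) + [0.0] * (segment_size - len(audio))
```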
Slicing the audio during validation as well does not affect the training process. However, the validation metrics you then observe are computed on the sliced segments only, so they may not match the metrics for the complete audio.
For inference, we recommend using the complete audio. If your GPU memory is insufficient, you can slice the long audio into segments, run inference on each segment, and then concatenate the resulting audio segments. However, we cannot guarantee that the final metrics will be exactly the same as with the complete audio.
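The slice-and-concatenate workaround can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not code from `inference.py`: `infer_in_chunks` and `model_fn` are hypothetical names, `model_fn` stands in for whatever call runs the network on one segment, and it is assumed to return as many samples as it receives.

```python
def infer_in_chunks(audio, model_fn, segment_size=32000):
    """Run model_fn on consecutive slices of audio and concatenate the outputs.

    Hypothetical helper: `audio` is a list of samples and `model_fn` maps a
    list of samples to a list of processed samples.
    """
    output = []
    for start in range(0, len(audio), segment_size):
        # The last chunk may be shorter than segment_size.
        chunk = audio[start:start + segment_size]
        output.extend(model_fn(chunk))
    return output
```

Only one segment is processed at a time, so peak memory depends on `segment_size` rather than on the full length of the audio. Because each segment is processed without context from its neighbors, the output near segment boundaries can differ from a full-length pass, which is one likely reason the metrics may not match exactly.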
Thank you very much for releasing the code for your work.
I want to confirm: does `split=true` in `dataset.py` mean that, with `segment_size` at its default of 32000, two seconds are randomly selected from each audio file for training? I set `split=true` for both training and validation, and both run successfully. However, when I run `inference.py`, my GPU does not have enough memory for inference. Is there any way to resolve this problem?