Open xjf-303 opened 1 month ago
Hello @jishengpeng, thank you for the amazing work. May I ask a few questions?
I was going through the paper and noticed that during data preprocessing, the audio is first cropped to a fixed length of 10 seconds and then randomly cropped again to obtain 3-second segments. I have three questions about this process:
1. Could you explain the rationale behind first cropping the audio to 10 seconds and then performing another random crop to 3-second segments? How does this impact the model's performance or training?
2. Are there any overlaps between the cropped segments, or are they entirely distinct?
3. If possible, could you please share the code for this part of the data preprocessing pipeline?
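To make the question concrete, here is a minimal sketch of how I currently understand the two-stage cropping. It is only my guess, not the actual pipeline: the function names `fixed_crop` and `random_crop`, the 16 kHz sample rate, taking the 10-second window from the start of the file, and zero-padding short clips are all my own assumptions.

```python
import numpy as np


def fixed_crop(wave, sample_rate, seconds=10.0):
    """Keep the first `seconds` of a mono waveform; zero-pad if it is shorter.

    Taking the window from the start and zero-padding are assumptions.
    """
    n = int(round(seconds * sample_rate))
    wave = wave[:n]
    if wave.shape[0] < n:
        wave = np.pad(wave, (0, n - wave.shape[0]))
    return wave


def random_crop(wave, sample_rate, seconds=3.0, rng=None):
    """Draw one random segment of `seconds` from `wave`.

    Returns the segment and its start index (in samples).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(seconds * sample_rate))
    if wave.shape[0] < n:
        raise ValueError("waveform is shorter than the requested segment")
    # Upper bound is exclusive, so the last valid start is len(wave) - n.
    start = int(rng.integers(0, wave.shape[0] - n + 1))
    return wave[start:start + n], start


if __name__ == "__main__":
    sr = 16000  # hypothetical sample rate
    audio = np.random.default_rng(0).standard_normal(sr * 12).astype(np.float32)
    clip = fixed_crop(audio, sr)          # 10-second clip
    segment, start = random_crop(clip, sr)  # 3-second training segment
    print(clip.shape, segment.shape, start / sr)
```

In this sketch every 3-second segment is drawn independently from the same 10-second clip, so segments drawn at different training steps can overlap. Whether the real pipeline behaves this way is exactly what question 2 is asking.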
Thank you for your time and consideration! Looking forward to your response.
Thank you for your attention.