ixobert / birds-generation

Preprocessing requirements are inconsistent with code #14

Open masonyoungblood opened 5 months ago

masonyoungblood commented 5 months ago

Thanks for developing such a great tool - I'm already getting really impressive performance out of the box!

I have a question about the preprocessing requirements, which README.md lists as mono, 22050 Hz, 5-second files.

Both generate_samples.py and interactive_app.py, however, specify a sampling rate of 16384 Hz and an audio length of 4 seconds in their core arguments. Which requirements are correct? And is 4 seconds an upper limit, or do shorter files need to be padded with silence to be exactly 4 seconds long?

ixobert commented 5 months ago

@masonyoungblood Thank you for your feedback on ECOGEN! Regarding the preprocessing requirements: the specifications in the README (mono, 22050 Hz, 5-second files) were indeed used during the training phase of the ECOGEN model. However, the parameters you see in generate_samples.py and interactive_app.py (16384 Hz, 4 seconds) also yield promising results in the generation phase. I'll update the README to clarify these details. Regarding song length, I tried longer songs with no issue, but a minimum of 4 seconds should be used, and padding with silence is a good option for reaching 4 seconds when needed.
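
For anyone landing here with clips shorter than 4 seconds, the padding described above could look roughly like the sketch below. It is not code from the ECOGEN repository: `pad_with_silence` is a hypothetical helper, it assumes the clip is already mono and resampled to 16384 Hz, and it assumes the silence goes at the end of the clip (the thread does not say where the padding should be placed). Plain Python lists are used only to keep it dependency-free.

```python
# Values taken from this thread: the generation scripts use 16384 Hz and 4 s.
SAMPLE_RATE_HZ = 16384
MIN_DURATION_S = 4


def pad_with_silence(samples, sample_rate=SAMPLE_RATE_HZ, min_duration_s=MIN_DURATION_S):
    """Return a copy of `samples` with trailing zeros so it lasts at least
    `min_duration_s` seconds. Clips that are already long enough are
    returned unchanged (as a new list).

    Hypothetical helper, not part of the ECOGEN repository.
    """
    target_len = int(sample_rate * min_duration_s)
    missing = max(0, target_len - len(samples))
    # Assumption: silence is appended at the end of the clip.
    return list(samples) + [0.0] * missing


# A 1-second mono clip padded to 4 seconds.
clip = [0.1] * SAMPLE_RATE_HZ
padded = pad_with_silence(clip)
print(len(padded) / SAMPLE_RATE_HZ)  # duration in seconds -> 4.0
```

Longer clips pass through untouched, which matches the note above that longer songs worked without issue.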