Closed slowglow closed 3 years ago
Despite following this library out of interest for a while, I have to say I have no idea of its design principles. That being said, I agree that all audio processing should be done exclusively in 32-bit float, nominally in the -1.0 to 1.0 range. Reasoning:
Only legacy audio APIs and PCM audio files still use integer formats. If you are designing anything, even low-level audio code like a kernel driver, do not support multiple sample formats like all the old ones do; just use floats and convert integers to/from float32 as close to the hardware as possible. The system load comes from frequent polling, not from the number format used or from floats using twice the memory, cache and bandwidth.
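To make the "convert as close to the hardware as possible" idea concrete, here is a minimal sketch of such a conversion with NumPy. The function names pcm_to_float32 and float32_to_int16 are made up for illustration and are not part of any library discussed here; the scaling convention (divide by 2^(bits-1), clip on the way back) is one common choice, not the only one.

```python
import numpy as np


def pcm_to_float32(samples):
    """Scale integer PCM samples to float32, nominally in [-1.0, 1.0).

    Hypothetical helper for illustration only.
    """
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.floating):
        # Already floating point: only unify the dtype.
        return samples.astype(np.float32)
    if samples.dtype == np.uint8:
        # 8-bit WAV data is unsigned with a midpoint of 128.
        return (samples.astype(np.float32) - 128.0) / 128.0
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError(f"unsupported sample dtype: {samples.dtype}")
    # Signed integers: divide by 2**(bits - 1), e.g. 32768 for int16.
    full_scale = float(np.iinfo(samples.dtype).max) + 1.0
    return (samples.astype(np.float64) / full_scale).astype(np.float32)


def float32_to_int16(samples):
    """Clip float samples to [-1.0, 1.0] and scale them to int16."""
    clipped = np.clip(np.asarray(samples, dtype=np.float64), -1.0, 1.0)
    return np.round(clipped * 32767.0).astype(np.int16)
```

With something like this at the I/O boundary, everything downstream only ever sees float32 and does not need to care what the file or device used.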
Very good points! Thank you!
Now, about the design principles of the library, I don't know either, and the documentation is scarce. In addition, the recent refactoring of the code broke (at least for me) some old programs using the internals of the library. More importantly, the recent code changes are not reflected in the documentation (the wiki).
Fortunately, it is an open source project (many thanks to you, Theodoros!) and in an open discussion a lot of issues can be ironed out. (By the way, where would be the appropriate place to have such a discussion?)
Now, I don't know if these classify as design principles, because they haven't been spelt out explicitly, but some of the points that I really like about the library are:
What I would like to see:
librosa does some horrible things when importing audio: by default it resamples everything to 22050 Hz for the convenience of the developers. But they are very open about it and about why they do it, and they give users the option to opt out and import their data as they want. If users choose to do so, it is their responsibility to track the data format down the pipeline.
Indeed, if Theodoros can jump in and set out some design principles and contribution guidelines, it would be easier to grow a small base of regular contributors, I guess.
After getting side-tracked, I'm getting back to the original issue:
If I change the import portion to import as float, can I expect the library to work? Or will it break because all subsequent handling expects integers?
I am not sure if this is an issue, I guess it depends on the internal workings of the library, so I'm asking. Is this intentional?
Basically, DSP on a PC is more conveniently done in floating point, because one is free of the worry of integers overflowing here and there. So I had my intermediate files (from some other analysis) prepared in Audacity's 32-bit floating point format with a range of (-1.0, +1.0). After importing with audioBasicIO.read_audio_file, I end up with an array of int32.
The actual reading of the audio file is done in read_audio_generic, in these lines:
I notice that there isn't a query as to the actual format of the data in the audio file. Am I missing something?
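For reference, this is roughly what I mean by "querying the format". It is only a sketch using the Python standard library, not what the library does: it walks the RIFF chunks of a WAV file and reads the format tag from the fmt chunk (1 is integer PCM, 3 is IEEE float). The name read_wav_format is made up for illustration, and the sketch does not decode WAVE_FORMAT_EXTENSIBLE (tag 0xFFFE), where the real sample type sits in a sub-format field.

```python
import io
import struct

# Common format tags found in the fmt chunk of a WAV file.
FORMAT_NAMES = {1: "integer PCM", 3: "IEEE float"}


def read_wav_format(stream):
    """Return (format_tag, channels, sample_rate, bits_per_sample).

    Hypothetical helper: `stream` is a binary, seekable file-like object
    positioned at the start of a RIFF/WAVE file.
    """
    riff, _size, wave_id = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no fmt chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            body = stream.read(chunk_size)
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", body[:16]
            )
            return tag, channels, rate, bits
        # Skip any other chunk; chunks are padded to an even byte count.
        stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)
```

Opening a file with open(path, "rb") and passing it in should be enough. For the 32-bit float files exported from Audacity I would expect a tag of 3 with 32 bits per sample, in which case the data could be kept as float instead of ending up as int32.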