tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
Apache License 2.0
5.76k stars, 1.18k forks

32-bit float WAV file is silently converted to 32-bit signed integer. #328

Closed: slowglow closed this issue 3 years ago.

slowglow commented 3 years ago

I am not sure whether this is an issue; I guess it depends on the internal workings of the library, so I'm asking: is this intentional?

Basically, DSP on a PC is more conveniently done in floating point, because you don't have to worry about integers overflowing at intermediate steps. So I had prepared my intermediate files (from some other analysis) in Audacity's 32-bit floating-point format, with samples in the range [-1.0, +1.0]. After importing them with audioBasicIO.read_audio_file, I end up with an array of int32.
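To make the overflow point concrete, here is a small NumPy sketch (the sample values are arbitrary and chosen only for illustration): mixing two loud int16 signals wraps around silently, while the same mix in float32 just exceeds 1.0 and can be rescaled afterwards.

```python
import numpy as np

# Two loud int16 samples (values chosen only for illustration).
a = np.array([30000], dtype=np.int16)
b = np.array([30000], dtype=np.int16)

# Integer mixing: NumPy array arithmetic wraps around silently on overflow.
mixed_int = a + b  # array([-5536], dtype=int16): 60000 wrapped past 32767

# Float mixing: scale to [-1.0, 1.0) first; the sum exceeds 1.0 but keeps its sign and size.
fa = a.astype(np.float32) / 32768.0
fb = b.astype(np.float32) / 32768.0
mixed_float = fa + fb  # 1.8310546875, out of range but not wrapped

# Rescale the float mix back into [-1.0, 1.0] by its peak;
# the wrapped integer result cannot be repaired this way.
peak = np.max(np.abs(mixed_float))
normalised = mixed_float / peak
```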

The actual reading of the audio file happens in these lines of read_audio_generic:

audiofile = AudioSegment.from_file(input_file)
data = np.array([])
if audiofile.sample_width == 2:
    data = numpy.fromstring(audiofile._data, numpy.int16)
elif audiofile.sample_width == 4:
    data = numpy.fromstring(audiofile._data, numpy.int32)

I notice that the code never checks the actual sample format (integer or float) stored in the audio file; it only looks at the sample width. Am I missing something?
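For WAV files, one way to perform such a check is to read the format tag in the `fmt ` chunk of the RIFF header: tag 1 means integer PCM, tag 3 means IEEE float, and 0xFFFE (WAVE_FORMAT_EXTENSIBLE) stores the real tag in the first two bytes of its SubFormat GUID. The sketch below uses only the standard library. `wav_sample_format` is a hypothetical helper, not part of pyAudioAnalysis or pydub, and it only handles plain RIFF/WAVE files (not RF64 or other containers).

```python
import struct

# Format tags defined for WAV files.
WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_IEEE_FLOAT = 0x0003
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def wav_sample_format(data: bytes) -> tuple:
    """Return ('int' or 'float', bits_per_sample) for a RIFF/WAVE byte string.

    Hypothetical helper: walks the RIFF chunks until it finds 'fmt '.
    """
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + chunk_size]
        if chunk_id == b"fmt ":
            (format_tag,) = struct.unpack("<H", body[0:2])
            (bits_per_sample,) = struct.unpack("<H", body[14:16])
            if format_tag == WAVE_FORMAT_EXTENSIBLE:
                # The real tag is the first 2 bytes of the SubFormat GUID (offset 24 in the fmt body).
                (format_tag,) = struct.unpack("<H", body[24:26])
            if format_tag == WAVE_FORMAT_PCM:
                return "int", bits_per_sample
            if format_tag == WAVE_FORMAT_IEEE_FLOAT:
                return "float", bits_per_sample
            raise ValueError(f"unsupported format tag {format_tag:#06x}")
        # RIFF chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")
```

For a 32-bit float file exported from Audacity, calling this on the file's bytes should return `('float', 32)`, which would let the reading code branch on the format rather than on the sample width alone.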

Tronic commented 3 years ago

Although I have followed this library out of interest for a while, I have to say I have no idea what its design principles are. That being said, I agree that all audio processing should be done exclusively in 32-bit float, nominally in the -1.0 to 1.0 range. My reasoning:

Only legacy audio APIs and PCM audio files still use integer formats. If you are designing anything, even low-level audio code like a kernel driver, do not support multiple sample formats like all the old ones do; just use floats and convert integers to/from float32 as close to the hardware as possible. The system load comes from frequent polling, not from the number format used, nor from float32 using twice the memory/cache/bandwidth of 16-bit integers.
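A minimal NumPy sketch of what converting to/from float32 at the boundary could look like for 16-bit PCM: scale by 1/32768 on input, do all processing in float32, then clip and scale back only when integer output is needed. The helper names are hypothetical, and scaling by 32768 (rather than 32767) is one common convention among several.

```python
import numpy as np


def int16_to_float32(samples: np.ndarray) -> np.ndarray:
    """Map int16 PCM to float32 in [-1.0, 1.0)."""
    return samples.astype(np.float32) / 32768.0


def float32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Map float32 back to int16, clipping anything outside the representable range."""
    scaled = np.round(samples * 32768.0)
    return np.clip(scaled, -32768, 32767).astype(np.int16)


pcm = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
x = int16_to_float32(pcm)      # all processing happens on x, in float32
y = float32_to_int16(x * 2.0)  # a gain that overflows is clipped, not wrapped
```

Because the integer conversion happens only at the edges, intermediate results may go outside [-1.0, 1.0] without corrupting the signal; only the final write has to decide how to handle out-of-range values.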

slowglow commented 3 years ago

Very good points! Thank you!

Now, about the library's design principles: I don't know them either, and the documentation is scarce. In addition, the recent refactoring of the code broke (at least for me) some old programs that used the library's internals. More importantly, the recent code changes are not reflected in the documentation (the wiki).

Fortunately, it is an open source project (great thanks to you, Theodoros!), and in an open discussion a lot of issues can be ironed out. (By the way, where would be the appropriate place for such a discussion?)

Now, I don't know whether these count as design principles, since they haven't been spelt out explicitly, but some of the points that I really like about the library are:

What I would like to see:

Indeed, if Theodoros could jump in and set out some design principles and contribution guidelines, I guess it would be easier to grow a small base of regular contributors.

Having got side-tracked, let me get back to the original issue: if I change the import code to read the samples as float, can I expect the library to work? Or will that break it, because all subsequent processing expects integers?
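As a sketch of what "importing as float" might mean for the snippet in read_audio_generic, the raw bytes could be mapped to float32 in [-1.0, 1.0) instead of being returned as raw integers. This is an illustration under stated assumptions, not the library's code: `pcm_bytes_to_float32` is a hypothetical helper, it assumes little-endian signed PCM bytes like those the quoted code reads from pydub's `AudioSegment`, and it covers only the 16- and 32-bit widths that code handles. Whether the rest of pyAudioAnalysis behaves correctly with float input is exactly the open question, so this sketch does not assume it.

```python
import numpy as np


def pcm_bytes_to_float32(raw: bytes, sample_width: int) -> np.ndarray:
    """Hypothetical helper: decode little-endian signed PCM bytes to float32 in [-1.0, 1.0).

    Mirrors the two sample widths handled by the quoted read_audio_generic code.
    """
    dtypes = {2: "<i2", 4: "<i4"}
    if sample_width not in dtypes:
        raise ValueError(f"unsupported sample width: {sample_width}")
    # frombuffer replaces the deprecated binary mode of numpy.fromstring.
    ints = np.frombuffer(raw, dtype=dtypes[sample_width])
    # Full-scale value for this width: 2**15 for 16-bit, 2**31 for 32-bit.
    full_scale = float(2 ** (8 * sample_width - 1))
    # Divide in float64 so 32-bit integers keep their precision, then store as float32.
    return (ints.astype(np.float64) / full_scale).astype(np.float32)
```

In the quoted code this would amount to `data = pcm_bytes_to_float32(audiofile.raw_data, audiofile.sample_width)`, where `raw_data` is pydub's public accessor for the same bytes as `_data`.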