jiaaro / pydub

Manipulate audio with a simple and easy high level interface
http://pydub.com
MIT License

A bug or a question: pyaudioop.py: Why is `_sample_count` divided by `sample_width`? #642

Open schittli opened 2 years ago

schittli commented 2 years ago

Hello

Thank you very much for sharing your great work! It surprises me how smartly problems can be solved in Python.

I don't know whether this is a bug or a gap in my understanding, so I'm asking because it looks strange:

The context

pyaudioop.py calculates the number of samples like this:

def _sample_count(cp, size):
    return len(cp) / size

The function doesn't document it, but size appears to be the number of bytes a sample uses (usually 1, 2, or 4, I guess), because pydub passes sample_width to it.

The question...

What confuses me is that Python seems to have a very flexible / smart int and that it doesn't matter if one stores one, two or more bytes in an int.

Therefore I would expect that _sample_count is simply the result of len(cp).

Why is the array length divided by the number of bytes of a sample in _sample_count?

Thanks a lot for any light 😃 , kind regards, Thomas

jiaaro commented 2 years ago

Ha, good question!

cp is a bytestring, so each index holds just one byte, even though for most audio an individual sample is more than one byte long. In other words, each sample in 16-bit audio is two bytes, and you have to read it from two indexes, like cp[2:4], and then unpack it into a numeric value with the struct module.
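
To make the byte-versus-sample distinction concrete, here is a minimal sketch using only the standard library. It assumes 16-bit, little-endian, signed samples; the sample values and variable names are made up for illustration and are not taken from pydub.

```python
import struct

# Four 16-bit little-endian signed samples packed into a bytestring.
samples = [0, 1000, -1000, 32767]
cp = struct.pack("<4h", *samples)

sample_width = 2  # bytes per sample for 16-bit audio

# len(cp) counts bytes, so the sample count is the byte length divided
# by the sample width (integer division is used here for a whole number).
sample_count = len(cp) // sample_width

# The second sample (index 1) occupies bytes 2 and 3 of the bytestring.
(second,) = struct.unpack("<h", cp[2:4])

print(len(cp), sample_count, second)  # prints: 8 4 1000
```

Indexing the bytestring directly (cp[2]) would return a single byte of that sample, which is why the slice and the unpack step are both needed.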

schittli commented 2 years ago

Hello @jiaaro

Thank you very much for your answer! It's great when suddenly everything becomes clear.

It is very interesting that you have chosen a bytestring as the basic structure in pydub, because I have noticed that the pydub source code is always very lean and easy to understand where it processes the audio data, once one knows that there is a bytestring 😃

Regarding the bytestring: May I ask a question about understanding Python internals?

Could it be that the advantage of the bytestring is that the source code is simpler, because Python provides great functions for processing this data structure?

And that the disadvantage is that the CPU has to work more, because it has to use indexed byte accesses all the time instead of being able to work "with one stream of ints" per audio channel?

I ask because I'm wondering if it's worth writing code to change the structure to e.g. an int array if the audio signal has to go through many complex calculation steps afterward.

Thanks a lot, kind regards, Thomas
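
For readers weighing the same trade-off, the conversion in question can be sketched with the standard library's array module. This is only an illustration under stated assumptions (16-bit little-endian signed samples; the helper names bytes_to_samples and samples_to_bytes are hypothetical), not a benchmark and not pydub's own code. pydub itself also exposes AudioSegment.get_array_of_samples() for a similar purpose.

```python
import struct
import sys
from array import array


def bytes_to_samples(raw):
    """Decode little-endian 16-bit PCM bytes into an array of ints."""
    samples = array("h")        # 'h' = signed short (2 bytes on common platforms)
    samples.frombytes(raw)
    if sys.byteorder == "big":  # array uses the machine's native byte order
        samples.byteswap()
    return samples


def samples_to_bytes(samples):
    """Encode an iterable of ints back into little-endian 16-bit PCM bytes."""
    out = array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


raw = struct.pack("<4h", 0, 1000, -1000, 32767)

samples = bytes_to_samples(raw)       # decode once
halved = [s // 2 for s in samples]    # do the numeric work on plain ints
quieter = samples_to_bytes(halved)    # encode once at the end

print(list(samples), halved)  # prints: [0, 1000, -1000, 32767] [0, 500, -500, 16383]
```

The idea is to pay the decode and encode cost once, at the edges, and run the intermediate calculation steps on integers. Whether that is faster than operating on the bytestring depends on the workload and would need measuring.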

akpeker commented 2 years ago

Regarding the data type being a bytestring, I am having problems with the following code:

    rate, wav_data = wavfile.read(str(wavpath))
    audio = AudioSegment(data=wav_data, frame_rate=rate, sample_width=2, channels=1)

(I know AudioSegment has a from_file method, but this is part of legacy code, so I load the wav using wavfile.) audio.raw_data turns out to be a numpy.ndarray (array of int16). This seems to cause problems down the line in certain operations. For example, I create white noise, whose raw_data turns out to be of type bytes:

    wnoise = WhiteNoise(sample_rate=audio.frame_rate).to_audio_segment(duration=audio.duration_seconds*1000, volume=-40)

When I try to overlay it on the above audio, I get an error about length differences:

    au_wn = audio.overlay(wnoise)

I debugged and confirmed that the lengths are the same right up to the call to audioop.add. But, probably due to the difference in sample types, audioop thinks they are different sizes.

Another problem I saw: seg._data and seg[0:]._data have different lengths in my test, a difference of 4 bytes. Might be something to be aware of.
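
One plausible explanation, not confirmed in this thread, is that an array of int16 values and a bytestring report different lengths for the same audio: len() on the array counts samples, while code that reads the underlying buffer sees bytes. The sketch below illustrates that mismatch with the standard library's array type as a stand-in for the int16 ndarray; the variable names are made up for illustration.

```python
from array import array

# Stand-in for an int16 sample array such as the one wavfile.read returns.
int16_samples = array("h", [0, 1000, -1000, 32767])

element_count = len(int16_samples)               # counts samples
buffer_bytes = memoryview(int16_samples).nbytes  # what a buffer-level reader sees
as_bytes = int16_samples.tobytes()               # the same data as a bytestring
byte_count = len(as_bytes)                       # counts bytes

print(element_count, buffer_bytes, byte_count)
```

With 2-byte samples, the element count is half of the byte count. If that is what happens here, passing bytes instead of the array, for example data=wav_data.tobytes(), may avoid the problem. This assumes wav_data really is mono int16, matching sample_width=2 and channels=1.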