processing / processing-sound

Audio library for Processing built with JSyn
https://processing.org/reference/libraries/sound/
GNU Lesser General Public License v2.1

Document the frame numbering used by AudioSample.frames(), cueFrame(), jumpFrame(), positionFrame(), .read() and .write() #28

Open kevinstadler opened 5 years ago

kevinstadler commented 5 years ago

The AudioSample class (and its SoundFile subclass) has a .position() method which is meant to provide the user with information on the current position of playback (or the last cued-to/paused-at position) in seconds. However, the calculation of the time based on the underlying frame position of the JSyn data queue is currently off (by about a factor of 2?), which is why there is also no official documentation for this method. The frame-to-second conversion needs to be looked into first, keeping in mind that whether an audio source is mono or stereo has an effect on the frame rate that is not governed by the sample rate itself.

https://github.com/processing/processing-sound/blob/f8e184db6238a1f15f04b882e34b636141d36ef9/src/processing/sound/AudioSample.java#L455-L459
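
For illustration, here is a minimal sketch of the kind of conversion that seems to be needed, assuming (as the factor-of-2 symptom suggests) that the underlying queue reports its offset in interleaved data points rather than in frames. All names are hypothetical, this is not the library's actual code:

```java
// Hypothetical sketch: convert a raw offset into the interleaved
// sample data of a (possibly multi-channel) sound into seconds.
float positionInSeconds(int rawOffset, int channels, float sampleRate) {
  int frame = rawOffset / channels;  // interleaved data points -> frames
  return frame / sampleRate;         // frames -> seconds
}
```

Under this assumption, a stereo file at 44100 Hz with a raw offset of 88200 would map to frame 44100 and report 1.0 second, whereas dividing the raw offset by the sample rate alone would report 2.0 seconds, matching the roughly factor-of-2 discrepancy described above.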

MLanghof commented 5 years ago

I was looking into switching from minim to this library because minim also has issues with reporting position (or song length) accurately, at least for real-world MP3 files. Being able to accurately navigate to any position (and knowing where playback is at) is crucial for me, so if you can get this right you'll gain a user :)

kevinstadler commented 5 years ago

Cool, thanks for the input. One open design question that has kind of held us back in implementing this feature is that the position and cueing of samples can be measured along four different time scales, listed here together with the currently implemented AudioSample/SoundFile functions that operate on each of them (a small conversion sketch relating the four scales follows the list):

  1. number of frames [ 0 to number of frames ]
    • frames()
    • cueFrame(frame) (undocumented)
    • write(startFrame, data[], startIndex, numFrames)
    • read(index)
    • read(startFrame, data, startIndex, numFrames)
  2. relative position in percent of sample length [0 to 1 or 0 to 100]
    • percent() (undocumented)
  3. position in seconds [ 0 to sample length in seconds ]
    • duration()
    • cue(time)
    • jump(time)
    • play(rate, pos, amp, add, cue)
  4. position in seconds, affected by the sample's current playback rate as set by rate() [ 0 to sample length divided by the current playback rate ]
    • none yet, although the functions listed under point 3 could be adjusted to account for playback rate without breaking compatibility in the majority of cases.
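
To make the relationship between the four scales concrete, here is a minimal conversion sketch. totalFrames, sampleRate and playbackRate are illustrative stand-ins for whatever frames(), the file's sample rate and rate() report, not actual library fields:

```java
// Illustrative Processing-style sketch relating the four time scales.
int   totalFrames  = 44100;  // scale 1: frames [ 0, totalFrames ]
float sampleRate   = 44100;  // frames per second
float playbackRate = 0.5;    // as set by rate()

// scale 2: relative position [ 0, 1 ]
float percentOf(int frame) { return frame / (float) totalFrames; }

// scale 3: seconds of file time [ 0, duration ]
float secondsOf(int frame) { return frame / sampleRate; }

// scale 4: seconds of playback time at the current rate
// [ 0, duration / playbackRate ]
float playbackSecondsOf(int frame) { return secondsOf(frame) / playbackRate; }
```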

Before implementing position() etc. for good and adding them to the docs, we need to make sure that we pick the time scale(s)/unit(s) that are actually most relevant for users and, if we pick more than one scale for wider support, that the function naming is consistent between different methods using the same time scale.

Which time scale do you think would be the most useful one for you, and do you think there should be support for another time scale at all? I know seconds seems to be the most intuitive one, but cueing by seconds can often not achieve the same accuracy as cueing by an exact number of samples. Plus there is the (as yet unaddressed) question of how duration, position and cue should be reported for, say, a 1 second sound file that is currently being played back at half speed and hence takes two seconds to play: is the duration of this sound file now actually two seconds? And if we cue to the 0.5s position, is that still half-way through the file (i.e. actually one second into current playback), or should we take playback rate into account and only cue a quarter of the way into the file, because that is how far half a second of playback currently gets you?
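
To put numbers on the half-speed example, the two possible interpretations of cue(0.5) for a 1 second, 44100 Hz file being played back at rate 0.5 work out as follows (purely illustrative):

```java
float rate = 0.5;

// (a) 0.5 interpreted as seconds of *file* time: halfway through the
// file, which takes a full second of wall-clock playback to reach.
int fileTimeFrame = (int) (0.5 * 44100);            // frame 22050

// (b) 0.5 interpreted as seconds of *playback* time: how far half a
// second of listening at rate 0.5 actually gets you into the file.
int playbackTimeFrame = (int) (0.5 * 44100 * rate); // frame 11025, a quarter in
```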

Once we've answered this question we should be able to make quick progress on properly implementing the functionality.

MLanghof commented 5 years ago

I have to slip in a quick terminology question, because in the context I'm coming from things are called differently. Please correct me if any of these are wrong:

  • a sample is a single amplitude value (for one channel)
  • a frame is a set of samples, one per channel, so for mono audio a frame is the same as a sample
  • the sample rate is the number of frames per second

Apologies if this is documented somewhere, I didn't have the time to dive into it yet.

kevinstadler commented 5 years ago

Yes exactly, that's also the terminology within the sound library. Sorry if I mis-used 'sample' to mean 'frame' at some point.

And yes, whether a sample is mono or stereo is completely opaque to the user, so a 1 second stereo sample at a 44k sample rate actually holds 88,000 data points internally, but frames() will return 44,000. I want to say that the file looks and behaves just like a 44,000 frame long sample to the user, but I'm not actually sure how AudioSample's low-level read() and write() methods deal with stereo data. Those two functions might be the exception in terms of exposing the true nature of stereo samples, but they're only for advanced users, so that bit of inconsistency should be ok I think.

MLanghof commented 5 years ago

sorry if I mis-used 'sample' to mean 'frame' at some point.

I don't think you did, but it's mildly confusing to me because it's kinda backwards from the other low-level audio stuff I've dealt with. ;) I see where the terminology is coming from though, so no worries.

Which time scale do you think would be the most useful one for you

Milliseconds or seconds. I don't plan on changing playback speed so 3. and 4. would be identical for me. I don't think it would be a huge burden to offer both though (with different names to prevent confusion), right? One version would just call the other with the argument scaled by playback speed. I don't see an issue from the technical side at least.
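
A minimal sketch of that delegation, with placeholder names rather than the final API:

```java
// Illustrative only: the rate-aware variant rescales its argument and
// delegates to the rate-agnostic one.
class CueDelegationSketch {
  float sampleRate = 44100;
  float playbackRate = 0.5;  // as set by rate()

  // cue by seconds of file time (scale 3 above)
  void cue(float fileSeconds) {
    int frame = (int) (fileSeconds * sampleRate);
    // ...hand the frame position to the underlying player...
  }

  // cue by seconds of playback time at the current rate (scale 4)
  void cuePlayback(float playbackSeconds) {
    cue(playbackSeconds * playbackRate);
  }
}
```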

do you think there should be support for another time scale at all?

Every function that deals with frames should obviously operate on frames and not seconds. I should note, though, that the opaqueness of mono vs stereo conflicts with basically all of them: I'm not even sure whether each stereo frame's pair of samples is stored together or whether all left samples come first and then all right samples in that array, and how I am supposed to pass the right numFrames to read(), or interpret what I get back, is also a mystery... I think this side of the interface needs work to make intuitive sense, but it's not something I currently need, so there's no wishlist from my side.

And I don't think there's an issue with providing a percent() function either. Overall, as long as the potential for confusion is kept low through good naming, adding some more functions to the interface won't hurt.

kevinstadler commented 5 years ago

Ah good call, the lack of documentation about the behaviour of read() and write() for stereo samples is indeed an oversight... I just had a look at the code again and can tell you the following:

If you have a 1 second mono sample at 44k, frames() will obviously return 44,000. If you have a 1 second stereo sample at 44k, frames() will also return 44,000, because it has 44,000 frames, but each of those frames contains two data points (one for each channel).

Under the hood, the JSyn audio sample class stores the data for the different channels in an interleaved format, i.e. as a sequence of left0, right0, left1, right1, ... in an array that is twice as long as the number of frames of the sample. I haven't tested this yet, but when you call read() on a stereo sample, the startFrame and numFrames arguments should be given in frames; you should however expect a float array twice the length of the number of frames to be returned (and the same goes for write()). For reference, here is the JSyn function that is called: http://www.softsynth.com/jsyn/docs/javadocs/com/jsyn/data/FloatSample.html#read-int-float:A-int-int-
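
Based on that description (and with the same untested caveat), reading a stereo sample and splitting the interleaved data back into its channels might look like this, where sample stands for an existing stereo AudioSample:

```java
int numFrames = 1024;
int channels = 2;  // stereo

// read() takes startFrame/numFrames in frames, but fills
// channels * numFrames interleaved values: left0, right0, left1, ...
float[] interleaved = new float[numFrames * channels];
sample.read(0, interleaved, 0, numFrames);

// de-interleave into one array per channel
float[] left  = new float[numFrames];
float[] right = new float[numFrames];
for (int i = 0; i < numFrames; i++) {
  left[i]  = interleaved[2 * i];
  right[i] = interleaved[2 * i + 1];
}
```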

I'll keep you posted about the cueing/position function progress, might put a test build here on Sunday...

kevinstadler commented 5 years ago

I think I've sussed out most of the issues. Could you please try out the following test release (simply replace the contents of your Processing/libraries/sound/ folder with the contents of the zip) and see if the position() calculation etc. all works for you? https://github.com/processing/processing-sound/releases/tag/v.2.2.0-test

I've also committed the full javadoc to the repository now. If you want to check out the as yet undocumented functions for cueing/jumping to frames rather than seconds, have a look here: https://processing.github.io/processing-sound/index.html?processing/sound/package-summary.html

MLanghof commented 5 years ago

Taking a look now :)

kevinstadler commented 5 years ago

The positions reported after calling cue() and cueFrame() are still messed up in the version I uploaded; they're fixed in a later one.

MLanghof commented 5 years ago

Luckily I didn't need cue.

I can report that the position calculations work correctly for all the cases I've tried, so that's great!