microbit-foundation / micropython-microbit-v2

Temporary home for MicroPython for micro:bit v2 as we stabilise it before pushing upstream

`AudioFrame` proposal: Reference external buffer #205

Closed: microbit-carlos closed this issue 4 weeks ago

microbit-carlos commented 5 months ago

A lot of the open issues discussing enhancements to the parts of the API that use AudioFrames could be resolved by using memoryview. However, a memoryview cannot be played or recorded into, because the relevant functions also need a sampling rate attached to the buffer.

So if AudioFrame behaved a bit more like memoryview, specifically when slicing, we could achieve a lot of the discussed functionality without unnecessary extra memory copies.

Proposal: AudioFrames to be able to reference external buffers

Disadvantages

It might come as a surprise to a user that modifying a slice also changes the original AudioFrame:

original_af = audio.AudioFrame(size=1024)
new_af = original_af[512:]
new_af[0] = 255    # This also changes original_af[512] to 255
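
For what it's worth, this is exactly how memoryview slices already behave in standard Python, so the aliasing would at least be consistent with existing behaviour. A minimal desktop-Python illustration:

data = bytearray(1024)
view = memoryview(data)[512:]   # a view, not a copy
view[0] = 255
print(data[512])                # 255: the original buffer was modified too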

Alternative

We could have a new class that is essentially a memoryview, but which also carries the rate of the original AudioFrame. This has the advantage of making it much more obvious that we are not dealing with a new AudioFrame with its own copy of the data.

Because getting a different class instance from a slice is a bit weird, we could use a method call rather than slices. For example:

audio_frame = audio.AudioFrame(size=1000)
first_half = audio_frame.track(end=500)
second_half = audio_frame.track(start=500)
middle_half = audio_frame.track(start=250, end=750)
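
For illustration, usage might then look like this (a sketch only, assuming audio.play() would accept the new track type and read the rate from it):

audio.play(first_half, wait=True)    # both tracks reference audio_frame's buffer,
audio.play(second_half, wait=True)   # so no data is copied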

AudioFrame nomenclature

As we consider an "AudioTrack" being created from an AudioFrame, it's becoming more obvious that the AudioFrame name doesn't quite fit the current implementation. As a "frame" is generally small, deriving a "track" out of it doesn't make that much sense. The original intent of grouping multiple frames to create longer audio makes more sense than the current implementation of having frames taking several seconds.

Perhaps we should leave AudioFrame as it was implemented in V1, and rename the current expanded version to something along the lines of "AudioRecording" (it could be named something different, maybe not directly related to recording from the microphone). It would then make more sense for it to contain multiple "tracks".

Use cases

Copying multiple chunks of data into a single AudioFrame

There is no slice assignment on AudioFrame, bytearray, or memoryview (in this port), and AudioFrame.copyfrom() always copies data into the AudioFrame starting at the beginning. So we have to go byte by byte:

Before

af = audio.AudioFrame(size=sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    for byte in chunk:
        af[i] = byte
        i += 1

After, new AudioFrame

This allows us to copy full chunks in one operation, instead of byte by byte.

af = audio.AudioFrame(size=sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    small_af = af[i:]
    small_af.copyfrom(chunk)
    i += len(chunk)

After, slice assignment

Slice assignment might not be that obvious to novice programmers, but could be an even more succinct option.

af = audio.AudioFrame(size=sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    af[i:i+len(chunk)] = chunk
    i += len(chunk)
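
For comparison, standard desktop Python already supports slice assignment on bytearray, so the semantics proposed here can be tried out there. A minimal sketch (plain bytearray, no micro:bit API):

chunks = [b"abc", b"defg", b"hi"]
buf = bytearray(sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    buf[i:i + len(chunk)] = chunk   # copies the whole chunk in one operation
    i += len(chunk)
print(bytes(buf))                   # b'abcdefghi'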

Break down AudioFrame into smaller chunks

The best method for this currently is to use a memoryview (we could also create a bytes object from the AudioFrame and slice it, but a memoryview saves copying the data):

Before

af = audio.AudioFrame(duration=1000)
m = memoryview(af)
for i in range(0, len(m), PACKET_SIZE):
    radio.send_bytes(m[i:i+PACKET_SIZE])

After

With this approach we could use slices directly on the AudioFrame without creating unnecessary copies:

af = audio.AudioFrame(duration=1000)
for i in range(0, len(af), PACKET_SIZE):
    radio.send_bytes(af[i:i+PACKET_SIZE])
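
For completeness, the receiving side could reassemble the packets into a plain bytearray with the existing radio API. This is only a sketch: it assumes the total size is known in advance, that packets arrive in order and none are lost, and it copies byte by byte since this port has no slice assignment:

buf = bytearray(5000)                # known total size of the incoming audio
offset = 0
while offset < len(buf):
    packet = radio.receive_bytes()   # existing API; returns None if nothing received
    if packet:
        for b in packet:
            buf[offset] = b
            offset += 1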

Playing an AudioFrame from an arbitrary position

As a memoryview cannot be played directly, and an AudioFrame is always played from the beginning, we need to create a new AudioFrame that starts at the point from which we'd like to play back.

Before

original_af = microphone.record(1000)
memoryview_af = memoryview(original_af)
shorter_af = audio.AudioFrame(duration=500)
shorter_af.copyfrom(memoryview_af[500:])
audio.play(shorter_af)

After

original_af = microphone.record(1000)
audio.play(original_af[500:])

Playing just a portion of the AudioFrame

This works fine in the current implementation. The only caveat is that the most common way of doing this would be to measure time with sleep() (rather than time.ticks_ms()), and the CODAL uBit.sleep() has a resolution of 4 ms plus any extra overhead from calling functions, so it might not be extremely accurate.

Before

af = microphone.record(2000)
audio.play(af, wait=False)
sleep(1000)
audio.stop()

After

This should accurately play for the specified time:

af = microphone.record(2000)
audio.play(af[:len(af) // 2])
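
The same idea works for an arbitrary duration: at the default 7812 Hz rate used elsewhere in this issue there is one byte per sample, so milliseconds convert directly to a byte offset. A sketch, assuming the sliceable AudioFrame from this proposal:

RATE = 7812                          # default sampling rate, one byte per sample
af = microphone.record(2000)         # 2 seconds of audio
ms = 500
audio.play(af[:RATE * ms // 1000])   # play only the first 500 ms
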
microbit-carlos commented 4 months ago

Conclusions from the call last week to discuss this proposal:

Updated proposal

Open questions

@jaustin @dpgeorge thoughts and comments are very welcome, especially on the questions above.

dpgeorge commented 3 months ago

Construction

With the above new proposal, the ways to construct something to record into are:

AudioRecording(duration, rate=7812)
AudioTrack(bytearray(size))
AudioTrack(AudioRecording(duration, rate=7812), rate=7812)

That seems a bit awkward: all these ways of constructing look different and it's not obvious which one to use when.

Would it be simpler instead to have a function that creates a bytearray with convenience arguments to specify duration? Eg:

audio.new_recording(*, size, duration, rate=7812) -> bytearray

That's still not great, because the rate is lost when it returns the bytearray, so you'd need to specify the rate again when creating the AudioTrack.

Then maybe the function can return an AudioTrack, eg:

def new_recording(*, size=None, duration=None, rate=7812):
    if duration:
        size = duration * rate // 1000
    return AudioTrack(bytearray(size), rate=rate)

That way there's only one main way to create a new buffer, via this new_recording() helper function.
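
For example, usage could then be as simple as this (a sketch of the proposed API, assuming a microphone.record_into() along the lines discussed in this issue):

track = new_recording(duration=2000)   # 2 s buffer at the default 7812 Hz
microphone.record_into(track)          # the track carries its own rate
audio.play(track)                      # played back at that same rate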

Slicing and Indexing

It makes sense that indexing uses bytes as the units for the index value. But that means bytes become the default set of units. For example the constructor should then default to take bytes as a positional argument, eg new_recording(size, *, ...). And then slicing should also be in units of bytes.

Then something like AudioRecording.track(start, end) may start to get confusing if start and end are measured in milliseconds.

As for slice assignment (eg track[:10] = bytearray(10)): yes this is possible to implement and I think we should implement it. It allows a convenient way to copy data into a buffer.
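
For illustration, a short sketch of the proposed slice assignment (AudioTrack and its slicing behaviour are the proposed API, not the current one):

track = AudioTrack(bytearray(1000))
chunk = b"\x80" * 32                 # 32 bytes of mid-range samples
track[100:100 + len(chunk)] = chunk  # copy the chunk into place, no byte-by-byte loop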

dpgeorge commented 3 months ago
  • Is a "writeable buffer-like" object something that MicroPython can easily identify internally?

Yes, that's easy to do.

  • Now that the new types are not expanding AudioFrame, would it be better to use the default 11K sampling rate that CODAL and MakeCode have been using?

Maybe... the issue is that audio.play() defaults to 7812Hz because the original AudioFrame doesn't have a rate associated with it.

microbit-carlos commented 3 months ago

Maybe... the issue is that audio.play() defaults to 7812Hz because the original AudioFrame doesn't have a rate associated with it.

But different channels in the pipeline have independent sampling rates, no? Or is everything set up to 7812Hz?

Edit: Ah, but this would likely use the same channel as AudioFrames, okay.

microbit-carlos commented 3 months ago

Okay, so we ultimately have three approaches to consider.

1) AudioRecording can have its own buffer, or, if sliced like a memoryview, it can contain a pointer to the buffer from the original source
2) AudioRecording contains its own buffer and AudioTrack points to an external buffer
3) An AudioTrack points to an external buffer (like a bytearray) and a "factory function" can be used to initialise it

I'll have a chat with the edu team next week to decide between these approaches.

AudioRecording to hold buffer or pointer

The AudioRecording constructor would have arguments using both time and byte units. As slices would have to be in bytes, it makes sense for the first positional argument to be in that unit as well.

AudioRecording(size, *, duration, rate=7812)

e.g.

AudioRecording(10_000, rate=5_000)        # 10K bytes to hold 2 seconds of sound
AudioRecording(duration=3_000)            # 3 seconds
AudioRecording(size=10000, duration=3000) # Error: incompatible arguments provided

Slicing is in bytes, e.g. my_audio_recording[1000:]

A function to slice in time units would need to be provided:

my_audio_recording.track(start_ms, end_ms)
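
Internally that method would presumably just convert milliseconds to byte offsets using the recording's rate (one byte per sample). A rough sketch of the conversion, with ms_to_bytes() being a hypothetical helper for illustration only:

def ms_to_bytes(ms, rate=7812):
    # One byte per sample at `rate` Hz, so milliseconds map to byte offsets.
    return ms * rate // 1000

# my_audio_recording.track(start_ms, end_ms) would then be roughly
# my_audio_recording[ms_to_bytes(start_ms):ms_to_bytes(end_ms)]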

record_into() can return a "shorter" AudioRecording pointing to the same buffer:

original_buffer = AudioRecording(duration=2_000)
exact_recording = microphone.record_into(original_buffer, wait=False)
sleep(1000)
microphone.stop_recording()
audio.play(exact_recording)   # Plays 1 second of recorded audio
audio.play(original_buffer)   # Plays 1 second of recorded audio followed by 1 blank second

Advantages:

Disadvantages:

AudioRecording & AudioTrack

The AudioRecording class contains the buffer internally, and an AudioTrack can be created from it and then sliced. An AudioTrack can also be created from other types of buffer.

AudioRecording(duration, rate=7812)
AudioTrack(buffer, rate=7812)
my_audio_recording = AudioRecording(1000)                 # Contains 1 second worth of data
my_audio_track = AudioTrack(my_audio_recording)[250:750]  # AudioTrack points to the buffer in my_audio_recording
my_track = my_audio_recording.track(start_ms=100, end_ms=200) # A 100ms track

To work in bytes instead of time, an AudioTrack can be created from a bytearray.

my_track = AudioTrack(bytearray(10_000))
for i in range(0, len(my_track), PACKET_SIZE):
    radio.send_bytes(my_track[i:i+PACKET_SIZE])

When using record_into it would save the data into an AudioRecording and return an AudioTrack with the exact length of the recording.

my_recording = AudioRecording(duration=2_000)
my_track = microphone.record_into(my_recording, wait=False)
sleep(1000)
microphone.stop_recording()
audio.play(my_track)        # Plays 1 second of recorded audio
audio.play(my_recording)    # Plays 1 second of recorded audio followed by 1 blank second

Advantages:

Disadvantages:

AudioRecording + factory function

As having multiple ways to initialise an AudioTrack and its buffer can be confusing, we could have a single class that behaves like an AudioTrack (it could be called AudioRecording, but for clarity in this section it is still called AudioTrack) and provide a factory function to create its buffer:

my_track = audio.new_track(duration=3_000)

which would have a very simple implementation:

def new_track(*, size=None, duration=None, rate=7812):
    if size and duration:
        raise Exception("Incompatible arguments")
    if duration:
        size = duration * rate // 1000
    return AudioTrack(bytearray(size), rate=rate)

So, in this case microphone.record_into() takes and returns an AudioTrack. And microphone.record() would return an AudioTrack from a buffer created by the microphone function.
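
Put together, a hedged sketch of what that could look like for a user (all of this is the proposed API, not the current one):

my_track = audio.new_track(duration=3_000)   # factory creates the buffer and keeps the rate
microphone.record_into(my_track)             # records into the externally created buffer
audio.play(my_track)                         # plays back at the rate stored in the track
clip = microphone.record(2_000)              # record() would also return an AudioTrack
audio.play(clip[:len(clip) // 2])            # slices reference the same buffer, no copies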

Advantages:

Disadvantages:

microbit-carlos commented 2 months ago

After discussing it with the edu team I think we should go with the AudioRecording + AudioTrack approach. They are all good enough options, and while this one does mean there are multiple ways to initialise a class that holds audio data, the cleaner API is worth it.

I'll update the docs PR, but @dpgeorge feel free to start the implementation when you have a chance.

dpgeorge commented 2 months ago

OK, I've now implemented the new AudioTrack / AudioRecording API. I've tested it but I can imagine there are some things that still need a bit of work.

microbit-carlos commented 4 weeks ago

This can be closed as completed in https://github.com/microbit-foundation/micropython-microbit-v2/pull/163