Closed by microbit-carlos 4 weeks ago
Conclusions from the call last week to discuss this proposal:
- `AudioRecording` holds the data buffer, with `bytearray` as the main buffer type, and `AudioTrack` to contain the rate and act as a memoryview of the buffer.
- `AudioFrame`/`AudioRecorder` data should come from `record_into()`, `copyfrom()` or indexing, so that we don't play blank or old data.
- `record_into()` can return an `AudioTrack` with only the recorded data.
- For `AudioRecording`, we currently don't have a way to specify a buffer size with time, e.g. `foo = AudioRecording(duration=3000)`. An `AudioRecording(duration, rate=7812)` constructor would match the current `AudioFrame` implementation; internally it allocates its own buffer.
- A `size` argument in bytes had been considered for `AudioFrame`; with this proposal that is not needed.
- `AudioTrack` points to a buffer-like object, like `AudioRecording`, and behaves like a memoryview with its own rate: `AudioTrack(buffer, rate=7812)`. It can also wrap a plain `bytearray`, e.g. `AudioTrack(bytearray(128))`.
- `AudioTrack` and `AudioRecording` elements can be read and set by index, e.g. `my_audio_recording[0] = my_audio_track[-1]`.
- `AudioTrack` can be sliced, returning a new `AudioTrack` pointing to the same buffer:

  ```python
  full_track = AudioTrack(bytearray(2000))
  first_half = full_track[:len(full_track) // 2]
  second_half = full_track[len(full_track) // 2:]
  ```
- `AudioRecording` cannot be sliced, as it's not obvious that slicing it would return an `AudioTrack`. To slice one, wrap it first: `sliced_track = AudioTrack(my_recording)[100:]`.
- As `AudioRecording` deals with units of time, it should have a method to slice it in the same units and create an `AudioTrack`:

  ```python
  my_recording = AudioRecording(duration=3000)
  my_track = my_recording.track(start=1000, end=2000)
  ```
- `microphone.record()` returns an `AudioRecording`, as their arguments match: `my_recording = microphone.record(duration=1000, rate=11000)`.
- `microphone.record_into()` takes any writeable buffer-like object (`AudioRecording`, `AudioTrack` or `bytearray`), records the data into its buffer, and returns an `AudioTrack` covering the length of the recording.
  - For example, `record_into(audio_track_5_seconds, wait=False); sleep(1000); stop_recording()` records into the buffer from `audio_track_5_seconds` and returns an `AudioTrack` pointing to the same buffer, but only 1 second long.

Open questions:

- Should there be slice assignment on `AudioRecording`, and maybe `AudioTrack`? Or would it be better to have a `copyfrom()` method like `AudioFrame`?
- With a `copyfrom(buffer)` method not taking a "start byte" argument, the main way to write with an offset would be to create an `AudioTrack` first, e.g. `AudioTrack(my_audio_recording)[128:].copyfrom(radio_packet)`.
- Should `AudioTrack` reject a non-writeable buffer type, like `bytes`?
- Now that the new types are not expanding `AudioFrame`, would it be better to use the default 11K sampling rate that CODAL and MakeCode have been using?

@jaustin @dpgeorge thoughts and comments very welcome, especially on the questions above.
With the above new proposal, the ways to construct something to record into are:
- `AudioRecording(duration, rate=7812)`
- `AudioTrack(bytearray(size))`
- `AudioTrack(AudioRecording(duration, rate=7812), rate=7812)`
That seems a bit awkward: all these ways of constructing look different and it's not obvious which one to use when.
Would it be simpler instead to have a function that creates a `bytearray`, with convenience arguments to specify duration? Eg:

```python
audio.new_recording(*, size, duration, rate=7812) -> bytearray
```
That's still not great, because the rate is lost when it returns the `bytearray`, so you'd need to specify the rate again when creating the `AudioTrack`.
Then maybe the function can return an `AudioTrack`, eg:

```python
def new_recording(*, size=None, duration=None, rate=7812):
    if duration:
        size = duration * rate // 1000
    return AudioTrack(bytearray(size), rate=rate)
```
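As a sanity check of the arithmetic in this kind of helper: duration in milliseconds times rate in Hz, floor-divided by 1000, gives the buffer size in bytes, assuming one byte per sample (the function name below is just for illustration):

```python
# Buffer size in bytes for a given duration (ms) and sample rate (Hz),
# assuming one byte per sample.
def buffer_size(duration_ms, rate=7812):
    return duration_ms * rate // 1000

three_seconds = buffer_size(3000)              # at the default 7812 Hz
two_seconds_5k = buffer_size(2000, rate=5000)  # at an explicit 5000 Hz
```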
That way there's only one main way to create a new buffer: via this `new_recording()` helper function.
It makes sense that indexing uses bytes as the units for the index value. But that means bytes become the default set of units. For example, the constructor should then default to take bytes as a positional argument, eg `new_recording(size, *, ...)`. And then slicing should also be in units of bytes.
Then something like `AudioRecording.track(start, end)` may start to get confusing if `start` and `end` are measured in milliseconds.
As for slice assignment (eg `track[:10] = bytearray(10)`): yes, this is possible to implement and I think we should implement it. It allows a convenient way to copy data into a buffer.
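In CPython, `bytearray` already supports this kind of slice assignment (some MicroPython builds omit it, as noted later in this thread), so the proposed convenience can be previewed off-device with stand-in names:

```python
track = bytearray(20)      # stand-in for a recording buffer
packet = bytes(range(10))  # stand-in for received data

track[:10] = packet        # copy a whole chunk in one operation
track[10:] = packet        # an offset copy is just a different slice
```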
> Is a "writeable buffer-like" object something that MicroPython can easily identify internally?

Yes, that's easy to do.
> Now that the new types are not expanding `AudioFrame`, would it be better to use the default 11K sampling rate that CODAL and MakeCode have been using?

Maybe... the issue is that `audio.play()` defaults to 7812Hz because the original `AudioFrame` doesn't have a rate associated with it.
> Maybe... the issue is that `audio.play()` defaults to 7812Hz because the original `AudioFrame` doesn't have a rate associated with it.
But different channels in the pipeline have independent sampling rates, no? Or is everything set up to 7812Hz?
Edit: Ah, but this would likely use the same channel as AudioFrames, okay.
Okay, so we ultimately have three approaches to consider:
1) `AudioRecording` can have its own buffer, or, if sliced like a memoryview, it can contain a pointer to the buffer from the original source
2) `AudioRecording` contains its own buffer and `AudioTrack` points to an external buffer
3) An `AudioTrack` points to an external buffer (like a `bytearray`) and a "factory function" can be used to initialise it
I'll have a chat with the edu team next week to decide between these approaches.
### `AudioRecording` to hold buffer or pointer

The `AudioRecording` constructor would have arguments using both time and byte units.
As slices would have to be in bytes, it makes sense for the first positional argument to be in that unit as well.
```python
AudioRecording(size, *, duration, rate=7812)
```

e.g.

```python
AudioRecording(10_000, rate=5_000)         # 10K bytes to hold 2 seconds of sound
AudioRecording(duration=3_000)             # 3 seconds
AudioRecording(size=10000, duration=3000)  # Error: incompatible arguments provided
```
Slicing is in bytes, e.g. `my_audio_recording[1000:]`.
A function to slice in time units would need to be provided: `my_audio_recording.track(start_ms, end_ms)`.
`record_into()` can return a "shorter" `AudioRecording` pointing to the same buffer:

```python
original_buffer = AudioRecording(duration=2_000)
exact_recording = microphone.record_into(original_buffer, wait=False)
sleep(1000)
microphone.stop_recording()
audio.play(exact_recording)  # Plays 1 second of recorded audio
audio.play(original_buffer)  # Plays 1 second of recorded audio followed by 1 blank second
```
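This "exact-length view over a longer buffer" behaviour can be modelled off-device in plain Python; the one-byte-per-sample assumption and all names below are illustrative:

```python
RATE = 7812                     # bytes per second, assuming one byte per sample
original = bytearray(2 * RATE)  # pre-allocated 2-second buffer

# Pretend the microphone captured only 1 second before stop_recording().
recorded = RATE
for i in range(recorded):
    original[i] = 0x80          # dummy sample data

# record_into() would return a view covering just the recorded prefix,
# while the original buffer keeps its full (partly blank) length.
exact = memoryview(original)[:recorded]
```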
Advantages:
Disadvantages:
- Slicing returns an `AudioRecording` which doesn't copy the buffer; it points to the buffer from the original `AudioRecording`. This could be confusing if modifying one `AudioRecording` affects the other.

### `AudioRecording` & `AudioTrack`

The `AudioRecording` class contains the buffer internally, and an `AudioTrack` can be created to slice it. Alternatively, an `AudioTrack` can be created from an `AudioRecording` or other types of buffer.
```python
AudioRecording(duration, rate=7812)
AudioTrack(buffer, rate=7812)
```
```python
my_audio_recording = AudioRecording(1000)  # Contains 1 second worth of data
my_audio_track = AudioTrack(my_audio_recording)[250:750]  # AudioTrack points to the buffer in my_audio_recording
my_track = my_audio_recording.track(start_ms=100, end_ms=200)  # A 100ms track
```
To work in bytes instead of time, an AudioTrack can be created from a bytearray.
```python
my_track = AudioTrack(bytearray(10_000))
for i in range(0, len(my_track), PACKET_SIZE):
    radio.send(my_track[i:i + PACKET_SIZE])
```
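The chunking loop can be exercised off-device, with a `bytearray` standing in for the `AudioTrack` and a list standing in for `radio.send()`; the `PACKET_SIZE` value below is illustrative:

```python
PACKET_SIZE = 32  # illustrative: micro:bit radio packets are small

# Stand-in track data: 100 bytes with a recognisable pattern.
my_track = bytearray(i % 256 for i in range(100))

sent = []  # stand-in for radio.send()
for i in range(0, len(my_track), PACKET_SIZE):
    sent.append(bytes(my_track[i:i + PACKET_SIZE]))

# A receiver collecting the packets in order can rebuild the audio.
reassembled = b"".join(sent)
```

Note the last packet is shorter than `PACKET_SIZE`; slicing past the end simply yields the remaining bytes.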
When using `record_into()` it would save the data into an `AudioRecording` and return an `AudioTrack` with the exact length of the recording.
```python
my_recording = AudioRecording(duration=2_000)
my_track = microphone.record_into(my_recording, wait=False)
sleep(1000)
microphone.stop_recording()
audio.play(my_track)      # Plays 1 second of recorded audio
audio.play(my_recording)  # Plays 1 second of recorded audio followed by 1 blank second
```
Advantages:
Disadvantages:
- Multiple ways to initialise something to record into:
  - `AudioRecording(duration, rate=7812)`
  - `AudioTrack(bytearray(size))`
  - `AudioTrack(AudioRecording(duration, rate=7812), rate=7812)`

### `AudioRecording` + factory function

As having multiple ways to initialise an `AudioTrack` and its buffer can be confusing, we could instead have a single class that behaves like an `AudioTrack` (it could be called `AudioRecording`, but for clarity in this section it's still called `AudioTrack`) and provide a factory function to create its buffer:
```python
my_track = audio.new_track(duration=3_000)
```
This factory would have a very simple implementation:
```python
def new_track(*, size=None, duration=None, rate=7812):
    if size and duration:
        raise Exception("Incompatible arguments")
    if duration:
        size = duration * rate // 1000
    return AudioTrack(bytearray(size), rate=rate)
```
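Off-device, a factory like this can be exercised with a minimal stand-in for `AudioTrack`; the class below is only a sketch for illustration, not the proposed implementation (it also raises `ValueError` rather than a bare `Exception`):

```python
class AudioTrack:
    # Minimal stand-in: wraps a buffer and remembers its rate.
    def __init__(self, buffer, rate=7812):
        self.buffer = buffer
        self.rate = rate

    def __len__(self):
        return len(self.buffer)

def new_track(*, size=None, duration=None, rate=7812):
    if size and duration:
        raise ValueError("Incompatible arguments")
    if duration:
        size = duration * rate // 1000
    return AudioTrack(bytearray(size), rate=rate)

track = new_track(duration=3_000)  # 3 s at 7812 Hz
```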
So, in this case `microphone.record_into()` takes and returns an `AudioTrack`, and `microphone.record()` would return an `AudioTrack` from a buffer created by the microphone function.
Advantages:
- `AudioTrack`'s only unit is bytes, so less confusion in that area

Disadvantages:
After discussing it with the edu team I think we should go with the `AudioRecording` + `AudioTrack` approach. They are all good enough options, and while with this option there might be multiple ways to initialise a class to hold audio data, the cleaner API is worth it.
I'll update the docs PR, but @dpgeorge feel free to start the implementation when you have a chance.
OK, I've now implemented the new `AudioTrack`/`AudioRecording` API. I've tested it, but I can imagine there are some things that still need a bit of work.
This can be closed as completed in https://github.com/microbit-foundation/micropython-microbit-v2/pull/163
A lot of the open issues discussing enhancements of the `AudioFrame` API could be resolved by using `memoryview`; however, a `memoryview` cannot be played or recorded into, because the relevant functions also need a rate attached to the buffer. So if `AudioFrame` behaved a bit more like `memoryview`, specifically when using slices, we could easily achieve a lot of the discussed functionality without additional unnecessary memory copies.

### Proposal: AudioFrames to be able to reference external buffers
- An `AudioFrame` created from the constructor, or from `microphone.record()`, would allocate its own buffer.
- An `AudioFrame` would also contain "start" and "end" markers (or a buffer pointer and a length) so that, like a `memoryview`, it can point to other buffers.
- These markers would make `AudioFrames` harder to understand, and it's also not clear how much they could be moved. E.g. as there isn't a way to retrieve the real start and end of the referenced buffer, the markers could only be used to reduce the `AudioFrame` and not increase it.
- `AudioFrame.copy()` does make a copy of the buffer.

Disadvantages:

- It might come as a surprise to a user that modifying a slice can change the original `AudioFrame`.

### Alternative

We could have a new class that is essentially a `memoryview`, which can also hold the rate of the original `AudioFrame`. This has the advantage of making it much more obvious that we are not dealing with a new `AudioFrame` with its own copy of the data. Because getting a different class instance from a slice is a bit weird, rather than use slices we could use a method call. For example:
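A hedged sketch of what such a class might look like, using `memoryview` plus a stored rate and a method call instead of slicing (all names here are illustrative, not a proposed API):

```python
class AudioView:
    # Hypothetical memoryview-like wrapper that keeps the source's rate.
    def __init__(self, frame_buffer, rate=7812):
        self._mv = memoryview(frame_buffer)
        self.rate = rate

    def view(self, start, end=None):
        # Method call instead of slicing: returns another AudioView
        # over the SAME underlying buffer, keeping the same rate.
        out = AudioView(bytearray(0), self.rate)
        out._mv = self._mv[start:end]
        return out

    def __len__(self):
        return len(self._mv)

buf = bytearray(100)
v = AudioView(buf, rate=7812)
half = v.view(50)   # view of the last 50 bytes, no copy
half._mv[0] = 42    # writes through to buf[50]
```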
### AudioFrame nomenclature
As we consider an "AudioTrack" being created from an AudioFrame, it's becoming more obvious that the AudioFrame name doesn't quite fit the current implementation. As a "frame" is generally small, deriving a "track" out of it doesn't make that much sense. The original intent of grouping multiple frames to create longer audio makes more sense than the current implementation of having frames taking several seconds.
Perhaps we should leave `AudioFrame` as it was implemented in V1, and rename the current expanded version to something along the lines of "AudioRecording" (it could be something different, maybe not directly related to recording from the microphone), for which it would make more sense that it could have multiple "tracks".
### Use cases

#### Copying multiple chunks of data into a single AudioFrame
There isn't slice assignment on `AudioFrame`, `bytearray`, nor `memoryview`, and `AudioFrame.copyfrom()` always copies data from the beginning of the `AudioFrame`. So we have to go byte by byte.

**Before**

**After, new AudioFrame**
This allows us to copy full chunks in one operation, instead of byte by byte.
**After, slice assignment**
Slice assignment might not be that obvious to novice programmers, but could be an even more succinct option.
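The before/after variants can be compared off-device, with plain bytearrays standing in for `AudioFrame` buffers (a sketch under that stand-in assumption; chunk contents are arbitrary):

```python
chunks = [bytes([i]) * 4 for i in range(3)]  # three 4-byte chunks to merge

# Before: byte-by-byte copy into the destination buffer.
dest_loop = bytearray(12)
pos = 0
for chunk in chunks:
    for b in chunk:
        dest_loop[pos] = b
        pos += 1

# After: one slice-assignment operation per chunk.
dest_slice = bytearray(12)
pos = 0
for chunk in chunks:
    dest_slice[pos:pos + len(chunk)] = chunk
    pos += len(chunk)
```

Both produce identical buffers; the slice-assignment version is shorter and moves each chunk in a single operation.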
#### Break down AudioFrame into smaller chunks
The best method for this currently is to use a `memoryview` (we could also create a `bytes` object from the `AudioFrame` and slice it, but `memoryview` saves copying the data).

**Before**

**After**
With this approach we could use slices directly on the `AudioFrame` without creating unnecessary copies.

#### Playing an AudioFrame from an arbitrary position
As a `memoryview` cannot be played directly, and an `AudioFrame` is always played from the beginning, we need to create a new `AudioFrame` that starts from the point we'd like to play back.

**Before**

**After**
#### Playing just a portion of the AudioFrame
This works fine in the current implementation; the only thing is that the most common way of doing this would be with `sleep()` (instead of `time.ticks_ms()`) to measure time, and the CODAL `uBit.sleep()` has a resolution of 4ms plus any extra overhead from calling functions. So it might not be extremely accurate.

**Before**

**After**

This should accurately play for the specified time.