Collaboration for asynchronous file access?

h4yn0nnym0u5e commented 1 year ago

Hi there

Don't Panic ... this is not really an issue as such!

positionhigh aka Mark PMed me on the PJRC forum with regard to my buffered SD playback code (https://github.com/h4yn0nnym0u5e/Audio/tree/feature/buffered-SD). He's keen to get a library which combines my multi-file playback with your variable-rate capability, for the MicroDexed project he's involved with. I gather you've worked with them to some extent in the past.

I must confess to having looked at your code previously and being utterly daunted! As an old-school C programmer (tends to be the language of choice for the day job in embedded medical products), I'm not well-versed in The Way of C++... but I had another look, and can just barely follow the outlines of what's going on.

I think my effort has been reasonably successful (though I've not had a huge amount of feedback or bug reports), so I'm fairly confident my approach of firing an Event inside the audio update, and re-loading buffers by servicing the event during "foreground" code (loop() exit, delay() or yield()) is OK.

If this sounds as if it might be of interest, do let me know and we can discuss the best approach. No offence if not, I'm sure you've got plenty of projects on hand!

best regards

Jonathan

newdigate commented 1 year ago

Hi Jonathan

The core of the resampling is really simple. Most of the complexity is in the looping, reading, and quadratic interpolation. Consider these two cases (with no anti-aliasing/interpolation):

Half rate playback: Every incoming sample is duplicated exactly once.
Double rate playback: Every second incoming sample is emitted.

The trick is to find the algorithm which, given an arbitrary rate R1, will produce these two cases.... You can use an accumulator. Each sample cycle, the accumulator (A1) is incremented by R1. When the accumulator reaches or passes a whole number, that whole number is added to the incoming sample position, P1. The accumulator is then reduced by the same whole number, so it always stays below 1. The sample returned for each cycle will be the incoming sample at P1.

so consider a sequence S1 = (0,1,2,3,4):

if the playback rate is 0.5:
starting at P1 = 0, R1 = 0.5, A1 = 0.0;
first cycle returns S1[P1] = S1[0] = 0; increments accumulator by R1: P1 = 0, R1 = 0.5, A1 = 0.5;
second cycle returns S1[P1] = S1[0] = 0; increments accumulator by R1: P1 = 0, R1 = 0.5, A1 = 1.0;
because A1 >= 1.0, P1 += 1.0; A1 -= 1.0;
third cycle returns S1[P1] = S1[1] = 1;
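The accumulator scheme described above can be sketched in a few lines of host-side C++. This is only an illustration under stated assumptions: it handles positive rates and forward playback only, does no interpolation, and the function name `resampleNearest` is made up for this example rather than taken from the library.

```cpp
#include <cstddef>
#include <vector>

// Nearest-sample resampler driven by a fractional accumulator.
// Names follow the thread: rate = R1, accumulator = A1, position = P1.
// Assumption: rate > 0 (forward playback only); other rates return nothing.
std::vector<int> resampleNearest(const std::vector<int>& source, double rate) {
    std::vector<int> out;
    if (rate <= 0.0) return out;

    double accumulator = 0.0;   // A1: kept in [0, 1) between cycles
    std::size_t position = 0;   // P1: index into the incoming samples

    while (position < source.size()) {
        out.push_back(source[position]);                    // emit the sample at P1
        accumulator += rate;                                // A1 += R1
        auto whole = static_cast<std::size_t>(accumulator); // cast truncates the fraction
        accumulator -= static_cast<double>(whole);          // keep only the fraction
        position += whole;                                  // advance P1 by the whole part
    }
    return out;
}
```

With S1 = (0,1,2,3,4), a rate of 0.5 emits every sample twice and a rate of 2.0 emits every second sample, which are the two cases listed above.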

newdigate commented 1 year ago

this is the crux of the resampling (casting _remainder from float to int is equivalent to truncating the fraction part): _remainder = accumulator A1, _playbackRate = R1, _bufferPosition = P1. Effectively it does the process above...

https://github.com/newdigate/teensy-variable-playback/blob/master/src/ResamplingReader.h#L343

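_remainder += _playbackRate;                       // A1 += R1
auto delta = static_cast<signed int>(_remainder);  // whole part; the cast truncates the fraction
_remainder -= static_cast<double>(delta);          // keep only the fractional part
_bufferPosition += (delta * _numChannels);         // advance P1 by delta frames of _numChannels samples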

h4yn0nnym0u5e commented 1 year ago

I'm not so concerned about the resampling, I'm just assuming you've got that right!

The main thing for making the source data available to the resampler in a timely manner is shifting from on-demand file reads to pre-loading data before it's going to be required, in chunks large enough to be efficient. With multiple tracks and variable filesystem latency, at 1x playback I've found pre-loading about 8k samples gives enough leeway for a reasonably large track count. Obviously that can take a fair bit of RAM, but PSRAM is very usable for the pre-load buffers. The evidence from others' testing seems to be that around 4kbytes is a fairly sweet spot for SD read sizes: my 4x bigger pre-loads are primarily to allow for extra latency when multiple files are in use. For faster playback rates even bigger buffers might be needed, but they're under programmer control so it's not a decision that the library writer needs to make.
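The "4x bigger" figure above can be checked with a little arithmetic. The sketch below assumes 16-bit mono samples (2 bytes each), which the thread does not state explicitly; the function names and the eight-track figure in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 16-bit mono samples, i.e. 2 bytes per sample.
constexpr std::size_t kBytesPerSample = sizeof(std::int16_t);

// Bytes needed for one track's pre-load buffer.
constexpr std::size_t preloadBytes(std::size_t samples) {
    return samples * kBytesPerSample;
}

// Total buffer memory for several tracks with identical buffers.
constexpr std::size_t totalBytes(std::size_t samples, std::size_t tracks) {
    return preloadBytes(samples) * tracks;
}

// 8k samples is 16 KiB, i.e. four 4 KiB SD reads.
static_assert(preloadBytes(8192) == 4 * 4096, "8k-sample pre-load is 4x a 4 KiB read");
```

Under these assumptions, eight simultaneous tracks would need `totalBytes(8192, 8)` = 128 KiB of buffer, which is why PSRAM becomes attractive.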

As far as I can tell, you've implemented file random access by emulating a big array and indexing into it, which causes filesystem reads if necessary. The trick would therefore be to change just that element, so that array accesses (within the update() function) can always be satisfied from existing buffered data, but will also trigger a buffer load as soon as there's enough space available to make it worthwhile. At this point I get a bit stuck reading your code and figuring out how to nail my async buffer refill into it...
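One way to picture that change is a buffered window over the file, where indexed reads never touch the filesystem and only raise a refill request. The following is a host-side sketch, not code from either library: `BufferedSource` and its members are hypothetical names, the "file" is an in-memory vector, playback is forward only, capacity is assumed non-zero, and a real version shared with an interrupt would need `volatile` or atomic state.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class BufferedSource {
public:
    BufferedSource(std::vector<std::int16_t> file, std::size_t capacity, std::size_t chunk)
        : file_(std::move(file)), ring_(capacity), chunk_(chunk) {}

    // "update()" side: served from the ring only; silence on underrun.
    std::int16_t sample(std::size_t pos) {
        if (pos < start_ || pos >= end_) { ++underruns_; return 0; }
        return ring_[pos % ring_.size()];
    }

    // "update()" side: samples before pos are finished with. Requests a
    // refill once at least one chunk of space is free and data remains.
    void release(std::size_t pos) {
        start_ = std::min(std::max(start_, pos), end_);
        if (freeSpace() >= chunk_ && end_ < file_.size()) refillRequested_ = true;
    }

    // Foreground side: the only place the "file" is read.
    void service() {
        if (!refillRequested_) return;
        refillRequested_ = false;
        std::size_t n = std::min(freeSpace(), file_.size() - end_);
        for (std::size_t i = 0; i < n; ++i, ++end_) {
            ring_[end_ % ring_.size()] = file_[end_];
        }
    }

    bool refillRequested() const { return refillRequested_; }
    std::size_t underruns() const { return underruns_; }

private:
    std::size_t freeSpace() const { return ring_.size() - (end_ - start_); }

    std::vector<std::int16_t> file_;   // stand-in for the SD file
    std::vector<std::int16_t> ring_;   // pre-load buffer used as a ring
    std::size_t chunk_;                // minimum worthwhile read size
    std::size_t start_ = 0;            // buffered window is [start_, end_)
    std::size_t end_ = 0;
    std::size_t underruns_ = 0;
    bool refillRequested_ = true;      // first service() call primes the buffer
};
```

The design choice here is that an underrun returns silence rather than blocking on a read, so the audio update stays bounded in time even when the foreground code falls behind.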

Do you think this is a sufficiently worthwhile improvement that you'd be happy to give a bit of guidance along the way so it fits with your aims/needs, and pull it in to your repo once complete (assuming it's functional)? I'm quite interested to see if I can make it work, but not enough to spend ages generating and maintaining a fork which might have a pretty limited user base.

newdigate commented 1 year ago

I'd be happy to put a reference to your repo in the readme file, but I think it'd probably be better to separate this async loading strategy into a repo/branch of its own, as I don't think it suits all use-cases. It's certainly worthwhile trying this technique if it improves performance/stability.

I recently introduced the indexable file class to abstract/unify the interpolation code with lots of random access to nearby samples (Much easier and more portable to use indexers [ ] - so the interpolation code file access and array access would be using the same code - sacrificing efficiency for less repeated code). I think one could effectively do the same by storing variables for the last x number of samples. Earlier versions have different code per class to do interpolation. It might be interesting to see how I did it before the indexable file strategy...

There is a good argument for not reading from the SD card during the audio interrupt process() call, and I think it's a great idea to try to get this loading to happen before the interrupt is called, but I would treat it as experimental. During the audio callback interrupt (i.e. the process() method), if the buffer has less than a threshold of data remaining, it would set a flag, and then during the next loop() call a new buffer would be recycled/allocated and read from the SD card.

Unfortunately I don't have much appetite/incentive to write teensy code at the moment, mostly just focusing on paying bills lately. But I'm more than willing to discuss solution, ideas, etc... so please feel free to ask any questions...

newdigate commented 1 year ago

the issue as i’m understanding it is that at the moment the read cycle occurs during the process method inside the interrupt. i am trying to get my head around how to get the buffer populated in the loop cycle instead. the minor complication is that the buffer will either be forward or backward. if we assume the playback rate is constant during the processing of an audio buffer it simplifies it a bit. when the threshold remaining buffer level is reached, it would signal the loop method to load the next or previous buffer depending on whether the rate is positive or negative.
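The forward/backward choice could look something like the sketch below. It is a guess at the shape of the logic, not code from the library: `ChunkRequest` and `nextChunk` are hypothetical names, positions are in samples, and the rate is assumed constant for the duration of one audio buffer as suggested above.

```cpp
#include <algorithm>
#include <cstddef>

struct ChunkRequest {
    std::size_t start;   // first sample index to read from the file
    std::size_t length;  // number of samples to read (0 = nothing left)
};

// windowStart/windowEnd: half-open range [start, end) of samples already buffered.
ChunkRequest nextChunk(std::size_t windowStart, std::size_t windowEnd,
                       std::size_t chunk, std::size_t fileLength, double rate) {
    if (rate >= 0.0) {
        // Forward: carry on reading from the end of the buffered window.
        std::size_t remaining = fileLength - std::min(windowEnd, fileLength);
        return {windowEnd, std::min(chunk, remaining)};
    }
    // Reverse: seek to just before the buffered window and read up to it.
    std::size_t length = std::min(chunk, windowStart);
    return {windowStart - length, length};
}
```

For example, with samples 1000..1999 buffered and a 512-sample chunk, a positive rate asks for samples starting at 2000, while a negative rate asks for the 512 samples starting at 488.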

i think this is doable. but would need tweaking. also the issue is that sd cards are very different. the performance of reading only ever decreases with write cycles. but certainly there are sweet spots as you say.

If this would solve the issue of reading and writing to the same file on sd card I would be interested. but sadly i don’t think this solves that issue.

newdigate commented 1 year ago

just thinking aloud here: my slight hesitance is that there is no guarantee the loop() method will reach the code to load the next buffer before the audio interrupt processes the next buffer, depending on the load inside loop(). that could create latency and/or a need for prioritised task scheduling in your loop(). it may not happen if the sweet spot is sweet enough, but this might change drastically between sd cards.

there is a good discussion on sd card performance variation here: https://github.com/greiman/SdFat/issues/324

h4yn0nnym0u5e commented 1 year ago

OK, thanks for that. I always develop on branches anyway, never understood why people don't use them as a matter of routine...

I'll consider tinkering with this, and come back to you if I get stuck or have something that I think may be worthy of consideration. Possibly not soon, plenty of other stuff to do.

Good luck with paying the bills!

The way I've made this work is to use Paul's EventResponder class; it can be set up to trigger an event inside the update(), and the responder is configured to execute the file read code within yield(). As yield() is called whenever loop() exits, or delay() is called, or explicitly, it "just" requires discipline on the programmer's part to ensure this happens "often enough". Effectively it builds polling into the fabric of the application; one could do it with an explicit call, like AudioPlayResmp::fillBuffers(), but EventResponder does that for you.
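The deferral pattern itself is small enough to sketch on a desktop compiler. Note that this is a hypothetical stand-in written for illustration, not the EventResponder API: `DeferredEvent`, `trigger()` and `poll()` are invented names, with `poll()` playing the role that yield() plays on a Teensy.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for the pattern; NOT the Teensy EventResponder class.
class DeferredEvent {
public:
    explicit DeferredEvent(std::function<void()> handler)
        : handler_(std::move(handler)) {}

    // Called from the audio update: cheap, only records that work is pending.
    void trigger() { pending_ = true; }

    // Called from foreground code. Runs the handler at most once per
    // pending trigger and reports whether it ran.
    bool poll() {
        if (!pending_) return false;
        pending_ = false;
        handler_();
        return true;
    }

private:
    std::function<void()> handler_;  // e.g. the file-read code
    bool pending_ = false;           // real interrupt code would need volatile/atomic
};
```

Several triggers before the next poll collapse into a single handler call, which suits a refill that simply tops the buffer up to whatever level is currently possible.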

... trying to keep up here! ...

Yes, it has to be often enough, which is why my buffers are 8ksamples long, so loop() can take 64 updates before stuff fails.
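The figure of 64 follows from the buffer length divided by the audio block size. A small sketch of that arithmetic, assuming the Teensy Audio library's default of 128 samples per update and a nominal 44.1 kHz sample rate (both assumptions, as neither number appears in this thread); the function names are illustrative only.

```cpp
#include <cstddef>

// Assumption: 128 samples are consumed per audio update at 1x playback.
constexpr std::size_t kBlockSamples = 128;

// Whole updates a full buffer can feed before running dry. Assumes rate > 0.
constexpr std::size_t updatesOfHeadroom(std::size_t bufferSamples, double rate) {
    return static_cast<std::size_t>(
        static_cast<double>(bufferSamples) / (static_cast<double>(kBlockSamples) * rate));
}

// The same headroom expressed as time. Assumes a nominal 44.1 kHz sample rate.
constexpr double headroomMillis(std::size_t bufferSamples, double rate,
                                double sampleRateHz = 44100.0) {
    return 1000.0 * static_cast<double>(bufferSamples) / (sampleRateHz * rate);
}

static_assert(updatesOfHeadroom(8192, 1.0) == 64, "8k samples / 128 per update");
```

Under these assumptions a full 8k-sample buffer gives loop() roughly 186 ms at 1x, and half that at 2x, which is why faster playback rates call for bigger buffers.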

h4yn0nnym0u5e commented 1 year ago

As it stands my code only ever reads files forwards, but it's a trivial change to do a seek() if playing in reverse.

The programmer will have to allocate enough buffer for the maximum expected playback rate - as noted above, PSRAM is useful for huge buffers / multiple tracks.

newdigate commented 1 year ago

:) If the teensy were paying the bills I'd be working 60 hours a week :) Unfortunately the things that pay the bills are exceedingly boring :(

newdigate commented 1 year ago

if you’re using linux or mac and not scared to use the command-line interface, i have some tools that make developing/debugging/testing much easier. let me know if you’re interested. (i developed, debugged and tested all this code on x86 with libsoundio audio integration and folders as mock sd cards, using visual studio code, and built my own teensy build process using cmake to switch between building x86 and arm-eabi-none binaries)

h4yn0nnym0u5e commented 1 year ago

I'm using Win10 x64 and very basic tools - Arduino 1.x IDE and NotePad++. I really should spin up a Linux box or at least a VM, but I seem to be managing so far

newdigate commented 1 year ago

i wanted to port the build system to windows recently, because the people who need it use windows. but my latest version of windows was 7. turns out to be useless. no visual studio code for windows 7 or node js. the hardware was more than necessary. unfortunately a loss for everyone. one day it’ll happen. in the meantime i make the build system into a package manager for teensy.

h4yn0nnym0u5e commented 1 year ago

Windows 10 free upgrade still works, apparently ... I probably shouldn't post a link, but Google Is Your Friend (sort of ... I expect everyone's got a friend who rootles through their drawers and makes off with stuff they'll never miss). Disclaimer: I bought a copy when I built my PC

newdigate commented 1 year ago

Ah - didn't know about that!... And google was happy to give all the details! Brilliant. Thanks 👍

MattKuebrich commented 1 year ago

I'd LOVE to see a combination of both your libraries. I'm currently using teensy-variable-playback for a project, but would also like the ability of the buffered library to play / record multiple files simultaneously. The teensy-variable-playback library also has looping modes, along with the ability to set the loop start/finish, which I think the buffered library lacks.

I would be more than happy to test out any progress on this, in whatever shape it takes.

positionhigh commented 1 year ago

> The teensy-variable-playback library also has looping modes, along with the ability to set the loop start/finish, which I think the buffered library lacks.

I think the buffered library doesn't need to deal with this, except that it would need to know the play direction of the sample. However, I see that more as a bonus function; the hard part is how to feed the sample data from the buffered library into variable-playback, reading from the SD card, so it does not crash due to running out of data. Unfortunately I am unable to help with this complicated topic, except for testing, since this is far above my coding skill.

h4yn0nnym0u5e commented 1 year ago

Please note I've started work, and posted on the relevant thread on the Teensy forum - see https://forum.pjrc.com/threads/67613-changing-pitch-of-audio-samples-TeensyVariablePlayback-library?p=323961#post323961 I'd prefer discussion on there simply because it'll reach a wider audience and result in more people having a crack at testing, making suggestions etc.

I did ask if Nic would prefer a completely separate thread for that discussion, but no response - guess the day job is getting priority!

newdigate / teensy-variable-playback

Collaboration for asynchronous file access? #51