lua NRT processing function

robbielyman commented 1 year ago

This springs from #1623. The idea is to add a new softcut function process_buffer that requests BufDiskWorker to process a section of a buffer with a lua-defined processing function, and executes a callback when finished.

The challenging part appears to be correctly passing a pointer to w_handle_softcut_process across OSC to BufDiskWorker. Currently it appears as though w_handle_softcut_process is never called, but no error is raised when I check the logs, and the callback never fires.

robbielyman commented 1 year ago

If you happen to have any time to look over this, @catfact, I'd be much obliged.

robbielyman commented 1 year ago

well, it's working, in a manner of speaking. It looks like 48000 samples is too many to pass over OSC; 4800 works fine. I attempted splitting into blocks of 1024, but attempting to process the entire buffer causes norns to crash, I think because too many OSC messages are spawned too quickly? I'm not sure of the best way to slow down the rate of processing...
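
For reference, the block splitting mentioned above can be sketched in plain Lua. The helper name `split_into_blocks` is hypothetical and not part of this PR; it only computes the (start, length) pairs and sends no OSC, so it says nothing about how to pace the messages.

```lua
-- Hypothetical helper (not from the PR): split a run of samples into
-- fixed-size blocks, with a shorter final block if needed.
local function split_into_blocks(total_samples, block_size)
  local blocks = {}
  local start = 0 -- zero-based sample offset of the current block
  while start < total_samples do
    local len = math.min(block_size, total_samples - start)
    blocks[#blocks + 1] = { start = start, len = len }
    start = start + len
  end
  return blocks
end

-- One second at 48 kHz in blocks of 1024 samples:
-- 47 blocks, the last one starting at 47104 and holding 896 samples.
local blocks = split_into_blocks(48000, 1024)
print(#blocks, blocks[#blocks].start, blocks[#blocks].len)
```

At this block size a single second of audio already needs 47 messages, which gives a rough sense of how quickly the message count grows for the whole buffer.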

catfact commented 1 year ago

yeah passing samples over OSC is actually not quite what i had in mind and will likely bog down the system.

but, what i had in mind is not very easy with the current architecture. i see two options:

1. a shared memory section for temporary buffer-processing space. crone copies to this space, matron processes it using the supplied lua function, crone then copies it back to the softcut buffer.
2. (more efficient but harder): somehow pass the function definition to crone. which i guess means a separate lua interpreter run on the BufDiskWorker thread.

robbielyman commented 1 year ago

Probably passing the function definition would lead to a lot of confusion and minimize the fun you can have with this—since the function can reference any Lua global, for example, you'd have to essentially replicate the entire state of norns on the crone process or profusely warn the user that things that "should" work will not.

The first option sounds doable. I'll poke around and see how to go about accomplishing it, but any pointers (uh, as it were lol) would be very welcome.

robbielyman commented 1 year ago

Huh. The current refactor compiles just fine on Arch, but I get a linker error on norns:

/usr/bin/ld: crone/src/BufDiskWorker.cpp.1.o: undefined reference to symbol 'shm_unlink@@GLIBC_2.4'
/usr/bin/ld: /lib/arm-linux-gnueabihf/librt.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

weird that it's shm_unlink and not, say, shm_open or mmap that causes the issue...

robbielyman commented 1 year ago

It's working! Or at least, it can handle on the order of 10 seconds of the buffer. I get a stack overflow if I call process with -1 as the duration argument. it's not clear to me whether that's because the number of samples exceeds the size of a Lua number or some other reason.

robbielyman commented 1 year ago

Aha, I had forgotten a lua_remove. It's now working!!

tehn commented 1 year ago

based on the limits of OSC, what are the practical limitations of this implementation? you mention 10 seconds, above. and how does this impact concurrent processes? (ie, is 10 sec only viable with nothing else running? does it stall any other threads?)

if we were to include this i'd want its capabilities well known.

robbielyman commented 1 year ago

Here's a clearer description of what's happening:

- on startup, crone and matron open some shared memory.
- In Lua, the user calls softcut.process_buffer. What this does is signal over OSC to crone (BufDiskWorker) that it should copy the relevant section of the buffer into this shared memory. This part is basically the same as softcut.buffer_copy_mono: happens in the background, shouldn't impact any other threads.
- When this is done, crone signals over OSC to matron to begin processing the shared memory. This is done in the main Lua thread, and appears to currently block user interaction while processing is active. This is definitely less than ideal, especially considering that the user-defined processing function need not be lightweight at all, and processing the entire buffer takes several seconds to complete. I'd love input on how to make this concurrent or at least flexible enough to yield back to the rest of the Lua thread when it needs to.
- just to be clear, the 10 seconds limitation above is no longer relevant. the current implementation is capable of processing the entire buffer.
- After the buffer is processed, matron signals to crone over OSC to copy from the shared memory back into the buffer.
- My favorite part is that after this copying is done, crone tells matron over OSC. I wrote this "done callback" to be extensible on purpose: if there's interest, I'd like to add it to the other non-realtime buffer functions as well, to help a script author know a little more about the state of the softcut buffer.

So that's what's going on in a nutshell.

In total, the OSC burden is very light; a total of four calls, and no buffer data passed.
On the other hand, the shared memory is not nothing; it's reused between calls, but can be up to sizeof(float) * 2^24, which is a not-insignificant amount of storage.
The current weaver-side implementation blocking the Lua process is also not great.
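
To put a number on that upper bound, here is the arithmetic as a small Lua snippet. It assumes 4-byte floats and a 48 kHz sample rate; both are assumptions about the target rather than anything measured here.

```lua
local BYTES_PER_FLOAT = 4   -- assumption: sizeof(float) == 4
local MAX_SAMPLES = 2 ^ 24  -- upper bound quoted above
local SAMPLE_RATE = 48000   -- assumption: 48 kHz

local bytes = BYTES_PER_FLOAT * MAX_SAMPLES
local mib = bytes / (1024 * 1024)
local seconds = MAX_SAMPLES / SAMPLE_RATE

-- prints: 67108864 bytes = 64 MiB, about 349.5 seconds of audio
print(string.format("%d bytes = %d MiB, about %.1f seconds of audio",
  bytes, mib, seconds))
```

So under those assumptions the worst case is 64 MiB of shared memory, covering a little under six minutes of mono audio.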

catfact commented 1 year ago

> I'd love input on how to make this concurrent or at least flexible enough to yield back to the rest of the Lua thread when it needs to.

well, it would be possible to wrap the NRT processing in a coroutine (and probably you'd want to chunk it up into blocks.) thing is, norns core has no real existing model for cooperative multitasking (except for the very specific system around clocks which is driven from C.)

but if you are up for it, NRT processing could expose some kind of API to scripts that want to perform UI updates. i guess i would look at the clock module for something similar.

so: might want to refactor so that SC.process_buffer is called repeatedly with a fixed/maximum block size. when it's done with a block it would signal back to the C layer to pull another block from shmem and raise another event. in between events the main thread could update other things. a coroutine would help by allowing the user-defined processing function to maintain state between blocks.
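
A rough plain-Lua sketch of that shape, not tested on norns. Every name here (`make_block_processor`, `accumulate`, the tiny `BLOCK_SIZE`) is hypothetical, and a Lua table stands in for the shared memory; the C-side signalling is left out entirely.

```lua
-- Hypothetical names throughout; this is not the PR's implementation.
local BLOCK_SIZE = 4

-- Wraps a per-sample function in a coroutine that processes one block
-- per resume and yields in between, so the caller can do other work.
-- The loop position lives inside the coroutine between resumes.
local function make_block_processor(buf, process, block_size)
  return coroutine.wrap(function()
    local n = #buf
    local start = 1
    while start <= n do
      local stop = math.min(start + block_size - 1, n)
      for i = start, stop do
        -- sample indices passed to process() are zero-based
        buf[i] = process(i - 1, buf[i])
      end
      start = stop + 1
      if start <= n then coroutine.yield(false) end -- more blocks remain
    end
    return true -- finished
  end)
end

-- Processing function that carries state across blocks: a running sum
-- held in an upvalue.
local sum = 0
local function accumulate(_, sample)
  sum = sum + sample
  return sum
end

local buf = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
local step = make_block_processor(buf, accumulate, BLOCK_SIZE)

local resumes = 0
repeat
  resumes = resumes + 1
  local done = step()
  -- a UI update or other event handling could run here between blocks
until done

-- 10 samples in blocks of 4 take 3 resumes; the last sample holds the total.
print(resumes, buf[#buf])
```

The running sum comes out the same as it would in a single pass, which is the property that matters if the processing function is something stateful like a filter.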

that's my 2c. sorry i don't have more bandwidth to implement or double check things.

robbielyman commented 1 year ago

> so: might want to refactor so that SC.process_buffer is called repeatedly with a fixed/maximum block size. when it's done with a block it would signal back to the C layer to pull another block from shmem and raise another event. in between events the main thread could update other things. a coroutine would help by allowing the user-defined processing function to maintain state between blocks.

Thinking about this, I'm wondering how deep the block thing should go. That is, should each block require an OSC handshake with BufDiskWorker, which would allow us to define a hard cap on the size of the shared memory (at the cost of more OSC messaging)? Or should it merely be a way for Weaver to pace itself, which would keep OSC clear if that's a priority?

> that's my 2c. sorry i don't have more bandwidth to implement or double check things.

No worries! I've been enjoying hacking away at this, and your insight has been valuable :)

robbielyman commented 1 year ago

I'm pretty happy with where things are now! I ended up going with the latter of the two options above. Thus if you run softcut.process_buffer with dur = -1, the whole buffer is loaded into shared memory.

The usage for softcut.process_buffer has changed: it now returns a function that is designed to be fed into clock.run. Here's an example script.

-- lua-nrt-test.lua

SAMPLE_RATE = 48000
FREQ = 440

local function start_playing()
  for i = 1, 2 do
    softcut.enable(i, 1)
    softcut.level(i, 1)
    softcut.play(i, 1)
  end
end

local function process(sample_index, _)
  if sample_index == 0 then print("hi from process!")
  local phase = (sample_index * FREQ / SAMPLE_RATE) % 1
  return math.sin(2 * math.pi * phase)
end

function init()
  softcut.process_func(process)
  softcut.event_done(function(ch, job_type)
    if job_type ~= "process" then print(job_type) return end
    if ch == 1 then
      FREQ = 550
      print("processing buffer 2")
      clock.cancel(Process_Clock)
      Process_Clock = clock.run(softcut.process_buffer(2, 0, -1))
    elseif ch == 2 then
      print("let's gooooo")
      start_playing()
    end
  end
  local redraw_clock = clock.run(function() while true do redraw() clock.sleep(1/15) end end)
  print("processing buffer 1")
  Process_Clock = clock.run(softcut.process_buffer(1, 0, -1))
end

local frame = 0

function redraw()
  frame = (frame + 1) % 15
  screen.clear()
  screen.move(5 * frame, 30)
  screen.level(15)
  screen.text("yo")
  screen.update()
end

In maiden, you should see the output

# script init
processing buffer 1
hi from process!
processing buffer 2
hi from process!
let's gooooo

while the animation on the screen and any navigation to the menu are not interrupted.

catfact commented 1 year ago

sorry i have been AWOL on this. in honesty i don't think i'll be able to really test this on HW.

i've attempted a more or less thorough review and though some small tidyings and refactorings are possible, it all makes sense. i'd recommend to @tehn and @dndrks that this large and strange feature be merged but tested in beta for a while.

dndrks commented 1 year ago

@ryleelyman , bump from the deaaaad! just doing the rounds for release doc updates and this is the last (very exciting!) checkbox :)

the code snippet above was missing some closed parens and an 'end', so i made some assumptions here:

```lua
-- lua-nrt-test.lua
-- @ryleelyman is this right?

SAMPLE_RATE = 48000
FREQ = 440

local function start_playing()
  for i = 1, 2 do
    softcut.enable(i, 1)
    softcut.level(i, 1)
    softcut.play(i, 1)
  end
end

local function process(sample_index, _)
  if sample_index == 0 then print("hi from process!") end
  local phase = (sample_index * FREQ / SAMPLE_RATE) % 1
  return math.sin(2 * math.pi * phase)
end

function init()
  softcut.process_func(process)
  softcut.event_done(
    function(ch, job_type)
      if job_type ~= "process" then print(job_type) return end
      if ch == 1 then
        FREQ = 550
        print("processing buffer 2")
        clock.cancel(Process_Clock)
        Process_Clock = clock.run(softcut.process_buffer(2, 0, -1))
      elseif ch == 2 then
        print("let's gooooo")
        start_playing()
      end
    end
  )
  local redraw_clock = clock.run(function() while true do redraw() clock.sleep(1/15) end end)
  print("processing buffer 1")
  Process_Clock = clock.run(softcut.process_buffer(1, 0, -1))
end

local frame = 0

function redraw()
  frame = (frame + 1) % 15
  screen.clear()
  screen.move(5 * frame, 30)
  screen.level(15)
  screen.text("yo")
  screen.update()
end
```

this version loads, but i'm only getting processing buffer 1 to print and the lil' traveling yo on the screen -- i know it's been a minute, but whenever you have some time, could you take a peek at the test code to confirm whether things are runnin' on your end? tyty!!

robbielyman commented 1 year ago

Hmmm. The following works for me; there seem to be no functional differences between the two.

-- Lua NRT test
-- @alanza

SAMPLE_RATE = 48000
FREQ = 440

local function start_playing()
  for i = 1, 2 do
    softcut.enable(i, 1)
    softcut.level(i, 1)
    softcut.play(i, 1)
  end
  softcut.voice_sync(1, 2, 0)
end

local function process(sample_index, _)
  if sample_index == 0 then print("hi from process!") end
  if sample_index % 1024 == 0 then print(sample_index) end
  local phase = (sample_index * FREQ / SAMPLE_RATE) % 1
  return math.sin(2 * math.pi * phase)
end

function init()
  softcut.process_func(process)
  softcut.event_done(function(ch, job_type)
    if job_type ~= "process" then print(job_type) return end
    if ch == 1 then
      FREQ = 450
      print("processing buffer 2")
      clock.cancel(Process_Clock)
      Process_Clock = clock.run(softcut.process_buffer(2, 0, 1))
    elseif ch == 2 then
      print("let's gooooo")
      start_playing()
    else
      print("weird... got ch: " .. ch .. " and job type: " .. job_type)
    end
  end)
  local redraw_clock = clock.run(function() while true do redraw() clock.sleep(1/15) end end)
  print("processing buffer 1")
  Process_Clock = clock.run(softcut.process_buffer(1, 0, 1))
end

local frame = 0

function redraw()
  frame = (frame + 1) % 15
  screen.clear()
  screen.move(5 * frame, 30)
  screen.level(15)
  screen.text("yo")
  screen.update()
end

I will say that it takes much longer for the buffer to be processed than it does for it to be played back, so for the docs, maybe it makes the most sense to include something like what I have above, which provides at least some Maiden feedback while it's working, and only processes a small chunk of the buffer.
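
Since the processing function is plain Lua, it can also be sanity-checked off-device before committing to a long run. A minimal sketch, assuming the same two-argument signature as above and that the return value is written back as the new sample; the `out` table is only a stand-in for the softcut buffer.

```lua
local SAMPLE_RATE = 48000
local FREQ = 440

-- Same sine generator as in the test script, minus the prints.
local function process(sample_index, _)
  local phase = (sample_index * FREQ / SAMPLE_RATE) % 1
  return math.sin(2 * math.pi * phase)
end

-- Render one second into a plain table standing in for the buffer,
-- tracking the largest absolute value along the way.
local out = {}
local peak = 0
for i = 0, SAMPLE_RATE - 1 do
  out[i + 1] = process(i, 0)
  peak = math.max(peak, math.abs(out[i + 1]))
end

print(#out, out[1], peak <= 1)
```

This only checks the shape and range of the output, not timing; how long the same function takes inside softcut.process_buffer still has to be measured on the device.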

monome / norns

lua NRT processing function #1634