Closed catfact closed 6 years ago
Maybe we can harness ancient powers of the Secret Rabbit Code (aka libsamplerate) - http://www.mega-nerd.com/SRC/
I would advise backing up from softcut & solving a more manageable problem - in-memory varispeed 'tape loop sim' using block processing & libsamplerate. Behaviour is either write or read. With that building block in place we can modify the read/write/erase head object for variable erase ratio (aka sound-on-sound). The erase percentage would be another input buffer.
My current thinking is that two of these read/write/erase heads can be composed to build a soft-cut engine. In order to compose the two heads, you must compute two erase-ratio blocks and two write-signal blocks in soft-cut object, then mix the two read-signal blocks to get your output. Will that strategy work or is this a leaky abstraction?
no, i think it's a good point - a single head with interpolated write/erase is a useful abstraction in itself (so, add interpolation to the `BufWrPre` ugen - maybe using SRC indeed)
`SoftCutHeadLogic` is more or less as you describe, but isn't really structured in the cleanest way w/r/t phase update step - i think for it to work, the write head should maintain its own phase.
the one hole i see in the abstraction is one of the things that prompted me to make this a monolithic UGen in the first place - with crossfading and SOS, to avoid clicks the erase/write logic actually has to know about the xfade position, and the way to properly use it depends on the intended application. here's what i said about that in the other issue (dunno how clear it is!)
tricky edge cases, relating to behavior of write head during crossfade:
similarly, it may or may not be valuable to scale the overdub level ("pre") by the inverse of the fade.
i don't think there is a general solution for these issues in every use case. so i've simply added recFade and preFade parameters, which control how much the record level is scaled by the fade, and prelevel scaled by inverse of fade. if you want to scrub around while recording without glitches, they should (i think) both be turned up. if you want to continuously record/overdub and play a seamless loop, recFade should be all the way down, and preFade is sort of optional (effect.)
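A minimal sketch of how two parameters like recFade and preFade could scale the write and erase levels. The blending formulas and the function name `head_levels` are assumptions made for illustration, not the actual SoftCut code; the only thing taken from the description above is that record level follows the fade and pre level follows its inverse.

```python
def head_levels(rec, pre, fade, rec_fade, pre_fade):
    """Return (effective record level, effective pre level).

    fade:     crossfade position of this head, 0.0 (faded out) .. 1.0 (faded in).
    rec_fade: 0.0 .. 1.0, how strongly the fade scales the record level.
    pre_fade: 0.0 .. 1.0, how strongly the inverse fade lifts the pre level.
    The blending formulas below are assumptions for illustration only.
    """
    # rec_fade = 0 leaves the record level alone; rec_fade = 1 scales it by fade.
    rec_eff = rec * (1.0 - rec_fade * (1.0 - fade))
    # pre_fade = 1 pushes the pre level toward 1.0 (no erasure) as the head fades out.
    pre_eff = pre + (1.0 - pre) * pre_fade * (1.0 - fade)
    return rec_eff, pre_eff
```

With both parameters at 1.0, a fully faded-out head records nothing and erases nothing; with both at 0.0, the fade has no effect on either level.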
ohho, something possibly relevant: https://github.com/electrickery/pd-ipoke-.git
(this is what raja ripped for karma~ - which itself is a complete unportable monstrosity)
ed - well this doesn't actually solve much - as far as i can tell it does linear interpolation when >1 sample indices are covered in a single write update. which is pretty simple
yeah, thinking about SRC
the "full API" is the one that gives you resampling on streams (and has an option for smoothly interpolating the SR ratio) http://www.mega-nerd.com/SRC/api_full.html
problem is: this solves the problem generally in i think the only really possible way: pushing N input samples results in an arbitrary number of output samples, depending on the current SR ratio and the last state of the streams / resampler.
this is what i mean by i think the write head requiring its own phasor / ringbuffer. it's clear enough how to make a varispeed write-head that should minimize artifacts when the material is played back at a different rate.
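To make the "N samples in, arbitrary number out" point concrete, here is a toy streaming linear resampler. It is a stand-in written for illustration and does not call libsamplerate; the class name `StreamResampler` and its one-sample startup delay are assumptions of the sketch.

```python
class StreamResampler:
    """Toy streaming linear resampler; a stand-in for illustration, not libsamplerate."""

    def __init__(self):
        self.prev = 0.0  # last input sample seen in the previous block
        self.pos = 0.0   # fractional read position, measured from self.prev

    def process(self, block, ratio):
        """Consume one input block; return however many outputs it yields.

        ratio is output rate / input rate and must be > 0.
        """
        step = 1.0 / ratio
        src = [self.prev] + list(block)  # src[0] carries state across blocks
        out = []
        while self.pos < len(block):
            i = int(self.pos)
            frac = self.pos - i
            out.append(src[i] + (src[i + 1] - src[i]) * frac)
            self.pos += step
        self.pos -= len(block)           # keep the leftover fraction for the next call
        if block:
            self.prev = block[-1]
        return out
```

With ratio 0.5 and 3-sample blocks, successive calls yield 2 outputs and then 1, so the caller cannot assume a fixed output count per block. That is why the write side needs somewhere to put a variable number of samples.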
it's a little less clear how to marry this to @tehn's request for a varispeed read/write head. maybe it will work fine:
src_reset()
maybe enforce at least a 1-sample delay between read and write positions (right now there is an arbitrary offset parameter)
Think there's a way round the 'tricky edge case' - namely square-root xfade envelope for both read & write.
So first off, I believe the looping feature needs to be always implemented via X-fading. With aleph lines, that looping feature works just like a regular circular buffer - the instantaneous jump of looping point leads to discontinuity on the 'tape'.
If you implement looping using X-fades with square-root xfade envelope for both write & read, then the product of these two envelopes will be a linear xfade & the output will be identical to circular buffer (see diagram). But now if you move the loop point there are no first order discontinuities written to tape - problem solved!
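A quick numeric check of the square-root envelope argument, as a sketch. It assumes the material passes through one write envelope and one read envelope at the same fade position `t`, which is a simplification of the diagram; the function names are made up for the example.

```python
import math

def sqrt_fade_in(t):
    """Square-root fade-in envelope, t in 0.0 .. 1.0."""
    return math.sqrt(t)

def sqrt_fade_out(t):
    """Square-root fade-out envelope, t in 0.0 .. 1.0."""
    return math.sqrt(1.0 - t)

def round_trip_gains(t):
    """Gain of each branch after write envelope times read envelope."""
    g_in = sqrt_fade_in(t) * sqrt_fade_in(t)     # equals t (linear fade-in)
    g_out = sqrt_fade_out(t) * sqrt_fade_out(t)  # equals 1 - t (linear fade-out)
    return g_in, g_out
```

The two round-trip gains are linear in `t` and sum to one, which is the sense in which the output matches a plain circular buffer.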
another aspect of softcut/lines where I had a sudden revelation about the design, namely:
How to handle a crossfade 'request' received during crossfade.
the state machine described above should have the following properties:
I claim this is optimal!
i agree!
the diagram is exactly how SoftCut works (with a raised-cosine xfade curve) - it alternates between two targets, the "next" one starts fading in exactly at the loop point, the "current" one starts fading out exactly at the loop point, and you can't initiate a "cut" to a new position in the middle of a fade. (we could make a "polyphonic" version that would allow it! but that would be nuts.)
it's totally fine, the loop length remains accurate even when very short, and the perceived loudness is constant throughout the xfade.
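The alternating two-target behaviour can be sketched as a small state machine. This is an illustration under assumptions, not the SoftCut source: the class name `TwoVoiceFader`, the integer positions, and the choice to simply reject a cut requested mid-fade are all hypothetical.

```python
import math

class TwoVoiceFader:
    """Two alternating voices with a raised-cosine crossfade (illustrative sketch)."""

    def __init__(self, fade_samples):
        self.fade_samples = fade_samples
        self.active = 0            # voice that is audible or currently fading in
        self.pos = [0, 0]          # hypothetical per-voice buffer positions
        self.count = fade_samples  # fade progress; equal to fade_samples when idle

    def cut(self, new_pos):
        """Start fading to new_pos; ignored (returns False) while a fade is running."""
        if self.count < self.fade_samples:
            return False
        self.active = 1 - self.active
        self.pos[self.active] = new_pos
        self.count = 0
        return True

    def tick(self):
        """Advance the fade by one sample."""
        if self.count < self.fade_samples:
            self.count += 1

    def gains(self):
        """Return [gain of voice 0, gain of voice 1]; the two always sum to one."""
        t = self.count / self.fade_samples
        g_in = 0.5 * (1.0 - math.cos(math.pi * t))  # raised-cosine fade-in
        g = [0.0, 0.0]
        g[self.active] = g_in
        g[1 - self.active] = 1.0 - g_in
        return g
```

Queuing a rejected request instead of dropping it would be a small change to `cut` and `tick`.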
the "edge cases" are solely about how/whether to apply the crossfade to the write level and even more so to the erase level. i'm sorry i'm having a hard time articulating all the scenarios where this matters. but it can get weird when you move the loop points, and what used to be the "tail" becomes the "middle."
in general, i agree - a sensible default is to apply xfade to both write level and read level.
the problem is with the erase head. you have to do some kind of crossfade of the erase level as well, to avoid glitches when you move the loop points and start hearing partially-erased material from the old tail. this interacts in kind of weird ways with the record-fade.
default that works ok is to apply the inverse of the xfade to the pre-level (so that when completely faded-in, the pre-level is wherever you set it, and there is no erasure when completely faded out.) but in a continuous loop, this erase-fade applied on top of the record-fade makes a dip in perceived amplitude around the loop point (i think) - in that case, i think you want no xfade of the write level and fully inverted xfade of the erase level... i think...
but try it out! the current parameters allow different combinations of crossfade application, and they have different effects on the character of the crossfade, especially when recording and partially erasing in a continuous loop, or when changing loop points. (i was surprised.) the crossfades can also be arbitrarily long in this implementation, so you can really hear it.
happily, the xfade stuff seems pretty orthogonal to the varispeed issues. i think the xfade is pretty much solved, and i think you're right that the varispeed issue should be tackled independently.
to that end, i made a new ugen called `VariHead` - it is dumb and doesn't crossfade at all. it is only a write head with an internal phasor (and it can output the "logical phase" so could be used as a building block in more complex synthdef.) initially i won't implement any partial-erase mechanism and just try to get the resampling aspect clean.
ok, `VariHead` is now a finished proof of concept, it seems to work as expected... i've learned some things about libsamplerate:
it has four interpolation modes: linear and three approximations of sinc. the sinc modes are kinda problematic for realtime because they seem to take some time to "prime", and they have latency; that is, they don't use every input sample immediately, so AFAICT we would have to store unused input samples in a ringbuffer between blocks. (technically we should be doing this anyway, but in linear interp i never see any unused input frames.)
SRC interpolates the SR ratio internally if you change it in the data structure passed to calls to `src_process`. the interpolation is annoyingly slow. but you can also set the ratio in the SRC state machine directly - this is discouraged in the API but it doesn't seem to have any ill effects in linear interp mode.
so, it's kind of a bummer but i think linear interp is pretty much the only feasible way to use SRC in a realtime context. it means artifacts from very low input/output SR ratios are noticeable, but nothing terrible to my ears.
there's other libraries, like zita-resampler, which claims to be faster and well-suited for realtime work. but the described algorithm is some kind of polyphase FIR resampler, which seems like it would also have pretty rough latency problems.
what i've got working so far is block-processing. it will get complicated to incorporate this with sample-accurate loop crossfades and triggers. so should try going back to the sort of ridiculous method of pushing a single input frame on each call to `src_process` - comparing performance - now that the basic POC seems legit.
update: for some reason calling `src_process` every sample introduces clicks.
this is so annoying. i feel like the linear interpolation is simple enough to just implement it directly...
btw @ranch-verdin i finally grokked your second comment about queuing up multiple crossfade targets (i think!) and that seems like a great idea.
bummer about SRC - sorry to lead you down a dead-end! Maybe I could quickly jump in & contribute by writing a small cubic resampling library with similar API...
I think the block abstraction to resampling lib remains useful, even if you're writing in frame-by-frame paradigm. Because obviously if rec/play head runs faster than 1x you still need to handle a small block per-frame...
I handle this messily in varilines (i.e no explicit block-resampling abstraction): https://github.com/monome/aleph/blob/dev/dsp/buffer16.c#L185-L220
update; i've refactored `SoftCutHead` with a `Resampler` class for write, currently just performs linear resampling but doesn't require SRC or any other external lib
this is in `softcut-resample` branch. unfortunately it totally crashes on ARM for reasons unknown.
folded this into issue #407 as a task. so closing this one. (great discussion though!)
the primary purpose of `SoftCutHead` is to facilitate smooth and accurate crossfading, and for this purpose it is fine as is (IMO).
however, its varispeed qualities present some difficult problems given that read and write positions are locked to the same phasor. interpolated reads are fine, but interpolated writes are more challenging.
latest commit (a710b33efd5b552a0294bd479f275c1641deb349) demonstrates my current thought about the best way to approach the problem - it's "physically" direct, but has some unintuitive consequences.
describing the problem
firstly, a SoftCutHead is really two phasors which perform crossfading; each is attached to a "record head", "erase head", and "read head", each with arbitrary levels. logically, for the purposes of this issue, it can be treated as a single phasor.
the "rate" of a SoftCutHead is a floating-point increment of the address into the buffer, to be applied on each physical tick of the sample clock. "rate" and "phase increment" here are interchangeable terms.
current approach and behaviors
every time the phasor ticks over to a new integer value, the record/erase heads apply the current input value to one or more samples in the buffer.
if rate is >1, it is possible for a single physical sample tick to jump over more than one target sample index. in this case the intervening sample indices are written to with linearly-interpolated values starting from the previously written value.
there is no other attempt to interpolate values. so, for example, setting rate = 5/4 will cause very noticeable jitter noise. what should happen might be something like this:
-- assume the previous phase was 0 and the new phase is 5/4
-- let x0 be the previous input sample value and x1 be the new input sample value.
-- the buffer at index 1 should be written to with the value `(x0 + ((x1 - x0) * 4/5))`.
even this simple interpolation involves some complicated bookkeeping and it's not clear how beneficial it is.
when the rate is an integer > 1, there is no audible effect until the rate is changed - then, material recorded at the higher rate will be played back at the new, lower rate with relatively little distortion (linear interpolation.)
with the rate set to < 1, the pitch is unchanged but there is an effective lowering of the sampling rate. if the rate is not an even division of 1, there is also jitter.
material recorded at a low rate sounds fine when played back at a high rate - the reconstruction is perfect if the old rate was an even division of 1.
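The rate = 5/4 example above generalizes to a short routine: on each tick, every integer buffer index the phase passes gets a value interpolated between the previous and current input samples. This is a sketch of that idea only; `write_interpolated` is a hypothetical name, it assumes rate > 0, and it leaves the phase unwrapped for simplicity.

```python
import math

def write_interpolated(buf, phase, rate, x0, x1):
    """Advance the write phase by `rate` and fill every integer index crossed.

    x0 is the input sample at the old phase, x1 the input sample at the new phase.
    Each crossed index gets the linear interpolation between them.
    Returns the new (unwrapped) phase. Assumes rate > 0.
    """
    new_phase = phase + rate
    first = math.floor(phase) + 1  # first integer index strictly after the old phase
    last = math.floor(new_phase)   # last integer index at or before the new phase
    for idx in range(first, last + 1):
        frac = (idx - phase) / rate  # how far along this tick the index lies
        buf[idx % len(buf)] = x0 + (x1 - x0) * frac
    return new_phase
```

Starting at phase 0 with rate 5/4, index 1 lies 4/5 of the way through the tick, which reproduces the value given in the list above. At rates below 1, some ticks cross no index and write nothing.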
possible other approaches
i think the only real solution is to have a separate ringbuffer for the write head, that is constantly being resampled to the physical buffer. this could be expensive!
of course a write/erase head driven by an independent phasor is always an option (interpolated or not.) this can lead to clicks when the phasors cross, but even that should be addressable in a SynthDef (e.g. by incorporating a fadeout based on distance between read/write phasors, when that distance is small.)
again, changing the rate without recording new material is no problem - it uses cubic interpolation on the read head and sounds fine. app/scripts can be designed to just never perform recording at non-integer rates.
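One way the distance-based fadeout could look, as a sketch. The linear ramp, the `fade_len` parameter, and the function name are assumptions; a real SynthDef would likely use its own curve and smoothing.

```python
def proximity_gain(read_phase, write_phase, buf_len, fade_len):
    """Gain that drops to zero as the read and write phasors approach each other.

    Distance is measured around the circular buffer; the gain ramps linearly
    from 0.0 (phasors coincide) to 1.0 (at least fade_len samples apart).
    """
    d = abs(read_phase - write_phase) % buf_len
    d = min(d, buf_len - d)  # shortest way around the loop
    return min(1.0, d / fade_len)
```

For example, with a 100-sample buffer and `fade_len` of 8, phasors at 2 and 98 are 4 samples apart around the loop, so the gain is 0.5.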
synthdefs/engines could possibly incorporate rate-dependent filters to mitigate aliasing effects.
other ideas? please share them here!