zynthian / zynthian-issue-tracking

Centralized Issue Tracking for Zynthian Project
https://github.com/orgs/zynthian/projects/1

Unexpectedly high latency #858

Closed polluxsynth closed 4 months ago

polluxsynth commented 1 year ago

Describe the bug The latency from MIDI to when a sound is heard seems unexpectedly high.

To Reproduce EDIT: I realized after writing the bug report that I had SooperLooper in the Main effects chain. See the comments for more details.

  1. Connect a master keyboard to the Zynthian and another synthesizer (the Waldorf Blofeld having been used for these tests), using a MIDI through box.
  2. Set the Zynthian and Blofeld to the same receiving channel, same as the transmit channel of the master keyboard.
  3. Select a fairly percussive sound on both units.
  4. Play a note from the master keyboard to both units.
  5. Notice that there is noticeable flamming between the Zynthian and the other synth.
  6. Record from both instruments simultaneously using a DAW.
  7. Measure the latency between the instruments in a wave editor.

Expected behavior With the standard sample rate (44.1 kHz) and buffer size (-b 256 -n 2) settings on the Zynthian, the expected latency would be around 256 × 2 × 1/44100 s ≈ 11.6 ms, plus a couple of ms to allow for miscellaneous processing (likely mostly MIDI routing in the Zynthian).
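The estimate above is just the number of frames held in the buffers divided by the sample rate. A minimal sketch of that arithmetic, where the function name is made up for illustration and nothing beyond the buffering itself is modelled:

```python
def buffer_latency_ms(frames_per_period: int, periods: int, sample_rate: int) -> float:
    """Time needed to play out `periods` buffers of `frames_per_period` frames, in ms."""
    return 1000.0 * frames_per_period * periods / sample_rate


# Settings quoted in the report: 256 frames per period, 2 periods, 44.1 kHz.
expected_ms = buffer_latency_ms(256, 2, 44100)
print(f"{expected_ms:.1f} ms")  # prints "11.6 ms"
```

A single period with these settings comes out at half of that, about 5.8 ms, which is the figure used later in the thread.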

Observed behavior With ZynAddSubFX selected for the Zynthian channel, the observed latency is about 23 ms. For comparison, the latency between the Blofeld and another synthesizer (Virus KC) turned out to be 4 ms.

As another comparison, ZynAddSubFX was run in standalone mode on an x86 PC running Debian Linux, with the same latency settings as Zynthian. I didn't measure the latency, but it sounded much tighter than with the Zynthian. A setting of -b 512 -n 2 (i.e. twice the buffer size of the Zynthian) gives me the same “23 ms flam” feel as the Zynthian at -b 256 -n 2.

Setup:

Hardware
  Raspberry Pi 4 Model B Rev 1.4
  Audio: [HifiBerry DAC+ ADC PRO]
  Display: [ZynScreen 3.5 (v1)]
  Wiring: [MCP23017_ZynScreen]
  I2C: [MCP23017@0x20, MCP23008@0x21, ADS1115@0x48, MCP4728@0x61]

Software
  zyncoder: stable (89b1eb7)
  zynthian-ui: stable (503aa74)
  zynthian-sys: stable (d19e5b7)
  zynthian-data: stable (9180393)
  zynthian-webconf: stable (0c759b3)

System
  Raspbian GNU/Linux 10 (buster)
  Build Date: 2022-10-07
  Memory: 14% (548M/3839M)
  SD Card: 53% (15G/29G)
  Temperature: 46.2ºC
  Overclock: None

Snapshot used The Zynthian snapshot I'm using has 7 parts: 3 from Zyn, one Pianoteq, one OB-Xd, one NoizeMaker and one Dexed.

Additional context (From this thread: https://discourse.zynthian.org/t/latency-in-stable-2211/7841/6)

I also compared a couple of other synth engines besides Zyn, as well as comparing DIN MIDI with USB MIDI. For this purpose, I used a Blofeld Keyboard, as it has both DIN and USB MIDI.

First of all I should note that this time I was comparing the built-in synth engine in a Blofeld Keyboard against Zynthian, so the difference in latency was slightly larger compared to my previous test, being 26 ms for Zyn over DIN MIDI, and 25 ms over USB MIDI.

OB-Xd was slightly faster, at 24 ms over DIN MIDI and 23 over USB MIDI. Dexed was even faster, at 20 ms over DIN MIDI vs 18 over USB MIDI (the fact that the diff was 2 rather than 1 here I’m attributing to rounding errors).

So Zyn seems to be the worst of the lot, but the latency is still fairly consistently long across all the synth engines that I tried. Even with Dexed, one can hear clear flamming when playing.

polluxsynth commented 1 year ago

So I did some experiments using patchage (hot tip: unclick View -> Sprung Layout to avoid the objects jumping about in the patchage window).

These were typical results. Redoing it seems to cause a variation of up to about 4 ms, for instance, I've observed from 23 to 27 ms when nothing was changed in the connection graph. Similarly, the 11 ms I observed in the second case was 15 ms another time.

At any rate, it seems that both the midi router and the zynmixer add appreciable amounts of latency. I wonder if the mixer does it because it actually delays the audio by one buffer? I have no idea, just speculating here. It's more surprising that the midi router should add a lot of latency, I would have thought.

EDIT: I should note that the Zynthian snapshot I'm using has 7 parts: 3 from Zyn, one Pianoteq, one OB-Xd, one NoizeMaker and one Dexed.

polluxsynth commented 1 year ago

For comparison, I decided to run ZynAddSubFX inside Carla, on a Linux PC, and compare that to running ZynAddSubFX in standalone mode. For various reasons (because the Carla version I'm running is the result of a patched version), it's a different PC to the one mentioned above, but at any rate, comparing the latency of ZynAddSubFX against the Blofeld: Standalone: 12-14 ms. In Carla: 20 ms.

So it would seem that operating a synth within a plugin host adds appreciable latency. I wonder if this is because the extra mixing stage at the output from the plugin host to Jack adds another buffer of latency.

I'm not too well versed in exactly how the buffers are passed from application to application, but it would seem that if one application (read: ZynAddSubFX) gets one full 'quantum' (buffer time) to finish its buffer fill, then the following application does not get the buffer until one buffer time after the first application started filling it. Whereas if an application were allowed to pass on the buffer as soon as it's been filled, the buffer could conceivably pass through all applications during one buffer time. After all, a mixer does not consume that much CPU just to adjust the levels. But I'm speculating wildly here.

polluxsynth commented 1 year ago

Turns out I had SooperLooper in the Main effects chain. Removing it brought the latency down, as measured with ZynAddSubFX against the Blofeld, to a more respectable 13 to 20 ms. Also, without SooperLooper, removing/adding zynmidirouter does not seem to have the same noticeable effect on the latency as noted above.

I don't know why SooperLooper has this effect, but it strikes me that with SooperLooper in the graph, there is a loop from an output on zynmixer to SooperLooper and then from SooperLooper back to an input on zynmixer. While a straight audio chain from a source to a destination can always be handled in one buffer period (5.8 ms), if there is a loop somewhere, an extra buffer must be added in some place. So I'm wondering if the extra latency is JACK trying to handle this loop. Perhaps there is some way to control where the extra latency is added; it seems that it adds it on the output from zynmixer, but perhaps there is some way to move it to the output of SooperLooper instead, so that the extra latency does not happen in the main signal path. All this is speculation, just theorizing.
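To make the loop argument concrete, here is a toy model that finds the connection closing a loop in a routing graph, i.e. the one place where a buffer of delay would have to be inserted. This is only an illustration of the reasoning above, not how JACK actually schedules its graph; the node names are simplified and the function is hypothetical.

```python
def feedback_edges(graph, start):
    """Return edges that close a loop, found by depth-first search from `start`."""
    on_path = set()   # nodes on the current DFS path
    done = set()      # nodes whose successors have all been explored
    loops = []

    def visit(node):
        on_path.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_path:
                loops.append((node, nxt))   # edge points back into the current path
            elif nxt not in done:
                visit(nxt)
        on_path.discard(node)
        done.add(node)

    visit(start)
    return loops


# Straight chain: nothing needs to be delayed.
straight = {"synth": ["zynmixer"], "zynmixer": ["system"]}

# Effects loop: zynmixer feeds SooperLooper, which feeds zynmixer again.
looped = {
    "synth": ["zynmixer"],
    "zynmixer": ["sooperlooper", "system"],
    "sooperlooper": ["zynmixer"],
}

print(feedback_edges(straight, "synth"))  # []
print(feedback_edges(looped, "synth"))    # [('sooperlooper', 'zynmixer')]
```

In this toy model the delayed connection is whichever edge the traversal happens to reach last, which mirrors the open question of where the extra latency ends up in the real graph.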

As for why the latency varies (even without SooperLooper), I think the answer is that when MIDI data arrives, it will not be acted upon until next time the synth engine in question is called to fill its output buffer, which can take up to one buffer period, as the incoming MIDI data is in no way synchronized to the internal timing in the Zynthian.

riban-bw commented 1 year ago

Thanks for doing so much research. The main effects loop is there to allow us to have a final fader post-effects. This is a design choice so that we can always control the output level. Of course, as you observe this adds an extra step. In theory the whole jack graph is calculated in one period but maybe a feedback loop in the chain changes this behaviour.

There should not be varying latency / jitter (unless individual engines introduce it). There could be a flaw in the way zynmidirouter handles MIDI messages. It should dispatch them at the same offset within a period as it was received. (This is done in the sequencer.) I suspect it may be processing the MIDI more crudely and sending it at the beginning of the next period (like the sequencer used to do before I ironed out the latency). Again, the jack graph should be calculated in a single period but I am still to convince myself of when / how that happens.

Most (all?) Zynthian modules act as jack clients which are external to jack. There may be an advantage to making them internal modules which run within the same context as jack. This avoids context switching and may allow better graph resolution - though I am less confident of the latter.

polluxsynth commented 1 year ago

I wasn't really questioning the design choice of having an effects loop rather than a straight effects chain, just speculating that that was the cause of the additional latency. I can certainly understand the reason you mention; it makes good sense. I'm sort of wondering though if there is any way to control where the additional latency is added.

As for the jittery latency, my reasoning is this: Once every period, the callback set by jack_set_process_callback() is called to fill the output buffer. At this time, all the MIDI events that have occurred since the process callback was last called are acted upon. So the latency from the arrival of a MIDI event to when the note is actually started and the corresponding waveform generated will vary, depending on when the event arrives relative to the period start time.

I see though that it seems to be possible for the synth engine to actually find out when the MIDI event occurred, which means it could delay the generation of the start of the note for exactly one period; however, if this were the case, it would consistently add another period of latency, even though the timing relationship between MIDI events would be preserved perfectly.

Here I'm speculating though as I'm not well enough versed in the architecture of software synths.
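The two behaviours described above can be compared with a small model. It only counts frames from the arrival of a MIDI event to the frame at which the note would start in the synth's output buffer; output buffering is left out, and both function names are made up for illustration.

```python
PERIOD = 256  # frames per period, as in the jackd settings discussed here


def start_at_period_boundary(arrival_frame: int) -> int:
    """Note starts at the first frame of the next period (no offset kept)."""
    return (arrival_frame // PERIOD + 1) * PERIOD


def start_with_offset_preserved(arrival_frame: int) -> int:
    """Note starts in the next period at the same offset at which it arrived."""
    return arrival_frame + PERIOD


arrivals = [0, 1, 100, 255, 256, 300, 511]
boundary = [start_at_period_boundary(t) - t for t in arrivals]
preserved = [start_with_offset_preserved(t) - t for t in arrivals]
print(boundary)   # [256, 255, 156, 1, 256, 212, 1]
print(preserved)  # [256, 256, 256, 256, 256, 256, 256]
```

The first strategy gives a delay anywhere between 1 frame and a full period (jitter); the second gives a constant one-period delay, so the timing between events is preserved at the cost of always paying the full period.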

Another reason for the varying latency could be that I'm comparing Zynthian against a Blofeld, not the onset of the MIDI message, and I really have no idea of what kind of latency the Blofeld has. Using an oscilloscope, comparing the audio output to the MIDI input, would result in more accurate measurements.

riban-bw commented 1 year ago

'find out when the MIDI event occurred, which means it could delay the generation of the start of the note for exactly one period'

This is exactly what the sequencer does and what zynmidirouter should do. I will check its code.

polluxsynth commented 1 year ago

Wouldn't this add another period of latency - or is this in fact the first period of latency, the second one being from the time JACK calls process() for the synth and all other related devices in the audio chain to the time when JACK picks up the filled-in buffer at the start of the next period and sends it on to the DAC? (For a total of two periods).

riban-bw commented 8 months ago

The jack graph is processed in one cycle. All jack outputs are presented to inputs in sequence until the last one is processed or there is a loop back, hence there is only one cycle of latency for everything.

Some recent enhancements (in oram branch) to audio routing remove unnecessary loopbacks. If there are no post-fader effects and if the main chain does not have any pre-fader effects then internal normalisation within the mixer avoids the extra jack cycle, minimising latency.

Jitter is minimised by processing MIDI at the offset within each cycle at which it was received.

The kernel used in 2401 release is too old to benefit from improvements to USB audio latency. The next major release (Oram) will be based on 64-bit Bookworm Debian with a much later kernel that benefits from these enhancements.

Will you please recheck the issue against a fully updated oram image (flash a new sdCard with the latest test image) and update this ticket with your findings?

polluxsynth commented 8 months ago

Will do! Just got to find that spare SD card that I have for experiments.

polluxsynth commented 7 months ago

Ok, so I got around to testing a bit. I might need to redo this somewhat eventually, as I have some doubts as to the difference in latency between different channels on my sound card, but I think it's rather minor, in the region of 20 samples or so.

The test setup this time is my Roland ep77 piano, feeding both a Blofeld and Zynthian via a MIDI through box, so no additional latency in the MIDI setup. I chose the ep77 because I know it's got fairly fast response over MIDI, although it's not really critical in this application. I originally wanted to include the audio output from the ep77 as a reference, but it seems there's a bit of internal latency, so as measured, the sound from the Blofeld sometimes comes earlier than the ep77! Long story short, I ended up measuring the latency between the onsets of the sound from the Blofeld and the Zynthian, when playing single notes.

Anyway, I started out by redoing some initial measurements on the stable version which I have in my Zynthian, which is from August 2023. I set up three synth chains: MiMi-d (basically, OB-Xd), Dexed and ZynAddSubFX. All in all, the latency from Blofeld to Zynthian was a fairly stable 18 ms (800 samples @44100) regardless of the synth engine used.

I then loaded up the oram image and upgraded it from the webconf, and redid the test (using OB-Xd instead of MiMi-d). I'm sorry to say that the latency on average increased, but I could see a larger variation, from about 760 to 1040 samples, or 17 to 23 ms, with the most common values being around 950 to 990 samples.

I tried a few variations: with only one chain, and with three chains, but the results were the same. So, a bit disappointing I would say, both in terms of absolute values as well as the spread.

riban-bw commented 7 months ago

That is disappointing. May I check that both tests were performed on a RPi4 with Zynthian updated fully to 29/02/2024?

What are the jack settings?

polluxsynth commented 7 months ago

The jack settings are (directly pulled from ps aux; I have not changed anything from the defaults): /usr/bin/jackd -P 70 -t 2000 -s -d alsa -d hw:sndrpihifiberry -r 44100 -p 256 -n 2 -X raw

Yes, the tests were performed on a Zynthian V4 with RPi4, and I performed a webconf upgrade 240229.

I redid some of the latency tests tonight, changing the method, trying to avoid some of the uncertainty from the previous measurements. So, this time, I have a master keyboard, feeding a MIDI through box, with one MIDI output going into the Zynthian, whose right output goes to the channel 1 of my sound card, and the other MIDI output by way of an adapter cable to channel 2 of the sound card, the idea being that if nothing else, the left-right pair of each input on the sound card will have the best chance of not having any interchannel latency. Each MIDI message results in a short (1 ms) burst of noise in the right channel, and thus it is possible to measure fairly accurately the time from the end of the MIDI message to the onset of the sound emitted from Zynthian, avoiding any uncertainty due to comparison with the latency of another instrument.
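The measurement described above (end of the MIDI burst in one channel to the onset of sound in the other) could also be automated instead of being read off in a wave editor. The sketch below is only one way to do it: it assumes both channels idle near zero, uses a simple fixed threshold, and all function names and parameter values are hypothetical.

```python
def first_onset(samples, threshold):
    """Index of the first sample whose magnitude exceeds `threshold`, or None."""
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i
    return None


def burst_end(samples, threshold, quiet_run):
    """Index of the last loud sample of the first burst.

    The burst is considered over once `quiet_run` consecutive samples
    stay at or below `threshold`.
    """
    start = first_onset(samples, threshold)
    if start is None:
        return None
    last_loud = start
    for i in range(start, len(samples)):
        if abs(samples[i]) > threshold:
            last_loud = i
        elif i - last_loud >= quiet_run:
            break
    return last_loud


def latency_samples(midi_channel, audio_channel, threshold=0.1, quiet_run=32):
    """Samples from the end of the MIDI burst to the onset of the synth audio."""
    end = burst_end(midi_channel, threshold, quiet_run)
    if end is None:
        return None
    return first_onset(audio_channel[end:], threshold)


# Synthetic stereo take: MIDI burst at frames 100..143, synth sound from frame 963.
midi = [0.0] * 100 + [0.5, -0.5] * 22 + [0.0] * 1500
audio = [0.0] * 963 + [0.3] * 200 + [0.0] * 481
print(latency_samples(midi, audio))  # 820
```

With real recordings the threshold would need tuning against the noise floor, and a slow attack on the synth patch would bias the detected onset late.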

So, going back to stable from last year, with OB-Xd I get about 820 to 845 samples of latency i.e. 18.6 .. 19.2 ms. With oram, I get 1078 to 1086 samples of latency, i.e. 24.4 to 24.6 ms. I recorded about 10 or so note on messages and picked five or six of them as random samples to get an idea of the spread.
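For reference, the sample counts above convert to milliseconds as follows. A small helper, assuming the recording was made at 44100 Hz as stated earlier in the thread; the function name is made up for illustration.

```python
def samples_to_ms(samples: float, sample_rate: int = 44100) -> float:
    """Convert a latency measured in samples to milliseconds."""
    return 1000.0 * samples / sample_rate


measurements = {"stable": (820, 845), "oram": (1078, 1086)}
for label, (low, high) in measurements.items():
    print(f"{label}: {samples_to_ms(low):.1f} to {samples_to_ms(high):.1f} ms")
# stable: 18.6 to 19.2 ms
# oram: 24.4 to 24.6 ms
```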

So the large variation in latency from yesterday didn't repeat itself, but the latency is definitely larger with oram than with stable. ZynAddSubFX had about 100..200 samples more latency in both versions.

Oddly enough, I experience the difference in latency more than the number of milliseconds would imply. When playing together with my electric piano (i.e. listening both to the MIDI controlled Zynthian and the piano together), with oram there's clearly audible flamming, but the stable version feels rather more tight. Perhaps there is a psychological limit somewhere around 19-20 ms, below which the two sounds tend to merge into one.

riban-bw commented 7 months ago

Thanks for this extra data. It is interesting that these latest measurements have lower jitter in oram which is good. The difference in measurements between 2401 and oram is approximately a processing period (5.8ms).
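A quick check of the "approximately one period" observation, using the sample counts reported in the previous comment. Taking the midpoint of each reported range is an assumption made here for illustration, not something stated in the thread.

```python
SAMPLE_RATE = 44100   # Hz, rate of the recording
PERIOD_FRAMES = 256   # jackd -p 256

stable_mid = (820 + 845) / 2    # midpoint of the stable range, in samples
oram_mid = (1078 + 1086) / 2    # midpoint of the oram range, in samples

extra_samples = oram_mid - stable_mid
extra_ms = 1000.0 * extra_samples / SAMPLE_RATE
period_ms = 1000.0 * PERIOD_FRAMES / SAMPLE_RATE
extra_periods = extra_samples / PERIOD_FRAMES

print(f"{extra_ms:.1f} ms extra, {period_ms:.1f} ms per period, "
      f"ratio {extra_periods:.2f}")
# 5.7 ms extra, 5.8 ms per period, ratio 0.97
```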

The data flow in 2401 and Oram is the same:

MIDI IN -> ZynMidiRouter -> Synth Engine -> zynmixer -> Audio OUT

I observe a slight difference in the jackd config. It seems that your oram configuration adds -S to the alsa driver settings which prefers 16-bit word length rather than the default 32-bit. It is plausible this makes a difference. Please check whether these settings make a difference: -P 70 -t 2000 -s -d alsa -d hw:sndrpihifiberry -r 44100 -p 256 -n 2 -S -X raw.

For completeness of testing, it may be advantageous to test, swapping the outputs of the MIDI splitter, just in case there is some processing occurring there. (Unlikely but in science we remove all the variation we can.)

Will you confirm there is no extra processing block in any chain, e.g. MIDI / audio effects processors?

Will you please provide a snapshot of the test so that we can validate signal path, etc?

If however the results are correct - it appears we have introduced an extra processing period within one of the zynthian modules (zynmidirouter or zynmixer). A recent update to zynmixer (a couple of weeks ago) removed the loop-back that would add such an extra cycle. At that time I thought I had ensured the default behaviour of zynmixer was to process all audio within one processing cycle. (I could be mistaken!) There has been some work on midirouter so it is plausible that an extra cycle has been introduced.

Add a snapshot here and we can take a look.

polluxsynth commented 7 months ago

'I observe a slight difference in the jackd config. It seems that your oram configuration adds -S to the alsa driver settings which prefers 16-bit word length rather than the default 32-bit. It is plausible this makes a difference. Please check whether these settings make a difference: -P 70 -t 2000 -s -d alsa -d hw:sndrpihifiberry -r 44100 -p 256 -n 2 -S -X raw.'

I edited jack2.service and added the '-S', restarted and verified with ps aux that the new settings had taken effect, but it had no effect on the latency.

'For completeness of testing, it may be advantageous to test, swapping the outputs of the MIDI splitter, just in case there is some processing occurring there. (Unlikely but in science we remove all the variation we can.)'

I agree about testing various things to narrow it down. But I don't understand what you mean by 'swapping the outputs of the MIDI splitter'? Or do you mean my MIDI patchbay (which is a Yamaha MJC-8), and not something in Zynthian? I can try, but I'm fairly confident that the MIDI patchbay in terms of MIDI is a dumb device, it just routes the signals in hardware, and at any rate, I have not changed outputs between the different tests. I'll see if I can find an ordinary MIDI through box which is guaranteed to have no processing delay.

'Will you confirm there is no extra processing block in any chain, e.g. MIDI / audio effects processors?'

Yes, that is true. I never added anything to the chains, and I've confirmed using patchage that there is nothing else except the three synth engines and the ZynMidiRouter etc.

'If however the results are correct - it appears we have introduced an extra processing period within one of the zynthian modules (zynmidirouter or zynmixer). A recent update to zynmixer (a couple of weeks ago) removed the loop-back that would add such an extra cycle. At that time I thought I had ensured the default behaviour of zynmixer was to process all audio within one processing cycle. (I could be mistaken!) There has been some work on midirouter so it is plausible that an extra cycle has been introduced.'

What I did was, in patchage, first bypass zynmixer by passing the output of OB-Xd directly to the audio output. This didn't make any difference in terms of latency, neither when the output of OB-Xd was feeding both the audio output and zynmixer, nor when feeding just the audio output.

Then I bypassed zynmidirouter and connected the MIDI input of OB-Xd directly to the MIDI input (what is it called - uart midi?). That brought the latency down from about 23 ms to 18 ms. So it appears zynmidirouter introduces a one period delay. I didn't try removing other connections to zynmidirouter, for instance, I notice that the sequencer is connected to both inputs and outputs, perhaps that type of looped connection forces zynmidirouter to add one period of delay?

I tried to get patchage to dump the midi/audio graph to a file but it didn't write anything. I'll have to try again, if nothing else doing screen dumps instead. Also, I'll get back to you with the snapshot file.

riban-bw commented 7 months ago

Thanks! This is really useful in identifying the culprit.

'the sequencer is connected to both inputs and outputs, perhaps that type of looped connection forces zynmidirouter to add one period of delay'

It is worth testing this. Temporarily remove the seq route and test whether it has an effect.

polluxsynth commented 7 months ago

Here is the patchage graph of my 'triple synth' (OB-Xd, Dexed and ZynAddSubFX) setup in oram, as well as the corresponding snapshot.

patchage-triplesynth.pdf last_state.zip

I also tried removing everything from zynmidirouter except the MIDI input and the three synth engines, i.e. removing all nodes which connected to both the input and output side of zynmidirouter. But it was to no avail; the latency is still around 1080 samples = 24.5 ms.

polluxsynth commented 7 months ago

I can also add that I tried connecting the captured MIDI which I was recording from another port on the same MIDI thru box that I'm using to drive the Zynthian, but the result is the same as when coming through the MJC8 MIDI patchbay, i.e. the MJC8 does not add any latency of its own (not surprising since it has no processing whatsoever of the MIDI data, it just routes it).

polluxsynth commented 7 months ago

One thing I'm trying to understand is why the base latency is three times the period time, not two. But perhaps that's standard behavior, I've never thought through exactly how the data moves through a plugin system in real time. I was thinking that there ought to be two phases: first, during one period, input data (MIDI and audio, in this case only MIDI) is collected in the input buffer. During the next period the plugin code and any output mixer code is run, which ends with writing audio to the output buffer, and finally dumping the output into the DAC buffer, which then starts outputting it during the next period. But perhaps it's a JACK thing that one extra period needs to be added because the output is not directly written to the DAC buffer, but instead there is one more intermediate stage. Really a separate topic from this ticket though.

riban-bw commented 7 months ago

Thanks for the extra info. All useful diagnostic data.

Regarding latency due to data processing, there is some latency introduced during capture to decouple the remote data rate / clock from the local and to decouple the rate of data acquisition from the rate of data processing. The former is fairly constant but the latter is very peaky. There are further buffers at the output for similar reasons. That leaves the processing periods. Jackd processes its whole graph in one processing period, i.e. all data at the inputs of the graph are processed and passed through all elements of the graph to the outputs in one period, e.g. 256 frames. However this must have a breakpoint if the signal is looped back and becomes re-entrant (otherwise it would be an infinite loop). I suspect this is done by reference counting to stop processing when a frame has been processed by a unit already within a period.

So the input and output buffers assigned to alsa are in addition to one processing period of latency added internally.

polluxsynth commented 7 months ago

Thanks for your explanation. It makes sense. Hope you can figure out what the problem is with zynmidirouter!

riban-bw commented 7 months ago

'psychological limit somewhere around 19-20 ms'

Yes, there are thresholds at which we perceive latency in different ways. At about 40ms it becomes too difficult to play against. At about 20ms it is noticeable to hear. At about 10ms it is almost indiscernible for most people / playing styles. Virtuoso musicians may find anything above about 5ms to impact their playing. Below 5ms tends to have minimal impact on the feel of playing but some will notice almost any latency. But bear in mind that sound takes about 3ms to travel 1m so most players of amplified instruments are likely to be used to this kind of delay. Of course this can become cascaded so we do want to minimise latency. The sweet spot is about 10ms which should be achievable with Zynthian.
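The "about 3ms per metre" figure, and what the thresholds above correspond to as a distance from a loudspeaker, can be worked out as below. The speed of sound used (343 m/s, dry air at roughly 20 °C) is an assumption added here; the function names are made up for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 °C


def ms_per_metre() -> float:
    """Acoustic delay per metre of distance, in milliseconds."""
    return 1000.0 / SPEED_OF_SOUND_M_PER_S


def equivalent_distance_m(latency_ms: float) -> float:
    """Distance sound travels in `latency_ms` milliseconds."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0


print(f"{ms_per_metre():.1f} ms per metre")  # 2.9 ms per metre
for latency_ms in (5, 10, 20, 40):
    print(f"{latency_ms} ms is like standing {equivalent_distance_m(latency_ms):.1f} m away")
```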

I can't yet see where midirouter introduces an extra cycle of latency. It shouldn't! I will continue to investigate.

riban-bw commented 4 months ago

Revisiting this ticket (after being prompted in the forum!) I am just going to summarise where we are:

riban-bw commented 4 months ago

I missed a rather important check: jackd latency. We have found jackd to be configured to run in asynchronous mode, which adds a period of latency whilst reducing the risk of xruns. Adding -S to the jackd command line (before -d alsa) sets synchronous mode. (This is not documented in the jackd man page.) We have also removed -S from the alsa config, which changes the word length from 16 to 32 bit. This change has removed the extra internal jack buffer, which brings latency down to 2 buffers, e.g. 11.3ms at 48000 fps with 256-frame buffers. Now to review the points in the previous comment.

riban-bw commented 4 months ago

I have reviewed the code in zynmidirouter and it looks okay. Most MIDI messages are passed with the same event time (frame offset) as received with the exception of sysex and internal MIDI messages. Will @polluxsynth please retest and if you find an issue with latency or jitter, please describe your tests in detail so that we can trace through the signals from input to output, considering all the code in between?

polluxsynth commented 4 months ago

After booting from the latest oram image (240522), I updated the software, bringing in @jofemodo's latest patch where the jackd command line was changed from:

  -P 70 -t 2000 -s -d alsa -d hw:sndrpihifiberry -S -r 48000 -p 256 -n 2 -X raw

to:

  -P 70 -t 2000 -s -S -d alsa -d hw:sndrpihifiberry -r 48000 -p 256 -n 2 -X raw
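The two command lines differ only in where -S sits relative to "-d alsa", which is easy to miss by eye. The sketch below splits the arguments at the first -d (the backend selector) and reports on which side -S appears; the interpretation of each position (synchronous mode for the server, 16-bit samples for the ALSA driver) is taken from the comments in this thread. It is not a general jackd option parser, and the function names are hypothetical.

```python
import shlex


def split_jackd_args(command_line):
    """Split jackd arguments into (server_args, driver_name, driver_args)."""
    args = shlex.split(command_line)
    # The first -d names the backend driver; everything after it belongs to the driver.
    idx = args.index("-d")
    return args[:idx], args[idx + 1], args[idx + 2:]


def where_is_S(command_line):
    """Report whether -S is given as a server option and/or a driver option."""
    server_args, _driver, driver_args = split_jackd_args(command_line)
    return {"server": "-S" in server_args, "driver": "-S" in driver_args}


before = "-P 70 -t 2000 -s -d alsa -d hw:sndrpihifiberry -S -r 48000 -p 256 -n 2 -X raw"
after = "-P 70 -t 2000 -s -S -d alsa -d hw:sndrpihifiberry -r 48000 -p 256 -n 2 -X raw"

print(where_is_S(before))  # {'server': False, 'driver': True}
print(where_is_S(after))   # {'server': True, 'driver': False}
```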

From end-of-midi-note-on to onset of MiMi-d output, I get a consistent 758-762 samples latency @ 44.1 kHz now, corresponding to 17.2-17.3 ms.

riban-bw commented 4 months ago

That's good news. Let's close this ticket then.

riban-bw commented 4 months ago

Unfortunately, running jackd in synchronous mode causes a massive delay (tens of seconds) for clients to connect to jackd. We must revert this change and identify the cause of this far more significant issue.

riban-bw commented 4 months ago

Update: If a client tries to connect to jackd, it may try to start a new instance unless flagged not to or jackd is already running. For some reason, not flagging this is causing the long connection delay. Maybe it relates to the jackd name or some odd edge effect. Setting the correct flag to avoid this or setting env var JACK_NO_START_SERVER seems to fix the issue. Hopefully this is now resolved.

riban-bw commented 4 months ago

After some further investigation we have resolved all the known issues.

Setting JACK_NO_START_SERVER globally and removing the 2s client timeout has made things work quite well.
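For anyone launching a JACK client from their own script, the environment variable can also be set per process rather than globally. A minimal sketch: the value "1" is an assumption (the thread only says the variable is set), the client command in the comment is a placeholder, and how Zynthian itself sets the variable is not shown in this thread.

```python
import os


def jack_client_env(base_env=None):
    """Return a copy of the environment with JACK_NO_START_SERVER set."""
    env = dict(os.environ if base_env is None else base_env)
    env["JACK_NO_START_SERVER"] = "1"  # assumed value; the thread only says "set"
    return env


# Example use with a placeholder client command:
#   subprocess.run(["some-jack-client"], env=jack_client_env(), check=True)
env = jack_client_env({"PATH": "/usr/bin"})
print(env["JACK_NO_START_SERVER"])  # 1
```

Clients written against the JACK C API can get the same effect by passing the JackNoStartServer option when opening the client, which is presumably the "correct flag" mentioned in the earlier comment.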