Defer rendering. - Githubissues

jmamma commented 4 years ago

When DEFER_SEQ is enabled, the sequencer is rendered into one of two MIDI ring side_channel buffers.

On the next Seq() event the MidiUart.txRb_sidechannel is pointed to the sequencer buffer rendered on the last Seq() event. This triggers the UART to drain the buffer until it is empty, at which point the side_channel buffer is disabled and the UART reverts to transmitting data from the main MidiUart.txRb buffer.

Two side channel buffers are used such that the sequencer can render to one, whilst the other is drained. On the next Seq() run the buffers are swapped.

Why?

In theory, this should reduce latency between sequencer events.

Issues:

The side channel buffer is drained as the adjacent sequencer buffer is rendered. This may still cause latency due to context switching.

When the sequencer is stopped or paused, we must re-render the sequencer prior to start/continue.

RAM usage. 2 side channel buffers are used in addition to the main TX buffer.

jmamma commented 4 years ago

It looks like this works fine.

Compile size is 732 bytes extra when DEFER_SEQ is defined.

--

@yatli We need to profile the performance of this mode. But it means that we should no longer be worried about seq() execution time... can pack as many tracks in as we like, provided they render before the next tick.

jmamma commented 4 years ago

Sequencer completion time + jitter profiling:

https://github.com/jmamma/MIDICtrl20_MegaCommand/pull/125#issuecomment-671036024

jmamma commented 4 years ago

31250bps/ 10 = 3125 Bps 3125 * 8 = 25,000 Bps @ 8x TURBO

1/25,000 * 1000 = .04 ms to transfer 1 byte

Trig track = 3 bytes * 0.04ms = 0.12ms

Trig 16 tracks = 1.92 ms.

jmamma commented 4 years ago

Scope is measuring 1.860ms to transmit 16 tracks. 0 jitter.

Woah.

jmamma commented 4 years ago

MD crash/lockup scenario: Trig sixteen tracks then rapidly adjust Mixer levels. Can't replicate with #DEFER_SEQ disabled.

Deferred buffers are probably swapping in before complete MIDI message

jmamma commented 4 years ago

I just overhauled how MCL handles MIDI byte ordering. The MD is very sensitive to this and will crash/lockup if it receives incomplete messages.

Because the Seq side_channel Buffers are independent of the main TX buffer there is no longer the risk of the sequencer ISR inserting data mid message stream. We can now remove USE_LOCK/SET_LOCK in some of the message transmit routines.

There was one additional problem. We must not switch to the side chain buffer when the main buffer is in the middle of a message. So I added code to detect message transmission completion. This now resides in the transmit ISRs.

jmamma commented 4 years ago

Final scope measurement to trig 16 tracks:

1x TURBO: 9.6ms 2x TURBO: 7.3ms 4x TURBO: 3.68ms 8x TURBO: 1.852ms

At 128BPM
B = 1 beat is ((1/(128 / 60)) = 0.46875 seconds long
Edit: calcs were wrong.
(B / 4) / 6 = 19.5ms per MIDI_CLOCK
9.75 ms per div192.

At higher tempos and low TURBO speed, there is the possibility that the side_channel buffer does not drain completely before the swap. Will need to add in code to copy contents of one buffer into the other at start of Seq()

jmamma commented 4 years ago

Realtime vs Deferred rendering

I've been analysing seq() performance on ext_conv and defer rendering branches. The goal is determine whether this new implementation is better.

Our main concerns are latency and jitter.

Latency: The delay between the clock event, and the resulting output of the sequencer. (trig vs response) Jitter: The variation in timing between the output of adjacent sequencer events.

In this analysis we will be looking at Jitter. Specifically the jitter between 8 MIDI note messages occuring on the same sequencer step.

Test method: Trigger 8 tracks simultaneously on every step. Toggle port output and observe timing on scope.

ext_conv test:

  for (uint8_t i = 0; i < num_md_tracks; i++) {
    PORTB |= (1 << PB5)
    md_tracks[i].seq(uart)
    PORTB &= ~(1 << PB5);
  }

md_tracks[i].seq(uart) will usually complete in approximately 12us with 1us of jitter. sporadic variations in completion time were observed at 100-200+ us.

defer_rendering test:

383     if (!MidiUart.txRb_sidechannel->isEmpty_isr()) {
384       MidiUart.sendActiveSenseTimer = MidiUart.sendActiveSenseTimeout;
385       UART_WRITE_CHAR(MidiUart.txRb_sidechannel->get_h_isr());
386       //PORTB |= (1 << PB5);
387     }
388     if (MidiUart.txRb_sidechannel->isEmpty_isr()) {
389       //PORTB &= ~(1 << PB5);
390       MidiUart.txRb_sidechannel = nullptr;
391     }
392   } else {

Rendering time is not relevant here as the sequencer data is pre-rendered in to a transmit side_channel buffer.

In this test we measure the time required to drain the entire transmit side_channel buffer. The scope shows a consistent measurement, with no variation. This implies no jitter.

--

Discussion:

With the original real time sequencing method, data is rendered and then immediately written to the MidiUart.tx buffer. The buffer is drained via ISR trigger, and will continue to empty asynchronously during the remaining run of the sequencer. That is, as soon as data is written to the MidiUart buffer, it will be transmitted at a rate according to the port speed.

Jitter will only become a noticeable problem if rendering time is greater than the MIDI message transmission time. If the sequencer event can't complete before the transmission of the last MIDI messages then there will be a duration of time in which the transmit buffer is empty, leading to jitter between sequential message transmissions.

At 8x turbo it takes (0.4ms * 3) 120us to transmit a note_on message.

From oscilloscope observation it appears that the rendering time of md_tracks[i].seq(uart) is unpredictable. It can take up to 200us, which exceeds the 120us threshold.

At 1x turbo it takes 960us to transmit a note_on message. As such rendering time should have no effect on MIDI message jitter here.

Further Considerations:

Longer rendering time increases latency for transmission of Ext tracks. ExtTracks are rendered after MDTracks, and thus their transmission start time depends on the completion time of previous sequencer events.

The deferred rendering approach will drain both UART1 and UART2 side channel buffers simultaneously eliminating this latency.

jmamma commented 4 years ago

Phase shifted when syncing to A4 on UART2. I suspect the A4 is transmitting midi start + midi clock at same time. This means there is no time for the sequencer to render the first tick.

we will need to pre-render midi_stop condition.

yatli commented 4 years ago

Does the measurement reading align with what we get from the diagnostic page?

jmamma commented 4 years ago

Which measurement reading?

jmamma commented 4 years ago

Too much additional code/complexity to pre-render both MIDI_START and MIDI_CONTINUE possibilities.

Easier method is to always sequence the first tick in realtime on MIDI START.

--

Phase shift is eliminated...

yatli commented 4 years ago

Which measurement reading?

The time between two pin toggles.

jmamma commented 4 years ago

I was able to measure the non-deferred/realtime sequencer completion time vs the transmission time of the deferred buffer.

The deferred buffer transmission time is always constant, whereas the non-deferred/realtime sequence competition time fluctuates (as seen in our earlier measurements).

I then postulate that the fluctuations in realtime sequencer performance would only be relevant if they exceed the MIDI message transmission time, as dictated by the port speed.

In other words, if the sequencer code for one event is fast enough that it completes before the MIDI message is transmitted, then the effects of sequencer jitter (variations in completion time of seqeuncer events) would be negated.

yatli commented 3 years ago

I think we should change merge target to dev? piano_roll has been merged.