Closed yatli closed 4 years ago
https://1drv.ms/x/s!Aql6jSQH9uyZhP1RR_0UWsqnGg_gVw?e=cIIBuV fits nicely with the theory.
Unfortunately we still use glcdfont.c. That's our default "large font"
It's selected using: oled_display.setFont();
i.e
./Adafruit_GFX.h: setFont(const GFXfont *f = NULL),
void Adafruit_GFX::write(uint8_t c) {
#endif
if (!gfxFont) { // 'Classic' built-in font
ah ok. reverted.
now 198792 bytes (78%)
ah ok. reverted.
yeah I got excited for a minute there...
Can you clarify
ALWAYS_INLINE vs FORCED_INLINE.
I see you've defined IS_ISR_ROUTINE in select files.
That's the interesting part.
ALWAYS_INLINE is only effective if IS_ISR_ROUTINE is defined, so for any function defined as such, it automatically builds two versions, inlined and non-inlined.
FORCED_INLINE disregards IS_ISR_ROUTINE and always inline.
This is to preserve the critical ISR inlining while freeing up space from the other routines, without writing another copy of the functions.
This is to preserve the critical ISR inlining while freeing up space from the other routines, without writing another copy of the functions.
yep, awesome.
Could you do some regression testing. I'm not exactly sure the effect of un-inlining things in non critical paths. The most significant changes are shown in the spreadsheet in yellow so that may give you a rough idea.
Could you do some regression testing. I'm not exactly sure the effect of un-inlining things in non critical paths. The most significant changes are shown in the spreadsheet in yellow so that may give you a rough idea.
Yeah, my anxiety levels already started rising .
Originally I had intended to inline every call present in the Sequencer ISR path. Which is everything indirectly called by mcl_seq.seq();
So inline all the MD/ExtSeqTrack.seq() calls, including the MidiUart.note_on and CC calls which are called indirectly from MD.triggerTrack() , MD.setTrackParam() etc;
This reasoning behind this was because I was trying to cram the entire sequencer run time within a single ISR event. However, this didnt work in practice. The ISR takes too long to complete before another ISR is triggered, resulting in missed MIDI clock events. So we end up having to re-enabling interrupts anyway.
223 ALWAYS_INLINE() void callCallbacks(bool isMidiEvent = false) {
224 if (state != STARTED)
225 return;
226
227 static bool inCallback = false;
228 if (inCallback) {
229 DEBUG_PRINTLN("clock collision");
230 return;
231 } else {
232 inCallback = true;
233 }
234
235 #ifndef HOST_MIDIDUINO
236 sei(); <--- Interrupts re-enabled before sequencer callback is executed.
237 #endif
238
239 on192Callbacks.call(div192th_counter);
240
241 if (isMidiEvent) {
242 on96Callbacks.call(div96th_counter);
243
244 if (mod6_counter == 0) {
245 on16Callbacks.call(div16th_counter);
246 on32Callbacks.call(div32th_counter);
247 }
248 if (mod6_counter == 3) {
249 on32Callbacks.call(div32th_counter);
250 }
251 }
252
253 inCallback = false;
254 }
Midi/MidiUartParent.hh contains the low level MIDI calls that I wanted forced inlined.
@yatli Are you able to objdump mcl_seq.seq() to see how many subroutines are being executed and compare old code with new?
seq() size change:
Current result shown on the right. If totally un-inline everything the result is shown in the middle. So it's kind of a balance
I guess we need to benchmark execution times to make an informed decision here.
My rough estimation is that anything bigger than 10 instructions won't be impacted too much if it's pulled out
My rough estimation is that anything bigger than 10 instructions won't be impacted too much if it's pulled out
What's your reasoning here?
A call is 5 cycles, a ret is 5 cycles. Most instructions are 1 or 2 cycles, so calling into a 10 instruction routine results in 50% speed. As the number of instructions increase, the benefits of inline diminishes. The speed approximates 100%
A call is 5 cycles, a ret is 5 cycles. Most instructions are 1 or 2 cycles, so calling into a 10 instruction routine results in 50% speed. As the number of instructions increase, the benefits of inline diminishes. The speed approximates 100%
understood.
Tomorrow I will hook up the scope and profile execution times for seq(). Going to continue on piano_roll for now.
Just started profiling the performance.
Test scenario.
+ SET_BIT(PORTB, PB5);
on192Callbacks.call(div192th_counter);
-
+ CLEAR_BIT(PORTB, PB5);
Test branches dev vs size_reduce_2 using same pattern.
Measurement oscilloscope at 500us horizontal scale.
Both dev and size_reduce_2 appear to perform similarly.
1.1 x 500us shortest completion time (0.5ms)
4 * 500us worst case (2ms)
--
From the scope i'm unable to discern any notable difference between the two.
atmega2560 runs at 16MIPS / 16MHZ
therefore it can perform 16000 instructions/ms
@yatli
Given that these time scales are orders of magnitude larger than our fastest timer clock++
and that ISR is enabled before on192Callbacks.call We can can actually use the megacommand timer to measure execution times.
At 128BPM
B = 1 beat is ((1/(128 / 60)) = 0.46875 seconds long
Edit: calcs were wrong.
(B / 4) / 6 = 19.5ms per MIDI_CLOCK
9.75 ms per div192.
I guess if we wanted to tighten up the sequencer we could pre-calculate the MIDI payload prior to the clock pulse. and then memcpy in to the midi tx buffer on clock (or swap pointer to buffer on clock..).
Does this mean the size reduction doesn't impact performance?
scope is not revealing anything abnormal between the two.
we should double check with timer method as I was just using my :eyes: and relying on the refresh rate of the digital lcd.
@yatli can I leave that up to you to check? serial.print + excel spreadsheets :D . You'll need to store ISR execution times in RAM, and then print during main program loop (outside of ISR)
Sure! Benchmarks heh. All you need to write papers (
https://1drv.ms/x/s!Aql6jSQH9uyZhP1T8-w7P7Ru-RtZag?e=6Sa4eI
Not much of a difference.
The two compared branches:
https://github.com/jmamma/MIDICtrl20_MegaCommand/tree/benchmark
https://github.com/jmamma/MIDICtrl20_MegaCommand/tree/dev_benchmark
I find debugging more enjoyable on DiagnosticPage than serial port(
microseconds ?
yes
more notes = more jittering.
blocking on uart?
Jitter as in ISR completion jitter?
Yeah one possible cause could be the UART transmit interrupt triggering more often because of more data on the buffer. So you'd get a bit of thrashing between the Seq ISR and UART transmit ISR.
Jitter as in ISR completion jitter?
Yeah one possible cause could be the UART transmit interrupt triggering more often because of more data on the buffer. So you'd get a bit of thrashing between the Seq ISR and UART transmit ISR.
Yeah. The benchmark measures the time of div192 callback. And the sampled time is drawn on screen periodically. I then hit play/stop multiple times to get the readings. The "jittering" happens like in 1 of 10 times.
I think this is good to merge now.
Yes, I didn't notice any issues in testing.
Will merge now.
A continuation of https://github.com/jmamma/MIDICtrl20_MegaCommand/pull/108
Cherry-picked the most significant items:
Size goes down significantly, from 85% (216718) to 77% (197232) 19486 bytes saving!