Closed vpelletier closed 2 years ago
I'm not a programmer, I only read the first ~70 pages of Kernighan's "The C programming language". Stopped reading at pointers. So the whole code base is certainly missing some optimization.
I guess the strings are not getting merged by the compiler because they are stored in the program memory instead of the SRAM since we are even lower on SRAM than on ROM. There is a solution posted here: https://www.arduino.cc/reference/en/language/variables/utilities/progmem/ Could just use an array of strings in progmem for the most common expressions.
I'll try it out and report back.
I only read the first ~70 pages of Kernighan's "The C programming language".
That's still 70 more than I did :) .
<off-topic>BTW, thank you very much for this project. I had original N64 batteries which were magically still hanging on, keeping game saves alive after all this time. I could dump the saves, replace the batteries and write them back.</off-topic>
I've tried disabling the -f{function,data}-sections
arguments to avr-gcc and avr-g++, with no change to the output.
I realised that avr-gcc is version 5.4.0, which is getting quite old now. I have no idea if constant merging has been improved since, but it makes it unlikely that anyone on the gcc side would be willing to take a look at why it does not work as expected.
Such automated de-duplication would be significantly more convenient than having to maintain a "list" (at least in the general sense, if not in the technical "array" sense) of strings in order to save space, so it is unfortunate it does not work.
It does seem to work, just with the default modules enabled and two strings replaced I already get results: Default -> Sketch uses 230230 bytes (90%) of program storage space. "Reset" replaced -> Sketch uses 229518 bytes (90%) of program storage space. "Reset" + "Press Button..." replaced -> Sketch uses 228386 bytes (89%) of program storage space. ten most used strings replaced -> Sketch uses 225798 bytes (88%) of program storage space.
I just replace
println_Msg(F("Press Button..."));
with new function
print_STR(press_button_STR, 1);
/******************************************
Common Strings
*****************************************/
#define press_button_STR 0
#define sd_error_STR 1
#define reset_STR 2
#define did_not_verify_STR 3
#define _bytes_STR 4
#define error_STR 5
#define create_file_STR 6
#define open_file_STR 7
#define file_too_big_STR 8
#define done_STR 9
// This arrays holds the most often uses strings
static const char string_press_button0[] PROGMEM = "Press Button...";
static const char string_sd_error1[] PROGMEM = "SD Error";
static const char string_reset2[] PROGMEM = "Reset";
static const char string_did_not_verify3[] PROGMEM = "did not verify";
static const char string_bytes4[] PROGMEM = " bytes";
static const char string_error5[] PROGMEM = "Error:";
static const char string_create_file6[] PROGMEM = "Can't create file";
static const char string_open_file7[] PROGMEM = "Can't open file";
static const char string_file_too_big8[] PROGMEM = "File too big";
static const char string_done9[] PROGMEM = "Done";
static const char* const string_table[] PROGMEM = { string_press_button0, string_sd_error1, string_reset2, string_did_not_verify3, string_bytes4, string_error5, string_create_file6, string_open_file7, string_file_too_big8, string_done9 };
void print_STR(byte string_number, boolean newline) {
char string_buffer[18];
strcpy_P(string_buffer, (char*)pgm_read_word(&(string_table[string_number])));
if (newline)
println_Msg(string_buffer);
else
print_Msg(string_buffer);
}
It does seem to work, just with the default modules enabled and two strings replaced I already get results:
Cool !
For reference, here is the updated top-30 most duplicated strings with default modules (I am on HW3, which I believe affects the menu strings, also I am still on the same commit as above):
$ avr-objcopy -O binary -R .eeprom Cart_Reader.ino.elf Cart_Reader.ino.bin
$ strings Cart_Reader.ino.bin | sort | uniq -c | sort -nr | head -n30
62 Press Button...
40 SD Error
28 Reset
28 m/|/
27 (_?O
26 did not verify.
25 A,Q,2
24 bytes
23 Error:
22 0x`/
18 Can't open file on SD
18 Can't open file
17 File size exceeds flash size.
17 /...
15 Saving to
13 Verifying...
12 Flashing file
11 Verified OK
11 /_?O
11 Erasing...
11 Done
11 Can't create file on SD
11 /)/3'
10 Press right to select
10 Press left to change
10 Flash ID:
10 ATTENTION 3.3V
9 V2H/
9 This will erase your
9 Select file
Default -> Sketch uses 230230 bytes (90%) of program storage space. "Reset" replaced -> Sketch uses 229518 bytes (90%) of program storage space.
So 712 bytes saved.
>>> 27 * len("Reset ")
162
This is a much larger saving than I would have expected (keeping one copy of the string, the trailing space is just to represent the trailing NUL).
"Reset" + "Press Button..." replaced -> Sketch uses 228386 bytes (89%) of program storage space.
1120 further bytes saved
>>> 61 * len("Press Button... ")
976
The gain is again exceeding expectations. Not sure why, but I'm not complaining :) .
ten most used strings replaced -> Sketch uses 225798 bytes (88%) of program storage space.
4432 bytes saved ! This is great.
Now 5330 bytes with default modules, see attachment. Cart_Reader_test.zip
New top-30 with HW3 and default modules:
$ strings Cart_Reader.ino.bin | sort | uniq -c | sort -nr | head -n30
28 m/|/
24 A,Q,2
23 (_?O
17 /...
11 Verified OK
11 /_?O
11 Erasing...
11 /)/3'
10 Press right to select
10 Flash ID:
10 ATTENTION 3.3V
9 This will erase your
9 Select file
9 O__OoO
9 Blankcheck
8 Savetype Error
8 Saved to
8 Can't open file
7 Writing...
7 #Q1 #01
7 O]_O
7 File doesnt exist
7 Error: Wrong Flash ID
7 BPQ a q
7 Banks:
6 `/op
6 K-h-
6 g+h+i+
6 Error
6 Erase failed
This looks a lot cleaner: now the false-positives are taking a significant portion of the output.
I tried enabling all modules, but it still goes over the limit. And I am a bit puzzled: it goes over the limit by 4k... More than before. Maybe this is just because I was a bit behind the master branch, and something would have increased the footprint since ?
I have identified another way to gain ~500 bytes of program space (when all modules are enabled): make flashid an integer instead of a 5-bytes string. Still not enough to fit them all, but getting closer.
This saves program memory in 2 ways:
This also saves 200 bytes of ram, which I think come from the many 5-bytes strings which used to be ram-based (so they had to be copied from their initialiser value to ram during boot, so they took 200 bytes of global space).
I pushed the change to my fork (linking to the branch as I will likely push some more). EDIT: Important note: I am not testing these changes on-hardware yet, rather getting a feeling of what impact they have on memory use.
This is applied on top of the changes you posted a few comments above.
[edit] rewording
Quick progress report before logging off: I gained about 1kB of program space. 3kB (at least) to go. My current targets are duplicated code:
read()
instead of using read(buffer, length)
)I target the largest functions using information from objdump
on the produced .elf
file (so no need to disable modules to get a .bin
to analyse): avr-objdump -tC Cart_Reader.ino.elf | grep -v '\.bss' | sort -rk 1.24,1.31 | less
.
I exclude .bss
just because it fails at being sorted properly, making the output a bit confusing. Anyway these do not take program space, only ram space, and ram is fine: I went below 80% use.
But again, I have not tested this against real hardware yet, so I probably broke some things along the way. I am worried about how I can test the changes I am accumulating, as I only have N64, GB (+ GBC) and GBA gamecarts available to dump, and no flashcart.
I can test NES, SMS, MD and have GBA, GBC, SNES and N64 flashcarts. If there is anything I can test before you invest the time changing the code just tell me. Otherwise just do the changes and we test and fix afterwards.
Today's news (so far): I've focussed on NES.ino, and reduced the size by abou 1450 bytes. 300 bytes to go before a perfect fit with HW3 (but freeing a few more kB would be more than welcome). I pushed everything so far to my branch.
I would like you to start reviewing the changes, to at least point out mistakes I am making (breakages, coding style, bad directions, pointers to other places where similar changes would be applicable, ...), and maybe even pick a few commits for inclusion.
Should I open a merge request ? I will likely be force-pushing to it (hopefully only the few topmost commits), so this may cause unwanted notification noise.
I found another 848 bytes by removing all 5 optional features from the display library like explained here: https://github.com/olikraus/u8g2/wiki/u8g2optimization#u8g2-feature-selection
You have to edit the library header file as setting the "without" defines in Cart_Reader.ino doesn't have the same effect.
I have no clue what these extra features do but I didn't notice anything acting different now. The font I'm using is restricted to characters from 32 to 127 so I think we are fine.
Anyway with your trim branch and the lib edit I can enable all modules on HW5: Sketch uses 251282 bytes (98%) of program storage space. Maximum is 253952 bytes. Global variables use 6543 bytes (79%) of dynamic memory, leaving 1649 bytes for local variables. Maximum is 8192 bytes.
Anyway with your trim branch and the lib edit I can enable all modules on HW5:
HW3 fits as well.
Sketch uses 253586 bytes (99%) of program storage space. Maximum is 253952 bytes.
Global variables use 6532 bytes (79%) of dynamic memory, leaving 1660 bytes for local variables. Maximum is 8192 bytes.
I'm gonna be off for a little while but will test your fork later today. Can you enable "Issues" on your repo then we can track them there better.
I found one with a short test, the NES database browsing screen is missing the first letter of the name and also doesn't display the mapper data correctly
The first letter appears again when browsing right/forward, disappears when browsing left/back. You can test without a NES cart inserted by just selecting any letter in the manual selection screen.
Can you enable "Issues" on your repo then we can track them there better.
Done.
the NES database browsing screen is missing the first letter of the name
Nice catch, looking into this.
Nice catch, looking into this.
Found it, and very likely fixed it, but I found another issue: I cannot go back on the rom list on HW3 (double-click with left button). ...issue which was caused by my broken fix to the first issue. Hurray, first fix-on-fix.
Update: with my current trim
branch, I am at this level, with the lcd driver options disabled:
Sketch uses 252210 bytes (99%) of program storage space. Maximum is 253952 bytes.
Global variables use 6250 bytes (76%) of dynamic memory, leaving 1942 bytes for local variables. Maximum is 8192 bytes.
This means that there is now enough headroom to re-enable these LCD driver options. While the code does not use them, I believe this is hurting ease of distribution. Especially, it makes updating dependencies cumbersome.
Here is an update on the global ram space usage (.bss for globals without initialiser, .data for globals with initialiser, descending size down to 16 bytes as an arbitrary cutoff, some N64 stuff missing as this is from my development environment where I already got rid of it):
$ avr-objdump -tC Cart_Reader.ino.elf | grep '\.bss' | sort -rk 1.23,1.3
008014e1 l O .bss 00000400 buf.9722
0080126a g O .bss 00000277 .hidden sd
00800d8a g O .bss 00000200 .hidden sdBuffer
00801918 g O .bss 0000008c .hidden menuOptions
008010e0 g O .bss 00000087 .hidden clockgen
00800c6a g O .bss 00000084 .hidden filePath
00801167 g O .bss 00000083 .hidden display
00800cee g O .bss 00000064 .hidden fileName
0080122a g O .bss 00000040 .hidden myFile
008011ea g O .bss 00000040 .hidden myLog
00800c30 g O .bss 00000030 .hidden signature
008019ab g O .bss 00000026 .hidden folder
008010a6 g O .bss 00000020 .hidden TwoWire::txBuffer
0080107a g O .bss 00000020 .hidden twi_masterBuffer.lto_priv.254
00801058 g O .bss 00000020 .hidden TwoWire::rxBuffer
00801031 l O .bss 00000020 twi_rxBuffer
0080100f l O .bss 00000020 twi_txBuffer
00800d60 g O .bss 00000016 .hidden romName
00800d7a g O .bss 00000010 .hidden iNES_HEADER
[...]
$ avr-objdump -tC Cart_Reader.ino.elf | grep '\.data' | sort -rk 1.24,1.31
00800204 l O .data 0000008c menuOptionspceCart
00800325 l O .data 00000035 u8x8_d_ssd1306_128x64_noname_init_seq
0080035a l O .data 00000018 u8x8_ssd1306_128x64_noname_display_info
008003cb l O .data 00000012 vtable for TwoWire
008003b9 g O .data 00000012 .hidden vtable for Stream
008003a7 g O .data 00000012 .hidden vtable for StreamFile<FsBaseFile, unsigned long long>
008003a7 g O .data 00000012 .hidden vtable for FsFile
00800384 l O .data 00000012 CHR
00800372 l O .data 00000012 PRG
[...]
Descriptions grouped by theme.
Display:
buf.9722
is the OLED screen framebuffer, so I doubt anything can be gained here: it has to be around to be drawn on.display
: likewiseu8x8_*
(HW5 likely has different names): likewisemenuOptions
could probably be gotten rid off, but I doubt it would bring much gain in practice.SD-card:
sd
is the SdFs
object from Card_Reader.ino
, so similarly I do not think there is anything to gain here.sdBuffer
should IMHO be gotten rid of in favour of local buffers allocated at the level where block IO is actually happening. But it is used in so many places, there is still a long way to gomyFile
: likewisefilePath
: likewisefileName
: likewisefolder
: likewisemyLog
: if logging is infrequent and away from performance bottlenecks, maybe the logfile could be opened only while printing to the log.Clock gen:
clockgen
: this I have not looked in details. If this is only used to configure the clock generator board (as opposed to being needed to keep the clock running) then I think it should only be instanciated when a configuration change is being done, and this would free a significant amount of global ram.TwoWire
and twi_*
: likewise, these are involved in talking to the clockgenCart IO & cart-type-specific stuff:
signature
: Comes from GBS.ino
. I'm not sure how to get rid of this, but I think it should be removed, possibly by forcing a re-read every time gbSmartGetGames
is called.romName
: Comes from Cart_Reader.ino
, used in many places. I guess this can be gotten rid of by letting each adapter manage its own rom name. But like sdBuffer
& friends, it is used all over the place.iNES_HEADER
: Not many users, but in different parts of the source code. Still, probably easier to get rid of than most of the other globals.menuOptionspceCart
: This should be possible to get rid of, and is on the large side of things.CHR
and PRG
: Come from NES.ino
. At a glance it should be possible to move to PROGMEM
.While I think there is still more that can be done on this general flash-and-ram-usage-optimisation topic, I think this specific bug can be closed: since 11.1, I see that all modules are enabled again by default.
Note: I am completely new to arduino in general, so I may be missing something.
I cannot get the latest source (as of 9d80d2497b238fb94961bf3e62f0d96f684cf3a1) to fit with all modules enabled. On its own this may not be a big issue (I guess this is why they are optional and selectable to begin with). The linker complains that the code is 2444 bytes too large.
Then I took a look at the produced binary (after disabling the NES module just to have a file to analyse) to check what is going on. And here is an extract of what I found:
The first obvious finding is that
-fmerge-constants
(which is supposed to be enabled in-Os
, which is used in the build) is somehow doing a terrible job. I tried editing my platform.txt to explicitly add-fmerge-constants
, without any improvement.So, how much space is used by these ? Ignoring the few strings above which are likely accidental finds by
strings
and not "true" strings, I get:Of course, one copy of each strings would still be necessary, and this ignores the (low ?) probability that some of these would actually be non-constant but actually local variable initial values, so the final gain would not be strictly this much. Still, this is 3 times larger than the extra space needed to fit all modules in the available program space. Also, it obviously ignores any possible duplication from the NES module.
I do not have a magic solution. The best idea I can come up with is terrible: put all constant (including inline) strings in some header file and reference them everywhere. But at least this should not leave gcc the chance to mess things up. This will not help if these duplicate strings come from libraries, but the strings visible above seem to be rather specific to this project, so the losses from libraries seems likely to be low.