Closed sanni closed 1 year ago
This is likely caused by a3892f6d05fccc81605159d057caae00653b3a13: the compiler disregarded inline
so there are 2 extra calls per iteration. I guess a better change would be to make these macros, then the compiler has to obey the inlining, and at least there is only one location where the CRC code is implemented.
I pushed a few fixes, bringing the time to dump an 8MB cart from 64s to 52s: da6923ea85e55ccdd5191d06523fd01fde635591, 1c303eefec0156b8572270f5e741b0b6d6173408 , da46363e1ee0e399bad20a99ed8e77369aabaed9
But this is still not back to what I get with my current upstream base (3b0a046d79e4b804d306fc774c796e48073299e8): 48s.
If I cheat I can shave off 2 seconds from that time by improving code I had not yet modified (get_line
), but this does not address the 4s/10% loss - which I still fail to identify.
I guess the next best thing would be to bisect my changes, but this likely requires that I fold all the fixups into their original commit, rewriting history. Is it ok for you ?
Sure.
This has been a pain, but I found it:
updateCRC
(before my changes) stores the computed table index into a 32bits variable masked to 8 bits, unlike the inline N64 version which stores it into an 8 bits directly.
Beware: the trim branch now does not contain most of my N64 work. So if you closed this issue by re-testing this branch, the issue would appear as fixed but for the wrong reason.
I am still trying to sort out my N64 changes into reasonable commits, as I have done a lot of changes, and the intermediate commits step on one another:
$ git diff --stat vpelletier/trim..trim N64.ino
[...]
1 file changed, 285 insertions(+), 880 deletions(-)
It's even faster than originally now 🚀
Saving to N64/ROM/DONKEY KONG 64/617/...
[*******************]
CRC32... 919F7E74 -> Donkey Kong 64 (Japan).z64
Done (166s)
Awesome, thanks for rechecking.
fastcrc is enabled by default on HW4/5 but can also be enabled on HW3 in line 118.
10.4
10.4 Trim