jkotlinski / durexforth

Modern C64 Forth
Other
230 stars 28 forks source link

v3.0.0 slow builds #400

Closed jkotlinski closed 2 years ago

jkotlinski commented 2 years ago

I tried building DurexForth from scratch, using VICE emulator with Warp mode + True Drive Emulation. Comparing v2.0.0 and v3.0.0, build time went up from 20 seconds to 35 seconds! This is not good. It would be helpful to measure where the time is spent. Maybe there is a way to bring build times down again.

Whammo commented 2 years ago

Assembly is larger source but compiles smaller? LOADB to v buffer and compile from there?

jkotlinski commented 2 years ago

It looks like the slowdown is from commit 16c73afdfc97499283581c0907a3b8e65fcd26e1 .

I would guess a lot of the problem is because durexforth.prg is ten times larger now - it went from ~4 KB to ~40 KB. The initial load takes a lot longer now, it is especially noticeable on an unaccelerated 1541.

jkotlinski commented 2 years ago

OK... I think the bigger issue is that my VICE was misconfigured, so that I had to run with "True Drive Emulation" enabled. Disabling TDE helps a lot, and should be used as default.

Even so, the build time is increased from ~7 seconds (2.0.0) to ~9 seconds (3.0.0). I think it might be worthwhile investigating if there is some low hanging fruit, to win back those lost seconds.

burnsauce commented 2 years ago

It was obvious at the time, and I've been thinking about it since, but the issue lies with searching the dictionary. The additional maths required to search the dictionary structure is the cause of this slowdown, I'm pretty sure.

One thought: while build time is currently extended by this problem, it doesn't have to extend to runtime. Sorting the dictionary at build time, to put the most common words at the start of the dictionary, may significantly improve runtime lookup performance.

To address build-time performance, it may be worth performing this sort at assembly time, through some wizardry. An alternative would be to build the dictionary backwards at assembly time, assuming that earlier compiled words are more common than later ones.

Short of those things, a change to the dictionary structure (e.g.: to make it larger by including a link to the next dict entry) would be required to improve this aspect of performance.

burnsauce commented 2 years ago

Instead of reverse sorting, the dictionary could actually be searched in reverse order. This was the original design I had, but reversed it so that names in the dictionary were counted strings. A reverse entry would be:

[xt][name][strlen]

A bit clumsy, and would require the reworking of everything that references dictionary entries by counted string.

jkotlinski commented 2 years ago

For the correct operation though, in case of multiple words with same name, it’s required that the most recent word is found first?

On Mon, 3 Jan 2022 at 17:53, Poindexter Frink @.***> wrote:

Instead of reverse sorting, the dictionary could actually be searched in reverse order. This was the original design I had, but reversed it so that names in the dictionary were counted strings. A reverse entry would be:

[xt][name][strlen]

A bit clumsy, and would require the reworking of everything that references dictionary entries by counted string.

— Reply to this email directly, view it on GitHub https://github.com/jkotlinski/durexforth/issues/400#issuecomment-1004221797, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAY34O43OIP7VUQ2L24ODY3UUHIBLANCNFSM5LEHDL3A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you authored the thread.Message ID: @.***>

burnsauce commented 2 years ago

Damn, you're right. Wasn't thinking about the standard at all!

jkotlinski commented 2 years ago

74418b5b2d70f3bf25300dbbc795dc8d3e37cb25 <-- reduced build times by about a second. Maybe there's more to get in other places.

Also, reverting 530a1b6acd50cfa2547be7f78e53dfa55557c709 would reduce build time by about 0.3 seconds or so (hand-measured). But I'm not a fan of static buffers all over the place.

jkotlinski commented 2 years ago

Ok, brought back the FIND_BUFFER and now it's only 0.5 seconds slower than v2.

burnsauce commented 2 years ago

Excellent work! It's clear on paper that a lot of cycles are saved here with the use of indexing and the stack.

Sometimes I have to stop thinking big picture and look at the code. My mind was stuck conceptually when I saw the increase in compile times at the split dictionary commit, so I never tried optimization >.< I just wrote it off as a design flaw. Lesson learned, hopefully!

Again, great work :)

jkotlinski commented 2 years ago

With 2ac05aaf3f2765116b943a4dc76af202add668db , it seems master is now faster than v2 :) (with True Drive Emulation disabled)

Thank you again for the excellent split dictionary contribution!