jkotlinski / durexforth

Modern C64 Forth
Other
230 stars 28 forks source link

Displaying Dictionary with RDIR #293

Closed Whammo closed 3 years ago

Whammo commented 3 years ago

If each Dictionary entry were 6 bytes followed by a null terminated string, and the last entry terminated by two nulls, It would display quite nicely, printing bytes 3 and 4 as blocks followed by the string.

burnsauce commented 3 years ago

You mean that by altering the dictionary header structure, it would display better using RDIR?

I think that's what you're saying here. This change would expand the dictionary header size and complicate header navigation. Besides, there is dowords. Have a look at how small words is, which is what RDIR would essentially replicate:

: (words) more name>string type space 1 ;
: words ['] (words) dowords ;

Removing these few bytes from the dictionary would be complemented with an increase of ~4 bytes per word. I can't see this being a welcome change.

Whammo commented 3 years ago

I appreciate you entertaining the notion. You're right of course.

Whammo commented 3 years ago

Not as much overhead per entry as I had supposed.

burnsauce commented 3 years ago

Current overhead is STRLEN + 3, FYI. Couldn't be tighter. :)

There are 356 words in the distribution dictionary, so a 4 byte increase would be 1.4kB!

Whammo commented 3 years ago

I totally get it- you'll find I'm generous with bad ideas. :)

burnsauce commented 3 years ago

I totally get it- you'll find I'm generous with bad ideas. :)

Hey, me too! :)

jkotlinski commented 3 years ago

You mean STRLEN + 2, right?

What could possibly be done further is to compress strings. A common optimization is to not actually store all characters of the word. Then you store the string length + e.g. up to 8 first characters in a word name. I did not find this optimization particularly valuable in practice, but it's an option. Another option is to hash the word name using e.g. CRC-16. I don't like that it makes debugging harder, but it's possible and might speed up wordlist searches too.

burnsauce commented 3 years ago

STRLEN + 3: Count byte, String, XT.

jkotlinski commented 3 years ago

Ah, yes. XT is two bytes :)

Whammo commented 3 years ago

Interestingly enough, I have an implementation of CRC 16 in both forth and 6502 that I've used for PPP in the past.

Get Outlook for Androidhttps://aka.ms/ghei36


From: Johan Kotlinski notifications@github.com Sent: Monday, November 2, 2020 12:35:15 PM To: jkotlinski/durexforth durexforth@noreply.github.com Cc: Whammo kev@bgne.com; State change state_change@noreply.github.com Subject: Re: [jkotlinski/durexforth] Displaying Dictionary with RDIR (#293)

Ah, yes. XT is two bytes :)

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHubhttps://github.com/jkotlinski/durexforth/issues/293#issuecomment-720709517, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AJEFWZ5DOE4L3JWD6MZ65PDSN4JYFANCNFSM4THFSRCQ.

jkotlinski commented 3 years ago

Actually, biggest problem with hashing word names is that then WORDS won’t work.

On Tue, 3 Nov 2020 at 02:18, Whammo notifications@github.com wrote:

Interestingly enough, I have an implementation of CRC 16 in both forth and 6502 that I've used for PPP in the past.

Get Outlook for Androidhttps://aka.ms/ghei36


From: Johan Kotlinski notifications@github.com Sent: Monday, November 2, 2020 12:35:15 PM To: jkotlinski/durexforth durexforth@noreply.github.com Cc: Whammo kev@bgne.com; State change state_change@noreply.github.com Subject: Re: [jkotlinski/durexforth] Displaying Dictionary with RDIR (#293)

Ah, yes. XT is two bytes :)

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub< https://github.com/jkotlinski/durexforth/issues/293#issuecomment-720709517>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/AJEFWZ5DOE4L3JWD6MZ65PDSN4JYFANCNFSM4THFSRCQ

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jkotlinski/durexforth/issues/293#issuecomment-720823035, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAY34O3TLHQSBPOSOEECTP3SN5K5JANCNFSM4THFSRCQ .

Whammo commented 3 years ago

Hmmmm, the wordlist and markers could be offloaded to disk. I wonder what else could be relegated to ‘virtual memory’?

burnsauce commented 3 years ago

Wordlist can't go on disk for the cartridge build :(

jkotlinski commented 3 years ago

Also - what about new words!

Whammo commented 3 years ago

Yes wordlist on disk is problematic for the cart build. Perhaps a better cart is on the horizon.

New words names would be appended to disk. In memory, the compliment of the checksum of the name of the new word and the likewise checksums of previous words names are appended to the list of checksums, each entry of which is an index to it's execution token.

To see if a word exists, init the CRC, calculate over the name, then calculate the list of checksums checking each for a good CRC. If you get to the end of the list without a match it's a new word.

burnsauce commented 3 years ago

Sounds like a job for the little-used REL file.

Also sounds like the cart forth kernel will diverge from the disk build somewhat.

You thinking about putting something like this together @Whammo?

Whammo commented 3 years ago

I'm certainly enjoying rolling it around in my head! Collisions are a real though remote possibility too. I can't account for that yet. This would also introduce the need for disk discipline and hygiene, shutting down properly and repair. Icky. I had in the past considered replacing the unused load address in the .prg files with a CRC-16 checksum of the file, but never really explored the possibility.

Ahh- if you know it's a new word, then you're checking for collision as you're running it's checksum. If you don't make it to the end- then it's a collision. So you already know it's a new word because : create :code etc.

Whammo commented 3 years ago

Not to beat a dead horse, but I did misspeak about the format of the RDIR. It's four bytes: line link word, blocks word, followed by the null terminated string. Still an extra two bytes, even if it removes the need for length.