uho / preForth

a minimalistic Forth kernel that can bootstrap
GNU General Public License v3.0

Self hosting tokenizer #9

Open nickd4 opened 2 years ago

nickd4 commented 2 years ago

More hacking... what I set out to do was to make the seedForth tokenizer self-hosting, so that after bootstrap you would not need gForth to develop applications. So my idea was to make the tokenizer work in gForth as it does now (for bootstrapping) and also work in the seedForth interactive version (for application development). It turned out to be quite difficult, but ultimately it works.

The actual changes to seedForth-tokenizer.fs to make it run under seedForth were not that huge. They were mainly a matter of accounting for seedForth's case sensitivity and its restricted syntax for hex and character literals, as well as minor differences in the words available (parse-name instead of <name>, etc.). The larger difficulty was in making a seedForth or seedForthInteractive program run cleanly as a filter. I had to modify the runtime library and I/O system a lot.
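For readers unfamiliar with the term, "running cleanly as a filter" generally means reading standard input to end-of-file, writing only the result to standard output, and keeping any diagnostics on a separate stream. The following is a minimal Python sketch of that general idea only. It is not the seedForth code, and `run_filter`, `main`, and the one-name-per-line output are hypothetical, chosen purely for illustration (they are not seedForth's token encoding).

```python
import io
import sys


def run_filter(source, sink, log=sys.stderr):
    """Copy whitespace-delimited names from source to sink, one per line.

    Toy stand-in for a tokenizer (hypothetical output format, NOT the
    seedForth token encoding). Returns the number of names written.
    """
    count = 0
    for line in source:            # ends naturally at end-of-file
        for name in line.split():  # names are separated by whitespace
            sink.write(name + "\n")
            count += 1
    # Diagnostics go to the log stream so they never pollute the output.
    log.write(f"{count} names processed\n")
    return count


def main():
    """Entry point when used as a command-line filter."""
    run_filter(sys.stdin, sys.stdout)


# Small in-memory demonstration of the same function.
demo_out = io.StringIO()
demo_log = io.StringIO()
demo_count = run_filter(io.StringIO(": foo 1 2 + ;\nfoo\n"), demo_out, demo_log)
```

Because the function takes its streams as parameters, the same code can be exercised in memory, as in the demonstration above, or attached to `sys.stdin` and `sys.stdout` by calling `main()` from a script whose input and output are redirected.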

There was also another issue to deal with which concerns the wrapping of the *.seed and *.seedsource files. Originally the input was wrapped in PROGRAM / END and the output was wrapped with an automatic bye token added at the end. I have removed the need for all of this wrapping, at the cost of its being slightly more awkward to invoke the gForth version of the tokenizer. Since this is only done from the Makefile during bootstrap, that's not a big deal. It's only just occurred to me now that the unusual extension *.seedsource was probably due to the wrapping, so maybe we can rename them to *.forth now?

Here is a detailed summary of all the changes I have made to support the self-hosting tokenizer:

Some of the more detailed changes might not be well explained in the above summary, or might be objectionable for whatever reason, so please feel free to check with me. Also, keep in mind that this changeset is "on top of" the previous changeset that I PR'ed the other day, so GitHub will show both changesets. It's annoying the way GitHub does this, and it does not recalculate the changeset after you merge the first PR. But you can force it to, by changing the base branch name and then changing it back.

I had a really good time doing this, even though it involved a lot of head-scratching and dealing with strange crashes, errors and unexpected behaviour. As I mentioned, I'm not an experienced Forth programmer, but I've become more conversant with it.

Note: There is a minor bug in this PR: I invoked gforth directly in the Makefile instead of using $(HOSTFORTH). It is fixed in #12, so I have not fixed it here. If you do want the fixed version of this PR, see the branch self_hosting_tokenizer1 in my GitHub account. I wouldn't recommend using that branch though, because it will cause conflicts later when merging #12 and others.