jkotlinski / durexforth

Modern C64 Forth
Other
230 stars 28 forks source link

`\` broken when compiling from `v` #545

Closed ekipan closed 1 year ago

ekipan commented 1 year ago

Using the latest build cartridge, pasting the following program into v and pressing F7 won't compile the bar word.

: foo ;
\ test
: bar ;

The standard says \ discards the rest of the parse area, so if I understand correctly that v compiles with evaluate then \ comments no longer work.

Maybe it's rude to piggyback but an unrelated issue: the program ( test ) leaves 2 junk values on the stack. Looks like you need a 2drop in the definiton of (.

ekipan commented 1 year ago

IMHO this seems like a problem with the standard that makes evaluate less useful. Seems there was some discussion.

Edit: The commit that made \ conforming and broke v's F7.

jkotlinski commented 1 year ago

( test ) leaves 2 junk values

Thanks, nice catch, I made the fix as suggested!

I will have to look into fixing v evaluate, too. That is a deeper problem that I must have forgotten about.

jkotlinski commented 1 year ago

Hopefully it will be relatively useful again after c09c3c91957aa6beae928dc8fb80fd5db54197da , by which v F7 calls EVALUATE line-by-line.

ekipan commented 1 year ago

Looks fixed to me :+1:. Nice work and thanks.

I still think the standard should fix this problem with \ though. I wonder how many programs out there depend on its current interaction with evaluate?

ekipan commented 1 year ago

Maybe I spoke too soon.

: xx  [ :- iny,
-branch bne, ] ;

Pasting and F7'ing this program doesn't compile xx, it gives what looks like a "word not found" error with random looking text, perhaps from outside the buffer? It compiles if you Join the lines together. Probably it can be reduced further, not sure of the cause.

Edit: ~I guess it could also be an unrelated bug.~ I just tested with the older version in the opening post and xx compiled.

Edit: Oh, it's probably that evaluate-buffer does some work on the pstack, so any program that affects the stack between lines while compiling will break, as :- does here. Only feasible fix I can think of is to make \ non-standard and have it scan for newlines instead of dump the whole buffer. The main point of contention is that unlike refill it's not supposed to close the input file (I'm not actually sure why this matters).

I guess you could also make it do all its work in variables local to v. Maybe a scan-next-line ( -- c-addr u flag ) word or something.

jkotlinski commented 1 year ago

Should be fixed in 05eded9e30bfde1681b5c72ccdaa36c4932d6a90 . Very luxurious with debugging help, thank you :-)

ekipan commented 1 year ago

Here's a broken program:

.( a)
0 >r \ Edit: whoops, wasn't what I intended
.( b)
r> .

Prints just a ~on the previous version, jams the new one.~ and seems to do some random things, the tape indicator in VICE turned green.

ekipan commented 1 year ago

Untested off-the-cuff:

variable scan-base
: scan-line-or-quit ( -- addr u )
scan-base @ dup begin dup c@ case
 0 of 2drop quit endof
lf of dup 1+ scan-base ! over - exit 
endof endcase 1+ again ;

: main-loop
\ ...
\ f7
dup $88 = if 2drop cleanup rom-kernal
bufstart scan-base ! begin
scan-line-or-quit evaluate again then
\ ...

Edit: ~I guess the 1+ at the beginning of the loop is fine. My dumping of $a001 shows there's a newline there even for an empty file, I'm just overly cautious because I don't understand (haven't read) much of v.~ I think 1+ needs to be at the end or else the scanner will pass the end-of-buffer for a multiline file?

jkotlinski commented 1 year ago

That r> program would crash most Forths very hard :-) It cannot be expected to work.

mån 17 apr. 2023 kl. 15:52 skrev ekipan @.***>:

Untested off-the-cuff:

variable scan-base : scan-line-or-quit ( -- addr u ) scan-base @ dup begin dup c@ case 0 of 2drop quit endof lf of dup 1+ scan-base ! over - exit endof endcase 1+ again ;

: main-loop\ ...\ f7 dup $88 = if 2drop cleanup rom-kernal bufstart scan-base ! begin scan-line-or-quit evaluate again then\ ...

— Reply to this email directly, view it on GitHub https://github.com/jkotlinski/durexforth/issues/545#issuecomment-1511394270, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAY34O3E3QMGMSJ3MWAV43LXBVDIRANCNFSM6AAAAAAXAIQWYI . You are receiving this because you commented.Message ID: @.***>

ekipan commented 1 year ago

Probably you're talking about the pre-edit version where I took from r r> instead of putting there as I intended. I would expect the >r program to work.

Edit: Actually does the standard say anything about user programs mutating the rstack? Should the interpreter be expected to cope with that? I don't actually know.

ekipan commented 1 year ago

3.2.3.3

[...] A system may use the return stack in an implementation-dependent manner during the compilation of definitions, [etc ...]

So I guess even with >r the above program would be considered environmentally-dependent?

ekipan commented 1 year ago

I've thought some more and realized my scan-line-or-quit is essentially the same as your evaluate-buffer just with:

  1. quit and evaluate swapped places (not really useful)
  2. 1+ moved inwards and thus duplicated (needed because of point 1?), and
  3. saving its temporary in a variable instead of the rstack.

Point 3 means it supports more programs but if I understand the standard isn't strictly required.

jkotlinski commented 1 year ago

The standard says that r> >r interpreter behavior is undefined. There is no reason to expect that they would work in the interpreter.

ekipan commented 1 year ago

Implementation-defined*, not undefined. DurexForth may choose whether or not to support such programs and still be standard.

Again, if I understand correctly. So yes the bug is fixed in terms of standard requirements.

jkotlinski commented 1 year ago

The exact standard phrasing for these words is: "Interpretation semantics for this word are undefined." So, in contrast to implementation-defined, it really means all bets are off and you might expect it to crash. There are good technical reasons for why it would crash, too.

That v F7 did not crash was more or less by accident, and making it crash, will in fact make it behave more like the interpreter. I see that as a good thing.

ekipan commented 1 year ago

I see, I hadn't checked >r for whatever reason, just went from the beginning and found "return stack" in 3.2.3.3. So I did not understand correctly.

ekipan commented 1 year ago
.( a)
(
.( b)
.( c)

In v v4 F7ing this program prints ac, with this bugfix it prints abc.

jkotlinski commented 1 year ago

It is not a great behavior, but it is as expected. ( is only supposed to grab multiple lines when parsing from a text file.

ekipan commented 1 year ago

asm base float gfx labels open see sin sys timer

These files on disk contain multi-line ( comments, as well as viceutil with a multiline .( use, though gfx does have some wide lines that v would truncate. I assumed vs F7 was intended to behave like included so one could experiment with and compile files on disk. Maybe that assumption was wrong. That's how I've been using it.

jkotlinski commented 1 year ago

I think the simple fix would be to get rid of multi-line .( and ( code.

ekipan commented 1 year ago

Yeah. A little sad though, it's a convenient feature.

jkotlinski commented 1 year ago

If you miss it enough, you could create a new issue ”v F7 does not support multi-line comments”. I cannot promise a fix though, it is a difficult problem.

tis 2 maj 2023 kl. 23:04 skrev ekipan @.***>:

Yeah. A little sad though, it's a convenient feature.

— Reply to this email directly, view it on GitHub https://github.com/jkotlinski/durexforth/issues/545#issuecomment-1532148873, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAY34O5WQP4NNL3Y2DARVOTXEFZF5ANCNFSM6AAAAAAXAIQWYI . You are receiving this because you commented.Message ID: @.***>

ekipan commented 1 year ago

Aargh if only \ were specified to only parse until end-of-line when evaluating! Then there wouldn't actually be a problem! I feel like if included introduces the concept of "individual lines" then \ not dealing with them is a flaw.

Edit: I replied as such on the standard but my comment is still awaiting review.

jkotlinski commented 1 year ago

I do not think there is anything wrong with \. I think the fundamental shortcoming is that EVALUATE does not handle multi-line strings well. It looks like this shortcoming is also recognized within the wider Forth community.

If a fix for this problem is needed, it would be to switch back to a non-standard EVALUATE which handles multi-line strings well. Either that, or changing ( comments to \ comments :-)

jkotlinski commented 1 year ago

Also: With a block-based text editor, this would not be a problem, as it would use LOAD rather than EVALUATE.

ekipan commented 1 year ago

\ already is specified to skip ahead an implementation-defined block line length when blk is not 0, so this wouldn't be a problem even if load were implemented with evaluate. When blk is 0 then \ should skip to the end-of-line as defined by included and I don't understand why you would disagree.

jkotlinski commented 1 year ago

Are you referring to the discussion at forth-standard.org , in the ( comment section? That came out really weird. It looks like I am replying to your comment, but it had not been published yet when I wrote my reply - my reply was directed to Anton.

jkotlinski commented 1 year ago

To answer your proposal.

When blk is 0 then \ should skip to the end-of-line as defined by included and I don't understand why you would disagree.

I agree that this is a desired end result. However, since this is a general problem, I would like to solve it in a way that helps more words than just \.

For example, imagine this particular naive \ implementation: : \ REFILL ; It would be nice if also such words worked well with v F7.

ekipan commented 1 year ago

Are you referring to the discussion at forth-standard.org

No, I was directly replying to your previous comment. How about this:

: file\ $d parse 2drop ;
: block\ ( round up >in to next block line ) ;
: \ blk @ if block\ else file\ then ; immediate

You can't use refill since that may change the input source, and when used in a block is specified to get the next block.

since this is a general problem, I would like to solve it in a way that helps more words than just \.

Other than \ included read-line does the standard define any words that care about lines?

jkotlinski commented 1 year ago

Sure, it is very likely that I have the wrong idea, it happened many times before. It does not seem smart to try to unify REFILL, given the difference between Blocks and Files.

Assuming that it is only a problem of \. I suppose the needed change to the \ definition is: "If SOURCE-ID is -1, parse and discard the portion of the parse area corresponding to the remainder of the current line" ?

jkotlinski commented 1 year ago

reopening as the solution does not seem satisfactory.

ekipan commented 1 year ago

IMO it should probably just do it regardless of SOURCE-ID. At the keyboard or from an INCLUDED file there's only one line anyway so it doesn't matter. That leaves EVALUATE and LOAD. The Block-extended \ says that if BLK is not zero \ should "discard the remainder of the line," where I think that's defined by the line length here, I say there should be a File-Access-extended \ that does $d parse 2drop in the case of durexForth, presumably parsing until the line-terminator described by READ-LINE, if that differs for a different Forth.

jkotlinski commented 1 year ago

Sure, that makes sense to me at least.

ekipan commented 1 year ago

Attempting previously to compile z from within v left me in a bad state with an unresponsive interpreter but the latest fix seems to compile it correctly.

ekipan commented 1 year ago

I found one edge case:

.( a)
\
.( b)
.( c)

No space after the \, in v4 this gives "abc" but with the fix gives "ac". I'm personally willing to not care about this edge case though.

jkotlinski commented 1 year ago

That seems very wrong though.

Whammo commented 1 year ago

Long lines are tricky. The line link table is mysterious.

ekipan commented 1 year ago

Hrrmnm perhaps this? : \ -1 >in +! $d parse drop drop ; Is there any possibility this could underflow the input buffer?

jkotlinski commented 1 year ago

I do not think there is anything wrong with the proposed \ definition. Something else is wrong.

OK, I understand now what is going on.

: \ -1 >in +! $d parse drop drop ;

That looks like a bit of a hack. What if : x 0 >in ! postpone \ ; ? OK, it is not super likely that someone would write code like that, but at the same time, there is no reason that it should cause undefined behavior.

ekipan commented 1 year ago

: \ >in @ 1- 0 max >in ! $d parse drop drop ; This is definitely more subtle than I initially thought.

jkotlinski commented 1 year ago

I am not a fan of that \ definition. I think I changed my mind back now. I am quoting Mitch Bradley, who designed the Forth input system:

"[...] the standard specifies an input model that is intended to be strictly line-oriented - the input buffer contains exactly one line."

"[...] reading the entire file into one big buffer is certainly feasible - I do that myself in some scenarios - but to be compliant with the standard as I understand it, REFILL needs to look for the next line delimiter and arrange for the input buffer to contain just the one line."

So... I now again think it makes sense to change REFILL and ( to properly deal with multi-line EVALUATE strings. REFILL should split up the EVALUATE string line-by-line. ( should work just like for text files.

jkotlinski commented 1 year ago

I have a better idea, which may be the standard-compatible and nice way to go about it.

Add a word INCLUDE-RAM ( addr u -- ) which behaves pretty much like INCLUDED. It will set SOURCE-ID to a virtual file which is mapped to RAM. REFILL will include functionality to grab lines from that virtual file. End of story!

jkotlinski commented 1 year ago

I am toying now with the idea to make a creating word for defining custom refill behavior...

e.g. -2 REFILLER ... ; would define the behavior when REFILL gets an input line and SOURCE-ID is -2. The created code would have the signature ( -- FALSE | C-ADDR U ); it should either return FALSE or the next line of the input source.

To include the custom -2 input source, one would call -2 INCLUDE-FILE.

With that, it would be possible to create a custom REFILL behavior specifically for including text files placed in a large RAM buffer.

I am also tempted to just hardcode a new INCLUDE-RAM method, maybe that is less work.

Besides that, the original option remains, which is to simply avoid multi-line ( comments :-)

jkotlinski commented 1 year ago

Adding INCLUDE-RAM was not a lot of fuss at all. https://github.com/jkotlinski/durexforth/pull/556

EDIT: it does not seem to work now, though. I am sure I had it working at some point. Fixed.