Support source texts format not only with 0DX (Mac, Oberon System) and 0AX (UNIX), but with 0DX+0AX (Windows) at end of line.

GoogleCodeExporter commented 8 years ago

In principle, Ofront already compiles text sources with 0DX+0AX, but 
incorrectly shows a line of a found error:

Test.Mod (with 0DX+0AX):

MODULE Test123;
BEGIN

END Test1231.

stef@notebook:~$ ofront Test.Mod -l
Test.Mod  translating Test123
    line 7  pos 3  err 4  

Ofront shows line 7, but the source has only 4 lines. Why is this happening? 
Module CmdlnTexts.Mod changes all 0AX to 0DX with help of code:

IF (ch = 0AX) & u(Piece).ascii THEN ch := CR END    (* << LF to CR *)

So 0DX looks as the end of line, and 0AX (changed to 0DX) looks as the end of 
another line.

What I suggest. To Ofront could compile well Windows texts as well as Unix 
texts, and I think that's true for the Unix version of Ofront too, we need to 
teach the module CmdlnTexts.Mod work with texts ended 0DX, 0AX or 0DX+0AX as 
one line, and not two.

The best solution is modify code to checks previous char. If a current char is 
0DX, it's eol. If a current char is 0AX, it's eol ONLY if the previous char was 
not 0DX.

Such algorithm will work with a variety of texts and to extend the scope of 
Ofront. Personally I save my sources encoded in Win 1251 codepage, so this 
feature will be very useful for me.

Here is the code that implements this algorithm in action:

MODULE OfrontOPM;   (* RC 6.3.89 / 28.6.89, J.Templ 10.7.89 / 22.7.96  *)

...

VAR
    prevch: CHAR;   (* previous char to calculate a line number *)

...

    PROCEDURE Get*(VAR ch: CHAR);   (* read next character from source text, 0X if eof *)
    BEGIN (* supported 0DX (Oberon), 0AX (UNIX), 0DX+0AX (Windows) *)
        Texts.Read(inR, ch);
        IF useLineNo THEN
            CASE ch OF
            | 0DX:
                    curpos := (curpos DIV 256 + 1) * 256
            | 0AX:
                    IF prevch # 0DX THEN curpos := (curpos DIV 256 + 1) * 256 END
            ELSE
                IF curpos MOD 256 # 255 THEN INC(curpos) END
                (* at 255 means:  >= 255 *)
            END;
            prevch := ch
        ELSE
            INC(curpos)
        END ;
        IF (ch < 09X) & ~inR.eot THEN ch := " " END
    END Get;

...

BEGIN prevch := 0X; Texts.OpenWriter(W)
END OfrontOPM.

This code will work only if do two changes to CmdlnTexts.Mod:

1. Remove the line:

IF (ch = 0AX) & u(Piece).ascii THEN ch := CR END    (* << LF to CR *)

2. Replace:

IF ch = CR THEN INC(S.line)

to:

IF (ch = CR) OR (ch = 0AX) THEN INC(S.line)

I implemented this algorithm so. Although I think it would be better not to 
allow return while reading from a text file both characters 0DX and 0AX. 
Instead it would be better if checking for the end of the line was inside the 
procedure CmdlnTexts.Read. And this procedure will return eol as 0DX only, as 
before, for not to handle the extra character 0AX, which will mean the same 
thing as 0DX.

In defense of my implementation, I would say that module Texts.Mod of ETH 
Oberon PlugIn returns 0AX and 0DX both.

So, of course, I wonder how will this decision look ideologically correct. and 
also considers whether Josef this proposal so important?

This suggestion is the first step to build Ofront for Windows command line. 
What do you think, Josef?

And link to the commit:

https://github.com/Oleg-N-Cher/Ofront/commit/6db090637fda42fb8cfa22e6c233f5e02f9
1ae20

Original issue reported on code.google.com by al...@bk.ru on 22 Aug 2014 at 12:01

GoogleCodeExporter commented 8 years ago

This feature must be added to CmdlnTexts and not to the compiler. In addition, 
it should be added to Texts. Thereby ALL clients benefit from this enhancement. 
The procedure Texts.Read would then look like:

PROCEDURE Read* (VAR R: Reader; VAR ch: CHAR);
  VAR u: Run; pos: LONGINT; nextch: CHAR;
BEGIN u := R.run; R.fnt := u.fnt; R.col := u.col; R.voff := u.voff; INC(R.off);
  IF u IS Piece THEN Files.Read(R.rider, ch); R.elem := NIL;
    IF (ch = 0AX) & u(Piece).ascii THEN ch := CR (* << LF to CR *)
    ELSIF (ch = CR) & u(Piece).ascii THEN (* << CR LF to CR *)
      pos := Files.Pos(R.rider); Files.Read(R.rider, nextch);
      IF nextch # 0AX THEN Files.Set(R.rider, u.file, pos) END
    END
...
not tested.
The lookahead uses Files operations, which seems expensive, but they are very 
fast.

- Josef

Original comment by josef.te...@gmail.com on 22 Aug 2014 at 7:52

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 8 years ago

Nice code! I tested it for different types of texts: 0DX, 0AX, 0DX+0AX. All 
works perfectly. With one correction:

      IF nextch # 0AX THEN Files.Set(R.rider, u(Piece).file, pos) END

instead of:

      IF nextch # 0AX THEN Files.Set(R.rider, u.file, pos) END

I think this solution excellent, and I am ready to agree with it, except for 
one thing. Notice the calculation of the position error (without option -l). 
This is global position of byte character in text. Now pos calculates 
incorrectly for Windows texts. It's the question why this position (without 
line) needs in Windows, where I don't know of any tool that would open a text 
file and place the cursor in this position. But I'm interested to hear your 
opinion (my code, where Texts.Read returns 0DX and 0AX both, calculates this 
pos correctly in any case).

    PROCEDURE Get*(VAR ch: CHAR);   (* read next character from source text, 0X if eof *)
    BEGIN
        Texts.Read(inR, ch);
        IF useLineNo THEN 
            IF ch = 0DX THEN curpos := (curpos DIV 256 + 1) * 256
            ELSIF curpos MOD 256 # 255 THEN INC(curpos)
                (* at 255 means:  >= 255 *)
            END
        ELSE
            INC(curpos)  (* <-- must be increased for 0DX and for 0AX that two chars are incoming, and not one. position of a text file has changed to two bytes and not one. *)
        END ;
...

P.S. I want to apologize that I create difficulties. :)

Original comment by al...@bk.ru on 22 Aug 2014 at 11:45

GoogleCodeExporter commented 8 years ago

this version corrects 2 bugs in Texts.Read: 
1. missing type guard
2. missing increment of text position in case of CR LF mapping

PROCEDURE Read* (VAR R: Reader; VAR ch: CHAR);
  VAR u: Run; pos: LONGINT; nextch: CHAR;
BEGIN u := R.run; R.fnt := u.fnt; R.col := u.col; R.voff := u.voff; INC(R.off);
  IF u IS Piece THEN Files.Read(R.rider, ch); R.elem := NIL;
    IF (ch = 0AX) & u(Piece).ascii THEN ch := CR (* << LF to CR *)
    ELSIF (ch = CR) & u(Piece).ascii THEN (* << CR LF to CR *)
      pos := Files.Pos(R.rider); Files.Read(R.rider, nextch);
      IF nextch = 0AX THEN INC(R.off) ELSE Files.Set(R.rider, u(Piece).file, pos) END
    END
...
not tested

I am still convinced that the CR LF mapping should not be done in OPM.Get.
For a precise curpos the following change could be applied.
In principle, 'curpos = Texts.Pos(inR)' would also work for the ELSE branch.
I separated the branches in order to show that calling Texts.Pos(inR) is only 
required
when detecting a CR. Regarding the speed, it trades an IF for a function call,
so it will be slightly faster.

PROCEDURE Get*(VAR ch: CHAR);   (* read next character from source text, 0X if 
eof *)
BEGIN
  Texts.Read(inR, ch);
  IF useLineNo THEN 
    IF ch = 0DX THEN curpos := (curpos DIV 256 + 1) * 256
    ELSIF curpos MOD 256 # 255 THEN INC(curpos)
                     (* at 255 means:  >= 255 *)
    END
  ELSIF ch = 0DX THEN
    curpos = Texts.Pos(inR) (* supports CR LF mapping *)
  ELSE
    INC(curpos)
  END ;
...
not tested

- Josef

Original comment by josef.te...@gmail.com on 24 Aug 2014 at 7:25

GoogleCodeExporter commented 8 years ago

Now I totally agreed with the code. You found a good method to calculate pos 
inside module OPM without changing interface of module Texts. That I was not be 
able to do (without good knowledge of details of low-level modules). I tested 
the code:

0DX+0AX source (ended by 0DX+0AX+0DX+0AX): 

Test.Mod  translating Test123
nextch=0AX
nextch=0AX
nextch=0AX
    pos    39  err 4   <-- pos is correct
nextch=0AX

0AX source (ended by 0AX+0AX):

TestA.Mod  translating Test123
    pos    36  err 4   <-- pos is correct

0DX source (ended by 0DX+0DX):

TestD.Mod  translating Test123
nextch=42X
nextch=0DX
nextch=45X
    pos    36  err 4   <-- pos is correct
nextch=0DX   <-- special case: 0DX is the last character of this source.
                 but works correctly: the next char after the last char is the same.

The code works perfectly, I'm sure you can commit it (or maybe more additional 
testing is necessary?)

https://github.com/Oleg-N-Cher/Ofront/commit/d35f27e6d64a097db53e7f3e6f68039c97a
dc2bf

Original comment by al...@bk.ru on 29 Aug 2014 at 10:35

z505 / ofront

Support source texts format not only with 0DX (Mac, Oberon System) and 0AX (UNIX), but with 0DX+0AX (Windows) at end of line. #3