kevinlawler / kona

Open-source implementation of the K programming language
ISC License

_db _bd #629

Closed (tavmem closed this issue 2 years ago)

tavmem commented 2 years ago

I developed a fix (at least a partial one) for issue #615. However, the fix causes this test to fail:

  a:(1;1.0;"c";`d;1 2;3.0 4.0;"ef";`g`h;();(1;`z)); &/{x~_db _bd x}'a,,a
nonce error

So it appears that changes are needed in both _bd and _db. This seems to be the next step in fixing issue #615.
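The failing test checks that each item of a, and a itself (appended as one more item by a,,a), survives a round trip through _bd and _db, and &/ then reduces the per-item booleans with min, which acts as a logical AND. A rough Python analogue of that property test follows. It uses the standard pickle module purely as a stand-in for _bd/_db (pickle is not Kona's format), and it models K symbols as Python strings, so the types are only approximate.

```python
import pickle

def round_trips(x):
    # Analogue of {x~_db _bd x}: serialize, deserialize, compare with the original.
    return pickle.loads(pickle.dumps(x)) == x

# Rough stand-in for a:(1;1.0;"c";`d;1 2;3.0 4.0;"ef";`g`h;();(1;`z)).
# Symbols are modeled as plain strings, so this only approximates K's types.
a = [1, 1.0, "c", "d", [1, 2], [3.0, 4.0], "ef", ["g", "h"], [], [1, "z"]]

# a,,a appends a itself as one extra item; all() plays the role of &/.
print(all(round_trips(x) for x in a + [a]))  # True
```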

Note that neither _bd nor _db exists in the K2.0 reference manual. _bd could mean "binary to display", and _db could mean "display to binary".

There are some interesting behaviors in k2.8 with _bd:

  1. The _bd results for both of these lists have sizes that are multiples of 4:

    #_bd ("ab";"cd")
    48
    #_bd ("ab";"cd";"ef")
    64

    But the result for this single character vector does not:

    #_bd "ab"
    19
    
    5 4#(_bd "ab"),"-"
    ("\001\000\000\000"
    "\000\000\000\000"
    "\375\377\377\377"
    "\002\000\000\000"
    "ab\000-")
  2. Executing the same command twice in K2.8 does not produce the same character vector:

    12 4#_bd ("ab";"cd")
    ("\001\000\000\000"
    "\000\000\000\000"
    "\000\000\000\000"
    "\002\000\000\000"
    "\375\377\377\377"
    "\002\000\000\000"
    "ab\0000"
    "0\\00"
    "\375\377\377\377"
    "\002\000\000\000"
    "cd\0007"
    "7\\37")
    
    12 4#_bd ("ab";"cd")
    ("\001\000\000\000"
    "\000\000\000\000"
    "\000\000\000\000"
    "\002\000\000\000"
    "\375\377\377\377"
    "\002\000\000\000"
    "ab\000\367"
    "j\000\000\000"
    "\375\377\377\377"
    "\002\000\000\000"
    "cd\000\000"
    "q\000\000\000")

This suggests that k2.8 does not initialize the memory holding the character vector before writing the _bd output, so the padding bytes contain leftover data, and that _db reads only the necessary parts of the vector when converting back. Similarly, in the 19-character case there appears to be no need to pad the size to a multiple of 4, since no relevant information follows.
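Comparing the two k2.8 outputs byte by byte supports this reading: they differ only in the bytes between each string's NUL terminator and the next 8-byte boundary (offsets 27 to 31 and 43 to 47, counting from 0). The sketch below copies the displayed rows into Python bytes literals, which use the same octal escapes as K, and also rebuilds the 19-byte _bd "ab" from its 5 4# display. The word layout in the comments is read off the displays only, not taken from any k2.8 documentation.

```python
import struct

# The two 48-byte k2.8 results of 12 4#_bd ("ab";"cd"), copied row by row
# from the displays above (Python bytes literals use K's octal escapes).
RUN1 = b"".join([
    b"\001\000\000\000", b"\000\000\000\000", b"\000\000\000\000", b"\002\000\000\000",
    b"\375\377\377\377", b"\002\000\000\000", b"ab\0000", b"0\\00",
    b"\375\377\377\377", b"\002\000\000\000", b"cd\0007", b"7\\37",
])
RUN2 = b"".join([
    b"\001\000\000\000", b"\000\000\000\000", b"\000\000\000\000", b"\002\000\000\000",
    b"\375\377\377\377", b"\002\000\000\000", b"ab\000\367", b"j\000\000\000",
    b"\375\377\377\377", b"\002\000\000\000", b"cd\000\000", b"q\000\000\000",
])

def differing_offsets(x, y):
    """Byte offsets (from 0) where two equal-length buffers disagree."""
    return [i for i, (p, q) in enumerate(zip(x, y)) if p != q]

print(differing_offsets(RUN1, RUN2))
# [27, 28, 29, 30, 31, 43, 44, 45, 46, 47]

# The 19-byte _bd "ab": four little-endian 32-bit words (1, 0, -3, 2) as shown
# in the 5 4# display, then "ab" and a NUL terminator, with no trailing padding.
AB = struct.pack("<4i", 1, 0, -3, 2) + b"ab\0"
print(len(AB))  # 19
```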

  3. Note that the 4-byte string "\375\377\377\377" encodes -3 (the type code K uses for a character vector) as a little-endian double word (32-bit integer). Each element is a byte displayed in octal, unless it corresponds to a printable character, in which case it is displayed as that character. In decimal, those bytes are 253 255 255 255.

  4. In the current version of Kona, the size is not a multiple of 8 (which seems incorrect):

    #_bd ("ab";"cd")
    70
  5. Also, in the current version of Kona (which is 64-bit, as opposed to K2.8, which is 32-bit), the full display is:

    9 8# (_bd ("ab";"cd")), 2#"-"
    ("\001\000\000\000\000\000\000\000"
    "6\000\000\000\000\000\000\000"
    "\000\000\000\000\000\000\000\000"
    "\002\000\000\000\000\000\000\000"
    "\375\377\377\377\377\377\377\377"
    "\002\000\000\000\000\000\000\000"
    "ab\000\375\377\377\377\377"
    "\377\377\377\002\000\000\000\000"
    "\000\000\000cd\000--")

    In K2.8, "cd" begins at character 40 (counting from 0), and in Kona it probably should begin at character 80 instead of character 67. Also note that character 8 is "6". It corresponds to character 4 in K2.8, which is "\000", so Kona appears to be encoding some meaningful information there that is absent in K2.8.
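The integer decoding in item 3 and the offsets in item 5 can be checked with Python's struct module. This sketch decodes the -3 word at both widths, then rebuilds the 70-byte result of the current Kona from its 9 8# display, packing each 64-bit word as a little-endian integer. What each word means (marker, the "6" word, type, count) is inferred from the displays only.

```python
import struct

# Octal escapes -> decimal byte values.
print(list(b"\375\377\377\377"))                      # [253, 255, 255, 255]

# -3 as a 32-bit (k2.8) and a 64-bit (Kona) little-endian two's-complement integer.
print(struct.unpack("<i", b"\375\377\377\377")[0])    # -3
print(struct.unpack("<q", b"\375" + b"\377" * 7)[0])  # -3

# Current Kona _bd ("ab";"cd"), rebuilt from the 9 8# display (70 bytes):
# six 64-bit words 1, 54 ("6"), 0, 2, -3, 2, then "ab\0" with no padding,
# then -3, 2 and "cd\0" packed directly after it.
current = (struct.pack("<6q", 1, ord("6"), 0, 2, -3, 2) + b"ab\0"
           + struct.pack("<2q", -3, 2) + b"cd\0")
print(len(current), current.index(b"cd"))             # 70 67
```

Because "ab\0" is not padded, every word after it starts at an offset that is not a multiple of 8, which is why "cd" lands at 67.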

tavmem commented 2 years ago

In the partially fixed version of Kona (not yet committed), the size is now a multiple of 8:

  #_bd ("ab";"cd")
80

full display:

  10 8#_bd ("ab";"cd")
("\001\000\000\000\000\000\000\000"
 "@\000\000\000\000\000\000\000"
 "\000\000\000\000\000\000\000\000"
 "\002\000\000\000\000\000\000\000"
 "\375\377\377\377\377\377\377\377"
 "\002\000\000\000\000\000\000\000"
 "ab\000\000\000\000\000\000"
 "\375\377\377\377\377\377\377\377"
 "\002\000\000\000\000\000\000\000"
 "cd\000\000\000\000\000\000")

You can get a similar display using the Linux utility "od":

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "ff" 0: _bd ("ab";"cd")
  \\

$ od -c ff
0000000 001  \0  \0  \0  \0  \0  \0  \0   @  \0  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0 002  \0  \0  \0  \0  \0  \0  \0
0000040 375 377 377 377 377 377 377 377 002  \0  \0  \0  \0  \0  \0  \0
0000060   a   b  \0  \0  \0  \0  \0  \0 375 377 377 377 377 377 377 377
0000100 002  \0  \0  \0  \0  \0  \0  \0   c   d  \0  \0  \0  \0  \0  \0
0000120
$
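To check the new 80-byte layout mechanically, here is a small decoder written against the dump above. It is a sketch that assumes only what this dump shows: a header of four little-endian 64-bit words (marker, the "@" word, type 0 for a general list, item count), then, for each item, a type word (-3), a count word, and the characters plus a NUL, zero-padded to the next multiple of 8. It handles nothing but lists of character vectors, and the names pad8 and decode_char_list are hypothetical, not Kona functions.

```python
import struct

def pad8(n):
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7

def decode_char_list(buf):
    """Decode a list of strings laid out as in the 10 8# dump; return (items, end offset)."""
    _marker, _extra, typ, count = struct.unpack_from("<4q", buf, 0)
    if typ != 0:
        raise ValueError("expected a general list (type 0)")
    items, off = [], 32
    for _ in range(count):
        t, n = struct.unpack_from("<2q", buf, off)
        if t != -3:
            raise ValueError("expected a character vector (type -3)")
        off += 16
        items.append(buf[off:off + n].decode("ascii"))
        off += pad8(n + 1)  # characters, NUL terminator, zero padding
    return items, off

# Rebuild the 80 bytes shown above: "@" is 64, and each "xy\0" is padded to 8 bytes.
fixed = (struct.pack("<6q", 1, ord("@"), 0, 2, -3, 2) + b"ab".ljust(8, b"\0")
         + struct.pack("<2q", -3, 2) + b"cd".ljust(8, b"\0"))

print(len(fixed), fixed.index(b"cd"))  # 80 72
print(decode_char_list(fixed))         # (['ab', 'cd'], 80)
```

With the padding in place, each character-vector item occupies 24 bytes and the decoder ends exactly at the last byte, so nothing is left over for _db to misread.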

Note that character 8 is now "@". This may be related to the size of the character vector and to which lane of memory is used. I will research this further.
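One hypothesis that fits both Kona dumps so far is that the word at byte 8 holds the number of bytes that follow the 16-byte prefix: "6" is 54 = 70 - 16, and "@" is 64 = 80 - 16. This is only an inference from two data points, not something checked against the Kona source. The arithmetic:

```python
# (total length of the _bd result, character shown at byte 8) from the two Kona dumps.
observations = [(70, "6"), (80, "@")]

for total, ch in observations:
    # Hypothesis: byte 8 stores the total length minus a 16-byte prefix.
    print(total, ord(ch), total - 16, ord(ch) == total - 16)
# 70 54 54 True
# 80 64 64 True
```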

So far, we still get the error (as _db has not yet been "fixed").

  _db _bd ("ab";"cd")
nonce error
tavmem commented 2 years ago

Made changes to _bd and _db so that

  _db _bd ("ab";"cd")
("ab"
 "cd")

However, I'm not committing this yet because, in the more general case, I still get

  a:(1;1.0;"c";`d;1 2;3.0 4.0;"ef";`g`h;();(1;`z)); &/{x~_db _bd x}'a,,a
nonce error

A simpler case that fails:

  _db _bd (`a`b;`c`d)
nonce error

and also:

  "file" 1: (`a`b;`c`d)
  :data: 1: `"file"
nonce error
tavmem commented 2 years ago

Some progress ... got these to work:

  _db _bd (`a`b;`c`d)
(`a `b
 `c `d)

  "file" 1: (`a`b;`c`d);  1: "file"
(`a `b
 `c `d)

  a:(1;1.0;"c";`d;1 2;3.0 4.0;"ef";`g`h;();(1;`z)); &/{x~_db _bd x}'a,,a
1

However, I'm not committing yet, since this now fails:

  a:(!11)#\:`a`b; &/{x~_db _bd x}'a
nonce error
a:(!11)#\:`a`b; &/{x~_db _bd x}'a
                    ^
tavmem commented 2 years ago

Interesting:

  a:(!5)#\:`a`b; &/{x~_db _bd x}'a
1
  a:(!6)#\:`a`b; &/{x~_db _bd x}'a
nonce error
a:(!6)#\:`a`b; &/{x~_db _bd x}'a
                   ^
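For reference, (!n)#\:`a`b applies take (#) to `a`b once for each left argument 0, 1, ..., n-1, and take cycles through its right argument, so the items are symbol vectors of lengths 0 to n-1. Going from !5 to !6 adds exactly one item, the 5-element vector `a`b`a`b`a, and since {x~_db _bd x}' round-trips each item separately, that item looks like the first failing case. A quick Python model of the list, with symbols as strings (take and items are hypothetical helper names for this illustration):

```python
def take(n, xs):
    """Model of K's n#xs for n >= 0: cycle through xs until n items are taken."""
    return [xs[i % len(xs)] for i in range(n)]

def items(n):
    r"""Model of (!n)#\:`a`b: one take per left argument 0, 1, ..., n-1."""
    return [take(k, ["a", "b"]) for k in range(n)]

print([len(x) for x in items(6)])                  # [0, 1, 2, 3, 4, 5]
print([x for x in items(6) if x not in items(5)])  # [['a', 'b', 'a', 'b', 'a']]
```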