kevinlawler / kona

Open-source implementation of the K programming language
ISC License

Crash when trying to serialize data. #615

Closed gitonthescene closed 1 year ago

gitonthescene commented 2 years ago

If I rename scrabble-puz.txt to scrabble-puz.k, I can load it as a script. If I try to serialize the data with 1:, it seems to work just fine, but crashes when I attempt to deserialize it.

kona      \ for help. \\ to exit.

  \l scrabble-puz.k
  #puz
8473
  `"scrabblepuz" 1: puz
  puz2: 1: `"scrabblepuz"
[1]    92797 segmentation fault  ./k

But the behavior seems inconsistent. During another run it seemed to make it most of the way through but the data appeared corrupted (floats where there should be ints) and it eventually crashed. Here's a script of the session: crash.script.txt

Here's just one of the lines seemingly corrupted:

 ("abcelpy"
  8.497929e-322 4.150151e-322 3.705492e-322 6.47226e-322 7.756831e-322 5.03947e-322 3.903119e-322)

Where this is what that line looks like in scrabble-puz.k:

   ("abcelpy"
  172 84 75 131 157 102 79)
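
One observation about these values, offered as a reading of the output rather than a confirmed diagnosis: 8.497929e-322 is what the 64-bit integer 172 looks like when its bytes are printed as an IEEE-754 double (a subnormal). The same holds for 84 and 4.150151e-322, and so on. That would be consistent with the integer payload being intact while the vector is being treated as a float vector. The sketch below is standalone C, not kona code, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 8 bytes of a 64-bit integer as an IEEE-754 double.
   Hypothetical helper for illustration; not part of kona. */
double int_bits_as_double(int64_t i) {
    double d;
    assert(sizeof d == sizeof i);   /* both must be 8 bytes wide */
    memcpy(&d, &i, sizeof d);       /* copy the bit pattern, no numeric conversion */
    return d;
}

/* The inverse: recover the integer whose bits a double carries. */
int64_t double_bits_as_int(double d) {
    int64_t i;
    assert(sizeof d == sizeof i);
    memcpy(&i, &d, sizeof i);
    return i;
}
```

Printing `int_bits_as_double(172)` with `%g` should give a value close to the first corrupted number above, and `double_bits_as_int` turns it back into 172.
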
tavmem commented 2 years ago

Stranger and stranger (in Linux) ... As we know, this segfaults

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  data: ("abcdeht"; 176 89 106 111 184 125 143)
  "file" 1: data

  data: 1: `"file"
  "file" 1: data
  data: 1: `"file"

Segmentation fault (core dumped)

But, this corrupts

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("abcdeht"; 176 89 106 111 184 125 143)
  :data: 1: `"file"
("abcdeht"
 176 89 106 111 184 125 143)

  "file" 1: data     /file is corrupt after this step.  Corruption occur here?  Or at the prior step?  
  :data: 1: `"file"
(();())

Actually, this is just another consecutive load-save-reload instance. I would have expected a segfault.

tavmem commented 2 years ago

This was an experiment to see whether it was (again) the consecutive nature of the commands with a specific file that was key.

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "fileA" 1: ("abcdeht"; 176 89 106 111 184 125 143)
  :data: 1: `"fileA"
("abcdeht"
 176 89 106 111 184 125 143)

  "fileB" 1: data
  :data: 1: `"fileB"
("abcdeht"
 176 89 106 111 184 125 143)

  "fileC" 1: data
  :data: 1: `"fileC"
("abcdeht"
 176 89 106 111 184 125 143)

  "fileB" 1: data
  :data: 1: `"fileB"

Segmentation fault (core dumped)

So, what does it tell us? Since we were able to repeatedly save and load to alternating files, the only difference is the initial save to fileA.

tavmem commented 2 years ago

However ... that is not the problem

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "fileA" 1: ("abcdeht"; 176 89 106 111 184 125 143)
  \\

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  data: ("abcdeht"; 176 89 106 111 184 125 143)
  "fileB" 1: data
  \\

$ cmp fileA.K fileB.K
$ 
tavmem commented 2 years ago

Stranger still (in Linux). This is what the file looks like after the first serialization:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 1 2)
  \\

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000  ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000  ................
00000060: 0100 0000 0000 0000 0200 0000 0000 0000  ................
$ 

However, after the second serialization, not only is the file corrupt, the "data" in the workspace is corrupt!

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 1 2)
  :data: 1: `"file"
("a"
 1 2)
  "file" 1: data
  data
("a";())
  \\

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000  ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
$ 
tavmem commented 2 years ago

Using https://github.com/tavmem/ks to get some idea of what happened, I executed the 3 commands

"file" 1: ("a"; 1 2)
:data: 1: `"file"
"file" 1: data

getting (in part)

~AM vf_ex(*p,g)      K vf_ex(V q, K g) <- K dv_ex(K a, V *p, K b)      BEG vf_ex
   q:         0x40
   sd(g):     0x7f6d78c44900 0x7f6d78c44918            1-6 0 2   
("file"
 ("a"
  1 2))
  BEG ci   
  BEG ci     END ci  0x7ffda472a688     0x7f6d78c447c0 0x7f6d78c447d8            3-6 -3 4   "file"
  BEG ci   
  BEG ci     END ci  0x7ffda472a658     0x7f6d78c446c0 0x7f6d78c446d8            3-6 3 1   "a"
  BEG ci     END ci  0x7ffda472a658     0x7f6d78c0d048 0x7f6d78c0d060            3-6 -1 2   1 2
             END ci  0x7ffda472a688     0x7f6d78c44880 0x7f6d78c44898            3-6 0 2   
("a"
 1 2)
             END ci  0x7ffda472a6b8     0x7f6d78c44900 0x7f6d78c44918            2-6 0 2   
("file"
 ("a"
  1 2))
   vf_ex  ELSE --- z=((K(*)(K,K))DT[(L)q].func)(a,b);    (L)q:64---beg _ld
beg _ld_write
_1d_write  mmap(addfress:0,   length:112,   PROT_WRITE,MAP_SHARED,   file:3   offset:0)
beg wrep
beg wrep
beg wrep
  BEG ci     END ci  0x7ffda472a6b8     0x7f6d78c44040 0x7f6d78c44058            8-6 6 0   
      cd                 0x7f6d78c44900 0x7f6d78c44918            2-6 0 2   
("file"
 ("a";()))

vf_ex is called (and the data is OK). It calls _1d (function 64 in the dispatch table in src/k.c), which calls _1d_write, which calls mmap. Then something calls wrep 3 times, at the end of which the data is corrupt.

Next step: Investigate the mmap call and the 3 wrep calls

tavmem commented 2 years ago

So ... what did we find? Making these changes (note that the mid statement is commented out):

$ git diff
diff --git a/src/0.c b/src/0.c
index 6ba3663..e0cd703 100644
--- a/src/0.c
+++ b/src/0.c
@@ -575,11 +575,13 @@ Z K _1d_write(K x,K y,I dosync) {
   U(e)

   //Largely copy-pasted from 6:dyadic
+  O("bef open   sd(y):   ");sd(y);
   I f=open(e,O_RDWR|O_CREAT|O_TRUNC,07777);
   free(e);
   P(f<0,SE)

+  //O("mid   sd(y):   ");sd(y);
   P(ftruncate(f,n),SE)
+  O("aft ftruncate   sd(y);   ");sd(y);
   //lfop: see 0: write for possible way to do ftruncate etc. on Windows
   S v;
   if(MAP_FAILED==(v=mmap(0,n,PROT_WRITE,MAP_SHARED,f,0)))R SE; // should this be MAP_PRIVATE|MAP_NORESERVE ?
diff --git a/src/0.h b/src/0.h
index acaf688..2dc3b73 100644
--- a/src/0.h
+++ b/src/0.h
@@ -23,6 +23,8 @@ extern V vd[];
 extern V vm[];
 extern I fbr;
 extern I fll;
+extern K sd(K x);
+extern K sd_(K x,I f);
 I sva(V p);
 I rep(K x,I y);
 I wrep(K x,V v,I y);
$ 

we get

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 1 2)
bef open   sd(y):        0x7fb3b9dc6680 0x7fb3b9dc6698            2-6 0 2   
("a"
 1 2)
aft ftruncate   sd(y);        0x7fb3b9dc6680 0x7fb3b9dc6698            2-6 0 2   
("a"
 1 2)
  data: 1: `"file"
  "file" 1: data
bef open   sd(y):        0x7fb3b9dc66c0 0x7fb3b9dc66d8            3-6 0 2   
("a"
 1 2)
aft ftruncate   sd(y);        0x7fb3b9dc66c0 0x7fb3b9dc66d8            3-6 0 2   
("a";())

The corruption occurs between the open statement and the ftruncate statement. However, trying to check the status of y in between these statements (e.g., by uncommenting the mid statement) yields

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 1 2)
bef open   sd(y):        0x7f76184e7680 0x7f76184e7698            2-6 0 2   
("a"
 1 2)
mid   sd(y):        0x7f76184e7680 0x7f76184e7698            2-6 0 2   
("a"
 1 2)
aft ftruncate   sd(y);        0x7f76184e7680 0x7f76184e7698            2-6 0 2   
("a"
 1 2)
  data: 1: `"file"
  "file" 1: data
bef open   sd(y):        0x7f76184e76c0 0x7f76184e76d8            3-6 0 2   
("a"
 1 2)
mid   sd(y):        0x7f76184e76c0 0x7f76184e76d8            3-6 0 2   

Bus error (core dumped)

"A Bus error is trying to access memory that can't possibly be there. You've used an address that's meaningless to the system, or the wrong kind of address for that operation."

It's not yet clear why that would be the case between the open and the ftruncate statements.
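
One possible explanation, not confirmed anywhere in this thread: if `data` still points into a mapping of file.K left over from the earlier `1:` read, then opening the same file with O_TRUNC shrinks it to zero length underneath that mapping. POSIX says that touching mapped pages beyond the end of the file raises SIGBUS, which would match the Bus error. Once ftruncate(f,n) extends the file again, the pages exist but are zero-filled, which would match `("a";())`. The standalone sketch below uses a scratch file, and `ftruncate(fd, 0)` stands in for the O_TRUNC open. The struct and function names are hypothetical, and nothing here is kona code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    int ok;               /* 1 if every call succeeded */
    unsigned char before; /* first mapped byte before truncation */
    unsigned char after;  /* first mapped byte after truncate-and-extend */
} TruncDemo;

/* Map a scratch file, truncate it while mapped, extend it, and look again. */
TruncDemo truncate_under_mapping(void) {
    TruncDemo r = {0, 0, 0};
    const size_t n = 112;            /* same length as file.K in the thread */
    char buf[112];
    assert(n <= sizeof buf);

    FILE *fp = tmpfile();
    if (!fp) return r;
    int fd = fileno(fp);

    /* Fill the file with 'a' so the mapping has recognizable content. */
    for (size_t i = 0; i < n; i++) buf[i] = 'a';
    if (write(fd, buf, n) != (ssize_t)n) { fclose(fp); return r; }

    /* Map the file privately, as the reader in the thread does. */
    unsigned char *p = mmap(0, n, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { fclose(fp); return r; }
    r.before = p[0];

    /* Stand-in for open(..., O_TRUNC): length drops to zero while mapped.
       Reading p[0] at this point may raise SIGBUS, so we do not touch it. */
    if (ftruncate(fd, 0) != 0) { munmap(p, n); fclose(fp); return r; }

    /* Stand-in for the writer's ftruncate(f, n): n bytes again, zero-filled. */
    if (ftruncate(fd, (off_t)n) != 0) { munmap(p, n); fclose(fp); return r; }

    r.after = p[0];   /* what the still-mapped object sees now */
    munmap(p, n);
    fclose(fp);
    r.ok = 1;
    return r;
}
```

On Linux, `after` would be expected to come back as 0 rather than 'a', but POSIX leaves the contents of a mapping unspecified once the file size changes, so other systems may behave differently.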

tavmem commented 2 years ago

To simplify further ... we get rid of the variable named data

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)

  1: `"file"
("a"
 4 5)

  "file" 1: ("a"; 4 5)

After each of the 3 commands, we examine file.K using a different Linux terminal session:

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000  ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000  ................
00000060: 0400 0000 0000 0000 0500 0000 0000 0000  ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000  ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000  ................
00000060: 0400 0000 0000 0000 0500 0000 0000 0000  ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
$ 

After the 2nd execution of the first command, file.K is corrupt.

tavmem commented 2 years ago

At first, it looks like the dyadic function f 1: x is the cause of the problem. However, consider executing it 3 times with different x

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)
  "file" 1: ("b"; 6 7)
  "file" 1: ("c"; 8 9)

and (again) checking the file after each execution

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000  ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000  ................
00000060: 0400 0000 0000 0000 0500 0000 0000 0000  ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6200 0000 0000 0000  ........b.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000  ................
00000060: 0600 0000 0000 0000 0700 0000 0000 0000  ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: 0300 0000 0000 0000 6300 0000 0000 0000  ........c.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000  ................
00000060: 0800 0000 0000 0000 0900 0000 0000 0000  ................
$ 

The file does not become corrupt ... It now appears that the problem with the dyadic function f 1: x is caused by prior execution of the monadic function 1: f, although the contents did get loaded properly by the monadic function.

tavmem commented 2 years ago

Some further analysis. I made some code changes to document what's happening. The results are:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)                     //command 1
_1d     _1d_write
** bef open   sd(y):        0x7f7e1665c680 0x7f7e1665c698            2-6 0 2   
("a"
 4 5)
** aft open   sd(y):        0x7f7e1665c680 0x7f7e1665c698            2-6 0 2   
("a"
 4 5)
mmap     close     munmap

   :data: 1: `"file"                       //command 2
_1m     open     mmap     _1m_r     _1m_r     _2m_r     rrep     _1m_r     mmap     close     munmap
("a"
 4 5)

  "file" 1: data                           //command 3
_1d     _1d_write
** bef open   sd(y):        0x7f7e1665c880 0x7f7e1665c898            3-6 0 2   
("a"
 4 5)
** aft open   sd(y):        0x7f7e1665c880 0x7f7e1665c898            3-6 0 2   

Bus error (core dumped)
$ 

So ... what does it show?

Note:

Summary: mmap does not appear to be working properly in Linux, MacOS or Windows. On the face of it, mmap appears to be an inappropriate strategy to implement 1:

However, it does seem a bit strange that:

tavmem commented 2 years ago

Further confirmation that the pattern in command 2 (mmap called twice, munmap called once) may be related to the problem. It works for 2 scalars:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4)                                 //command 1
_1d     _1d_write
** bef open   sd(y):        0x7f76ca0a1680 0x7f76ca0a1698            2-6 0 2   
("a";4)
** aft open   sd(y):        0x7f76ca0a1680 0x7f76ca0a1698            2-6 0 2   
("a";4)
mmap     close     munmap

   :data: 1: `"file"                                 //command 2
_1m     open     mmap     _1m_r     _1m_r     _2m_r     rrep     _1m_r     _2m_r     rrep     close     munmap
("a";4)

  "file" 1: data                                     //command 3
_1d     _1d_write
** bef open   sd(y):        0x7f76ca0a1880 0x7f76ca0a1898            3-6 0 2   
("a";4)
** aft open   sd(y):        0x7f76ca0a1880 0x7f76ca0a1898            3-6 0 2   
("a";4)
mmap     close     munmap

Note that in command 2, both mmap and munmap are called only once. If we try 2 vectors ("ab";4 5) , command 2 calls mmap 3 times and calls munmap only once. It also fails.

tavmem commented 2 years ago

Here is a "plausible" theory on the cause of the problem. Consider the case ("a"; 4 5) where mmap is called twice, and munmap is called once. The first time mmap is called, in _1m, it creates map v for length s with address 0:

if(MAP_FAILED==(v=mmap(0,s,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,0)))R SE;

The second time mmap is called, in _1m_r, it creates map u for length length with address 0:

if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;}

However, when munmap is called in _1m, it is only for map v with length s with address 0:

r=munmap(v,s);

Map u is never unmapped!
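
As a rough illustration of the bookkeeping this theory implies, the sketch below records every mapping it creates so that each one can be released later. It is standalone C against a scratch file, not a patch for kona. `MapList`, `map_tracked`, `unmap_all` and `demo_two_maps` are hypothetical names. In kona itself the mapped lists may need to stay mapped for as long as the data is in use, so where the release belongs is an open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MAPS 16

typedef struct { void *addr; size_t len; } Mapping;
typedef struct { Mapping m[MAX_MAPS]; int n; } MapList;

/* mmap a region and remember it so it can be released later. */
void *map_tracked(MapList *l, int fd, size_t len, off_t off) {
    assert(l != NULL);
    if (l->n >= MAX_MAPS) return MAP_FAILED;
    void *p = mmap(0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, off);
    if (p != MAP_FAILED) {
        l->m[l->n].addr = p;
        l->m[l->n].len = len;
        l->n++;
    }
    return p;
}

/* munmap every recorded region; returns how many were released, or -1. */
int unmap_all(MapList *l) {
    int released = 0;
    while (l->n > 0) {
        Mapping *m = &l->m[--l->n];
        if (munmap(m->addr, m->len) != 0) return -1;
        released++;
    }
    return released;
}

/* Map a scratch file twice (like v and u above) and release both. */
int demo_two_maps(void) {
    MapList l = { .n = 0 };
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    FILE *fp = tmpfile();
    if (!fp) return -1;
    int fd = fileno(fp);
    if (ftruncate(fd, (off_t)(2 * page)) != 0) { fclose(fp); return -1; }

    void *v = map_tracked(&l, fd, (size_t)(2 * page), 0);
    /* The offset passed to mmap must be a multiple of the page size. */
    void *u = map_tracked(&l, fd, (size_t)page, (off_t)page);

    int released = unmap_all(&l);
    if (v == MAP_FAILED || u == MAP_FAILED) released = -1;
    fclose(fp);
    return released;
}
```

With both mappings recorded, `unmap_all` releases two regions instead of one, which is the pairing the current code appears to lack.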

Note that this gets complex very quickly. In the case of ("ab"; 4 5) the current code calls mmap 3 times, and calls munmap only once for the first mapping.

Remember that the initial impetus for this issue was the Scrabble-Puzzle. It has thousands of members (#puz is 8473), and each member is of the form ("abcelpy"; 172 84 75 131 157 102 79). Scrabble-puzzle would appear to call mmap thousands of times, and munmap only once for the very first map.

bakul commented 2 years ago

I used strace(1) to see what syscalls are used for this case. I just did \l scrabble-puz and quit.

My conclusion is that the whole memory allocation scheme should be reviewed. In general minimize calls to malloc/mmap by using better defaults and guesstimates.

tavmem commented 2 years ago

Thanks! I agree with your conclusion.

tavmem commented 2 years ago

These 4 simple code changes (as a test), do the following:

@@ -539,11 +541,11 @@ Z K _1m_r(I f,V fixed, V v,V aft,Ib) { //File descriptor, moving into mmap, length+=mod; offset-=mod;

Segmentation fault (core dumped) $

Note that the Linux manual page for mmap(2) states

munmap() The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references.


The observed behavior in our test is consistent with the description in the Linux manual page.
It's not clear how k2.8 and k3.2 got around this.
tavmem commented 2 years ago

Using strace on k2.8, we find some differences. k2.8:

bakul commented 2 years ago

IMHO this still doesn't get at the real problem. See my next message.

bakul commented 2 years ago

mmap/munmap behave as expected. The problem may be in how they are used in kona. k2.8/3.2 don't "get around" their semantics. There is certainly a memory leak of some sort. That is easy to see if you do this:

Another indication that allocation is not quite right. This may be different from the slowness issue which seems to be triggered by doing far too many small mmaps.

tavmem commented 2 years ago

It appears that we may have found the culprit. Make the following single-line change to print the args for function strdupn. Note: there was already a warning in the comment that strdupn can overallocate.

$ git diff
diff --git a/src/ks.c b/src/ks.c
index 9810042..8e5f42d 100644
--- a/src/ks.c
+++ b/src/ks.c
@@ -8,7 +8,7 @@
 Z I ns=0,sdd=0;
 // Z S sdup(S s){R strdupn(s,strlen(s));} //using this because "strdup" uses [used] dynamically linked malloc which fails with our static free
 Z S sdupI(S s){I k;S d=alloc(NSLOTS*sizeof(I)+(k=strlen(s))+1);if(!d)R 0;ns++;sdd=1;d+=NSLOTS*sizeof(I);d[k]=0;R memcpy(d,s,k);}
-S strdupn (S s,I k) {S d=alloc(k+1);if(!d)R 0;d[k]=0;R memcpy(d,s,k);} // mm/o  (note: this can overallocate)
+S strdupn (S s,I k) {O("s:%s   k:%lld\n",s,k); S d=alloc(k+1);if(!d)R 0;d[k]=0;R memcpy(d,s,k);} // mm/o  (note: this can overallocate)
 //I SC0N(S a,S b,I n) {I x=memcmp(a,b,n); R x<0?-1:x>0?1:a[n]?1:0; }// non-standard way to compare aaa\0 vs aaa
 I strlenn(S s,I k){S t=memchr(s,'\0',k); R t?t-s:k;}

If we run valgrind on this simple file

$ cat sp1.k
puz:(("abcdeht"
  176 79 106 111 184 125 143))
$ 

We get 17 executions of strdupn and 17 bytes definitely lost

kona      \ for help. \\ to exit.

  \l sp1.k
s:\l sp1.k
   k:9
s:k   k:1
s:puz:(("abcdeht"
   k:16
s:  176 79 106 111 184 125 143))
   k:31
s:k   k:1
s:puz:(("abcdeht"
  176 79 106 111 184 125 143))   k:3
s:puz   k:3
s:puz   k:3
s:176 79 106 111 184 125 143   k:4
s:79 106 111 184 125 143   k:3
s:106 111 184 125 143   k:4
s:111 184 125 143   k:4
s:184 125 143   k:4
s:125 143   k:4
s:143   k:3
  \\
s:\\
   k:3
s:k   k:1
==54559== LEAK SUMMARY:
==54559==    definitely lost: 17 bytes in 1 blocks
==54559==    indirectly lost: 0 bytes in 0 blocks
==54559==      possibly lost: 184 bytes in 9 blocks
==54559==    still reachable: 9 bytes in 3 blocks
==54559==         suppressed: 0 bytes in 0 blocks

If we use an even simpler file

$ cat sp1a.k
puz:(("abc"
  4 5 6))
$ 

We get 13 executions of strdupn, and 13 bytes definitely lost

kona      \ for help. \\ to exit.

  \l sp1a.k
s:\l sp1a.k
   k:10
s:k   k:1
s:puz:(("abc"
   k:12
s:  4 5 6))
   k:10
s:k   k:1
s:puz:(("abc"
  4 5 6))   k:3
s:puz   k:3
s:puz   k:3
s:4 5 6   k:2
s:5 6   k:2
s:6   k:1
  \\
s:\\
   k:3
s:k   k:1
==54679== LEAK SUMMARY:
==54679==    definitely lost: 13 bytes in 1 blocks
==54679==    indirectly lost: 0 bytes in 0 blocks
==54679==      possibly lost: 184 bytes in 9 blocks
==54679==    still reachable: 9 bytes in 3 blocks
==54679==         suppressed: 0 bytes in 0 blocks
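
A possible reading of the two runs, offered as a hypothesis only: the match between the number of strdupn calls and the number of lost bytes may be a coincidence. strdupn allocates k+1 bytes, and valgrind reports a single lost block in each run. A 17-byte block matches the k:16 call (the first line of the multi-line expression in sp1.k), and a 13-byte block matches the k:12 call in sp1a.k. The sketch below is a plain-C rendering of strdupn from the diff above, with malloc standing in for kona's alloc. The `_sketch` and `_alloc_size` names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes requested by a strdupn-style copy of k characters: k plus a NUL. */
size_t strdupn_alloc_size(size_t k) {
    return k + 1;
}

/* Plain-C sketch of strdupn from src/ks.c, with malloc standing in for
   kona's alloc. The caller owns the result and must free it. */
char *strdupn_sketch(const char *s, size_t k) {
    assert(s != NULL);
    char *d = malloc(strdupn_alloc_size(k));
    if (!d) return NULL;
    d[k] = '\0';              /* terminate after the k copied characters */
    return memcpy(d, s, k);   /* memcpy returns d */
}
```

If this reading is right, the copy that never gets freed is the one made for the first physical line of an expression that continues onto a second line.
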
bakul commented 2 years ago

Try this simple test: file1.k:

foo:(("abc";))

Run k under valgrind. Just load file1 and exit and check for leaks.

file2.k:

foo:(("abc"
))

Run k under valgrind. Just load file2 and exit, and check for leaks. They create identical objects, but in the second case there is a leak!

I suspect this is a separate bug, not related to mmap, but it should be fixed.

tavmem commented 2 years ago

Thanks ! Your examples provide further evidence that strdupn may be the culprit.

When loading file1.k under valgrind, strdupn is not called at all. When loading file2.k under valgrind, strdupn is called 8 times.

I agree with your suspicion that this bug is probably a separate issue, and probably not related to the mmap problem.

tavmem commented 2 years ago

Revising the last comment: I was looking at "possibly lost" and "definitely lost" for file1.k. strdupn is never associated with any bytes "possibly lost" or "definitely lost" for file1.k.

However, strdupn is called:

tavmem commented 2 years ago

I'm putting the memory leak when using \l file into its own issue. The \l file command does not appear to invoke mmap, so the leak is probably not related to mmap. The focus of the mmap problem should be the commands "file" 1: ("a"; 4 5) and :data: 1: `"file", i.e., the 1: dyad and the 1: monad.

bakul commented 2 years ago

FWIW, this doesn't crash any more on OS X (m1 and x86-64), FreeBSD & Linux. It still does far too many mmap calls, which should probably be tracked under a separate issue.

tavmem commented 2 years ago

It's not a crash that we are attempting to fix at this point ... rather, it's memory corruption ...

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)           //command 1
  :data: 1: `"file"              //command 2
("a"
 4 5)
  data                           //display
("a"
 4 5)
  "file" 1: data                 //command 3
  data                           //display
("a";())

I'll try to check (later today) if this corruption still occurs on OS X, and on Windows.

tavmem commented 2 years ago

No data corruption (in this case) on OS X

$ ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)
  :data: 1: "file"
("a"
 4 5)
  "file" 1: data
  data
("a"
 4 5)

Got a surprise in Windows:

tavme@DESKTOP-FVKENU9 MINGW64 ~/kona
$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)
  :data: 1: "file"
("a"
 4 5)
  "file" 1: data
Invalid argument error
"file" 1: data
       ^
>  \
  data = ("a"; 4 5)
(1
 1 1)

  "file" 1: ("a"; 4 5)
Invalid argument error
"file" 1: ("a"; 4 5)
       ^
>

This problem in Windows is sufficiently different that I will open a new issue.

tavmem commented 2 years ago

Progress in Linux.

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)
  data: 1: `"file"
  "file" 1: data
  data
("a"
 4 5)

  data: 1: `"file"
  "file" 1: data
  data
("a"
 4 5)

Making the following changes (for the test case) eliminates both:

@@ -531,7 +530,6 @@ Z K _1m_r(I f,V fixed, V v,V aft,Ib) { //File descriptor, moving into mmap, K z,x; if(0==t||5==t){z=newK(t,n); DO(n,x=_1m_r(f,fixed,v+r,aft,&r); if(!x){cd(z);R 0;} kK(z)[i]=x; ) } else { //map lists to file. atoms are allocated not mapped

(END)

Next step:
The above only works for the test case.
It needs to be generalized by using ```mod``` again in
tavmem commented 2 years ago

Before we "generalize" the process of reading a file that has been serialized, we need to fix an additional problem in the creation of the serialized file.

In k2.8, the command

"file" 1: ("ab"; 4 5)

creates the file (in 32-bit representation):

$ xxd file.l
00000000: fdff ffff 0100 0000 0000 0000 0200 0000  ................
00000010: fdff ffff 0100 0000 fdff ffff 0200 0000  ................
00000020: 6162 0000 0000 0000 fdff ffff 0100 0000  ab..............
00000030: ffff ffff 0200 0000 0400 0000 0500 0000  ................

The 32-bit binary representation can be translated to

-3  1  0  2
-3  1 -3  2
ab  0 -3  1
-1  2  4  5
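
The translation above can be reproduced mechanically, on the assumption (consistent with the dumps) that each word is a little-endian two's-complement integer, 4 bytes wide in k2.8 and 8 bytes wide in kona. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian signed word of `width` bytes (4 or 8).
   Hypothetical helper for reading xxd output by hand; not kona code. */
int64_t read_le_word(const unsigned char *p, size_t width) {
    assert(width == 4 || width == 8);
    uint64_t u = 0;
    for (size_t i = 0; i < width; i++)
        u |= (uint64_t)p[i] << (8 * i);   /* least significant byte first */
    if (width == 4)
        return (int32_t)(uint32_t)u;      /* sign-extend from 32 bits */
    return (int64_t)u;
}
```

For example, the bytes `fd ff ff ff` decode to -3 with a width of 4, and `fd ff ff ff ff ff ff ff` decode to -3 with a width of 8.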

In kona, the same command creates the file (in 64-bit representation):

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000040: 6162 00fd ffff ffff ffff ff01 0000 0000  ab..............
00000050: 0000 00ff ffff ffff ffff ff02 0000 0000  ................
00000060: 0000 0004 0000 0000 0000 0005 0000 0000  ................
00000070: 0000 00                                  ...

The 64-bit binary representation should be translatable to the same result, but it's not. We get the start correctly ...

-3 1
 0 2
-3 1
-3 2

but then there is a problem in line 00000040.
To get ab 0, line 00000040 should be

00000040: 6162 0000 0000 0000 0000 0000 0000 0000  ab..............

In kona, the file creation process does not include correct padding at the end of a character array. The k2.8 (32-bit) file is 64 bytes. The kona (64-bit) file should be 128 bytes. It is only 115 bytes.
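
The three sizes can be checked with a small model. It assumes, based only on the dumps above, that the file holds 8 header words, then the character payload ("ab" plus its NUL, 3 bytes), then 6 words for the integer vector, and that a padded payload is rounded up to a multiple of two words. The function names are hypothetical, and the two-word rule is a guess that fits this example rather than a documented format.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Size of the serialized ("ab"; 4 5) file under the model described above:
   8 header words, the character payload, 6 words for the int vector.
   `word` is 4 for k2.8 and 8 for kona; `pad` selects padded or unpadded. */
size_t model_file_size(size_t word, size_t char_payload, int pad) {
    size_t chars = pad ? round_up(char_payload, 2 * word) : char_payload;
    return 8 * word + chars + 6 * word;
}
```

Under these assumptions, `model_file_size(4, 3, 1)` gives the 64 bytes of the k2.8 file, `model_file_size(8, 3, 1)` gives the expected 128 bytes, and `model_file_size(8, 3, 0)` gives the 115 bytes that kona actually writes.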

tavmem commented 2 years ago

However, the "padding" at the end of a character array in k2.8 seems inconsistent: Consider the command

"file" 1: ("ab")

k2.8:

$ xxd file.l
00000000: fdff ffff 0100 0000 fdff ffff 0200 0000  ................
00000010: 6162 00                                  ab.

kona:

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000020: 6162 00                                  ab.

No "padding" in either case. But ... the k2.8 file is 19 bytes. The kona file is only 35 bytes. I would have expected 38 bytes. 35 may be OK in this case, as the start of each element is aligned on a double word (k2.8) or a quad word (kona) boundary.

This suggests that a character array is not automatically "padded". "Padding" is added at the beginning of the next element (if there is one) for proper alignment.

tavmem commented 2 years ago

k2.8 seems even more inconsistent. Consider

"file" 1: ("ab";"cd")

k2.8

$ xxd file.l
00000000: fdff ffff 0100 0000 0000 0000 0200 0000  ................
00000010: fdff ffff 0100 0000 fdff ffff 0200 0000  ................
00000020: 6162 0000 0000 0000 fdff ffff 0100 0000  ab..............
00000030: fdff ffff 0200 0000 6364 0000 0000 0000  ........cd......

kona:

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000  ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000030: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000040: 6162 00fd ffff ffff ffff ff01 0000 0000  ab..............
00000050: 0000 00fd ffff ffff ffff ff02 0000 0000  ................
00000060: 0000 0063 6400                           ...cd.

k2.8: Full (even extra double word) "padding" at the end of the first and the final character array. kona: Minimal "padding" for both. Quad word misalignment for all elements that follow the first character array.

It might be better for kona to always add "padding" at the end of a character array.

More importantly, kona uses "mmap" to write ("ab";"cd") to "file". Since in k2.8 we have "6162 0000 0000 0000" then either

tavmem commented 2 years ago

Using the last commit of kona to GitHub (May 3, 2022), below is a comparison of file.K (created by kona) and file.l (created by k2.8):

"file" 1: (1;1.0;"c";`d;1 2;3.0 4.0;"ef";`g`h;();(1;`z))
$ xxd file.K                                       xxd file.l
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  00000000: fdff ffff 0100 0000
00000010: 0000 0000 0000 0000 0b00 0000 0000 0000            0000 0000 0b00 0000
00000020: fdff ffff ffff ffff 0100 0000 0000 0000  00000010: fdff ffff 0100 0000
00000030: 0100 0000 0000 0000 0100 0000 0000 0000            0100 0000 0100 0000                      1
00000040: fdff ffff ffff ffff 0100 0000 0000 0000  00000020: fdff ffff 0100 0000
00000050: 0200 0000 0000 0000 0000 0000 0000 f03f            0200 0000 0100 0000
                                                   00000030: 0000 0000 0000 f03f                      1.0
00000060: fdff ffff ffff ffff 0100 0000 0000 0000            fdff ffff 0100 0000
00000070: 0300 0000 0000 0000 6300 0000 0000 0000  00000040: 0300 0000 6300 0000                      "c"
00000080: fdff ffff ffff ffff 0100 0000 0000 0000            fdff ffff 0100 0000
00000090: 0400 0000 0000 0000 6400 0000 0000 0000  00000050: 0400 0000 6400 9ff7                      `d
000000a0: fdff ffff ffff ffff 0100 0000 0000 0000            fdff ffff 0100 0000
000000b0: ffff ffff ffff ffff 0200 0000 0000 0000  00000060: ffff ffff 0200 0000
000000c0: 0100 0000 0000 0000 0200 0000 0000 0000            0100 0000 0200 0000                      1 2
000000d0: fdff ffff ffff ffff 0100 0000 0000 0000  00000070: fdff ffff 0100 0000
000000e0: feff ffff ffff ffff 0200 0000 0000 0000            feff ffff 0200 0000
000000f0: 0000 0000 0000 0840 0000 0000 0000 1040  00000080: 0000 0000 0000 0840 0000 0000 0000 1040  3.0 4.0
00000100: fdff ffff ffff ffff 0100 0000 0000 0000  00000090: fdff ffff 0100 0000
00000110: fdff ffff ffff ffff 0200 0000 0000 0000            fdff ffff 0200 0000
00000120: 6566 00fd ffff ffff ffff ff01 0000 0000  000000a0: 6566 0000 0000 0000 fdff ffff 0100 0000  "ef"
00000130: 0000 00fc ffff ffff ffff ff02 0000 0000  000000b0: fcff ffff 0200 0000
00000140: 0000 0067 0068 00fd ffff ffff ffff ff01            6700 6800 0000 0000                      `g`h
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  000000c0: fdff ffff 0100 0000
00000160: 0000 0000 0000 00fd ffff ffff ffff ff01            0000 0000 0000 0000                      ()
00000170: 0000 0000 0000 0000 0000 0000 0000 0002  000000d0: fdff ffff 0100 0000 0000 0000 0200 0000
00000180: 0000 0000 0000 00fd ffff ffff ffff ff01  000000e0: fdff ffff 0100 0000
00000190: 0000 0000 0000 0001 0000 0000 0000 0001            0100 0000 0100 0000
000001a0: 0000 0000 0000 00fd ffff ffff ffff ff01  000000f0: fdff ffff 0100 0000                      (1;
000001b0: 0000 0000 0000 0004 0000 0000 0000 007a            0400 0000 7a00 9ef7                         `z)
000001c0: 0000 0000 0000 00

There are 2 apparent problems:

  1. The coding for 1.0 in k2.8, beginning on line 00000020, is (in shortened form) fdff 0100 0200 0100 f03f. In kona, on line 00000040, it is only (in shortened form) fdff 0100 0200 f03f.
  2. More significantly, in kona, each new element beginning with fdff, feff, or ffff starts on a new line until line 00000120. After that, it gets totally messed up.

In my opinion, in kona, the first may not need to be fixed, but the second must be fixed.

tavmem commented 2 years ago

After fixing issues #629 and #630, this issue now appears resolved in Linux. I haven't yet tried it in OSX, nor in Windows, so I'm keeping this issue open till I do.

In Linux:

  \l scrabble-puz.k
  #puz
8473
  `"scrabblepuz" 1: puz
  puz2: 1: `"scrabblepuz"
  #puz2
8473
  &/puz=puz2
(1 1 1 1 1 1 1
 1 1 1 1 1 1 1)
tavmem commented 2 years ago

It works in OSX.

tavmem commented 2 years ago

It fails in Windows:

MINGW64 ~/kona
$ ./k
kona      \ for help. \\ to exit.

  \l scrabble-puz.k
  #puz
8473
  `"scrabblepuz" 1: puz
  puz2: 1: `"scrabblepuz"
Segmentation fault

However, both #629 and #630 work in Windows.

tavmem commented 2 years ago

There is something else that is strange (problematic) in Windows

In Linux, the 1st command creates file.K and the 2nd command leaves file.K untouched.

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a";4 5)
  \\

$ ls -l file.K
-rwxr-xr-t. 1 tom tom 112 Aug 25 23:12 file.K 
$ date
Thu Aug 25 11:13:01 PM EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  1: "file"
("a"
 4 5)
  \\

$ ls -l file.K
-rwxr-xr-t. 1 tom tom 112 Aug 25 23:12 file.K
$ 

In Windows, the 1st command creates file.l and the 2nd command rewrites file.l, (and with a different size).

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a";4 5)
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 112 Aug 25 23:17 file.l
$ date
Thu Aug 25 23:18:17 EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.

  1: "file"
("a"
 4 5)
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 144 Aug 25 23:18 file.l
$

In OSX (like Linux), the file is created by the 1st command and not rewritten by the 2nd command:

$ ./k
kona      \ for help. \\ to exit.

  "file" 1: ("a"; 4 5)
  \\

$ ls -l file.l
-rwxr-xr-x  1 thomasszczesny  staff  112 Aug 26 15:43 file.l
$ date
Fri Aug 26 15:44:46 EDT 2022

$ ./k
kona      \ for help. \\ to exit.

  1: "file"
("a"
 4 5)
  \\

$ ls -l file.l
-rwxr-xr-x  1 thomasszczesny  staff  112 Aug 26 15:43 file.l
$ 
tavmem commented 2 years ago

I wanted to check that the problems in Windows for #615 were not the result of recent fixes for Linux and OSX over the last year. So, I reverted HEAD to the state of the commit made on Oct 26, 2020. The problem already existed back then.


$ git checkout 046e3a780cc3d43a109607a5730cad26c5ad3b2d
HEAD is now at 046e3a7 these 'fixes' no longer seem necessary

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: ("a";4 5)
\\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 112 Aug 27 22:38 file.l
$ date
Sat Aug 27 22:39:07 EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"
("a"
 4 5)
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 144 Aug 27 22:39 file.l
tavmem commented 2 years ago

Interesting ... the problem does not exist for this simpler case

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: ("a")
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 22:55 file.l
$ date
Sat Aug 27 22:56:12 EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"
"a"
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 22:55 file.l
tavmem commented 2 years ago

The Windows problem does not exist for this simple case:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: (4)
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 23:05 file.l
$ date
Sat Aug 27 23:06:08 EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"
4
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 23:05 file.l
tavmem commented 2 years ago

The Windows problem does exist for this simple case:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: (4 5)
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 48 Aug 27 23:03 file.l
$ date
Sat Aug 27 23:04:09 EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"
4 5
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 80 Aug 27 23:04 file.l
tavmem commented 2 years ago

And, the Windows problem does exist for this simple case:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: ("ab")
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 40 Aug 27 23:14 file.l
$ date
Sat Aug 27 23:15:01 EDT 2022

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"
"ab"
  \\

$ ls -l file.l
-rw-r--r-- 1 tavme tavme 72 Aug 27 23:15 file.l
tavmem commented 1 year ago

The problem (in Windows) begins with the "mmap" call. Making only the following addition:

--- a/src/0.c
+++ b/src/0.c
@@ -540,6 +540,7 @@ Z K _1m_r(I f,V fixed, V v,V aft,I*b) {   //File descriptor, moving * into mmap,
     offset-=mod;

     if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;}
+    exit(0);
     mMap+=length;
     mUsed+=length;if(mUsed>mMax)mMax=mUsed;

We get this result:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: ("ab")
  \\

$ xxd file.l
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000020: 6162 0000 0000 0000                      ab......

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"

$ xxd file.l
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000020: 6162 0000 0000 0000 0000 0000 0000 0000  ab..............
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000                      ........

In Linux (with the same modification) we get:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  "file" 1: ("ab")
  \\

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000020: 6162 0000 0000 0000                      ab......

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  1: "file"

$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000  ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000  ................
00000020: 6162 0000 0000 0000                      ab......
tavmem commented 1 year ago

The Windows problem with mmap was fixed in the commit of Aug 28, 2022.

gitonthescene commented 1 year ago

This is awesome Tom! Thanks very much.

tavmem commented 1 year ago

Thanks for identifying the issue. I wasn't confident that we were done here till I just tested it (on Linux).

  \l scrabble-puz.k
  #puz
8473
  "scrabblepuz" 1: puz
  puz2: 1: "scrabblepuz"
  #puz2
8473
  ^/^/^/puz=puz2
1.0

Tested it in OSX ... works. Tested in Windows ... works.