Closed gitonthescene closed 1 year ago
Stranger and stranger (in Linux) ... As we know, this segfaults
$ rlwrap -n ./k
kona \ for help. \\ to exit.
data: ("abcdeht"; 176 89 106 111 184 125 143)
"file" 1: data
data: 1: `"file"
"file" 1: data
data: 1: `"file"
Segmentation fault (core dumped)
But, this corrupts
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("abcdeht"; 176 89 106 111 184 125 143)
:data: 1: `"file"
("abcdeht"
176 89 106 111 184 125 143)
"file" 1: data /file is corrupt after this step. Corruption occur here? Or at the prior step?
:data: 1: `"file"
(();())
Actually, this is just another consecutive load-save-reload instance. I would have expected a segfault.
This was an experiment to see if it was (again) the consecutive nature of the commands with a specific file that appeared to be key
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"fileA" 1: ("abcdeht"; 176 89 106 111 184 125 143)
:data: 1: `"fileA"
("abcdeht"
176 89 106 111 184 125 143)
"fileB" 1: data
:data: 1: `"fileB"
("abcdeht"
176 89 106 111 184 125 143)
"fileC" 1: data
:data: 1: `"fileC"
("abcdeht"
176 89 106 111 184 125 143)
"fileB" 1: data
:data: 1: `"fileB"
Segmentation fault (core dumped)
So, what does it tell us? Since we were able to repeatedly save-load to alternating files, the only difference is the initial save to fileA.
However ... that is not the problem
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"fileA" 1: ("abcdeht"; 176 89 106 111 184 125 143)
\\
$ rlwrap -n ./k
kona \ for help. \\ to exit.
data: ("abcdeht"; 176 89 106 111 184 125 143)
"fileB" 1: data
\\
$ cmp fileA.K fileB.K
$
Stranger still (in Linux). This is what the file looks like after the first serialization:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 1 2)
\\
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000 ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000 ................
00000060: 0100 0000 0000 0000 0200 0000 0000 0000 ................
$
However, after the second serialization, not only is the file corrupt, the "data" in the workspace is corrupt!
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 1 2)
:data: 1: `"file"
("a"
1 2)
"file" 1: data
data
("a";())
\\
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000 ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
$
Using https://github.com/tavmem/ks to get some idea of what happened, I executed the 3 commands
"file" 1: ("a"; 1 2)
:data: 1: `"file"
"file" 1: data
getting (in part)
~AM vf_ex(*p,g) K vf_ex(V q, K g) <- K dv_ex(K a, V *p, K b) BEG vf_ex
q: 0x40
sd(g): 0x7f6d78c44900 0x7f6d78c44918 1-6 0 2
("file"
("a"
1 2))
BEG ci
BEG ci END ci 0x7ffda472a688 0x7f6d78c447c0 0x7f6d78c447d8 3-6 -3 4 "file"
BEG ci
BEG ci END ci 0x7ffda472a658 0x7f6d78c446c0 0x7f6d78c446d8 3-6 3 1 "a"
BEG ci END ci 0x7ffda472a658 0x7f6d78c0d048 0x7f6d78c0d060 3-6 -1 2 1 2
END ci 0x7ffda472a688 0x7f6d78c44880 0x7f6d78c44898 3-6 0 2
("a"
1 2)
END ci 0x7ffda472a6b8 0x7f6d78c44900 0x7f6d78c44918 2-6 0 2
("file"
("a"
1 2))
vf_ex ELSE --- z=((K(*)(K,K))DT[(L)q].func)(a,b); (L)q:64---beg _ld
beg _ld_write
_1d_write mmap(addfress:0, length:112, PROT_WRITE,MAP_SHARED, file:3 offset:0)
beg wrep
beg wrep
beg wrep
BEG ci END ci 0x7ffda472a6b8 0x7f6d78c44040 0x7f6d78c44058 8-6 6 0
cd 0x7f6d78c44900 0x7f6d78c44918 2-6 0 2
("file"
("a";()))
vf_ex is called (and the data is OK),
which calls _1d (function 64 in the dispatch table in src/k.c),
which calls _1d_write,
which calls mmap;
then something calls wrep 3 times, at the end of which the data is corrupt.
Next step: Investigate the mmap call and the 3 wrep calls.
So ... what did we find? Making these changes (note the mid statement is commented out):
$ git diff
diff --git a/src/0.c b/src/0.c
index 6ba3663..e0cd703 100644
--- a/src/0.c
+++ b/src/0.c
@@ -575,11 +575,13 @@ Z K _1d_write(K x,K y,I dosync) {
U(e)
//Largely copy-pasted from 6:dyadic
+ O("bef open sd(y): ");sd(y);
I f=open(e,O_RDWR|O_CREAT|O_TRUNC,07777);
free(e);
P(f<0,SE)
+ //O("mid sd(y): ");sd(y);
P(ftruncate(f,n),SE)
+ O("aft ftruncate sd(y); ");sd(y);
//lfop: see 0: write for possible way to do ftruncate etc. on Windows
S v;
if(MAP_FAILED==(v=mmap(0,n,PROT_WRITE,MAP_SHARED,f,0)))R SE; // should this be MAP_PRIVATE|MAP_NORESERVE ?
diff --git a/src/0.h b/src/0.h
index acaf688..2dc3b73 100644
--- a/src/0.h
+++ b/src/0.h
@@ -23,6 +23,8 @@ extern V vd[];
extern V vm[];
extern I fbr;
extern I fll;
+extern K sd(K x);
+extern K sd_(K x,I f);
I sva(V p);
I rep(K x,I y);
I wrep(K x,V v,I y);
$
we get
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 1 2)
bef open sd(y): 0x7fb3b9dc6680 0x7fb3b9dc6698 2-6 0 2
("a"
1 2)
aft ftruncate sd(y); 0x7fb3b9dc6680 0x7fb3b9dc6698 2-6 0 2
("a"
1 2)
data: 1: `"file"
"file" 1: data
bef open sd(y): 0x7fb3b9dc66c0 0x7fb3b9dc66d8 3-6 0 2
("a"
1 2)
aft ftruncate sd(y); 0x7fb3b9dc66c0 0x7fb3b9dc66d8 3-6 0 2
("a";())
The corruption occurs between the open statement and the ftruncate statement.
However, trying to check the status of y in between these statements (e.g., uncommenting the mid statement) yields
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 1 2)
bef open sd(y): 0x7f76184e7680 0x7f76184e7698 2-6 0 2
("a"
1 2)
mid sd(y): 0x7f76184e7680 0x7f76184e7698 2-6 0 2
("a"
1 2)
aft ftruncate sd(y); 0x7f76184e7680 0x7f76184e7698 2-6 0 2
("a"
1 2)
data: 1: `"file"
"file" 1: data
bef open sd(y): 0x7f76184e76c0 0x7f76184e76d8 3-6 0 2
("a"
1 2)
mid sd(y): 0x7f76184e76c0 0x7f76184e76d8 3-6 0 2
Bus error (core dumped)
"A Bus error is trying to access memory that can't possibly be there. You've used an address that's meaningless to the system, or the wrong kind of address for that operation."
It's not yet clear why that would be the case between the open and the ftruncate statements.
To simplify further ... we get rid of the variable named data
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
1: `"file"
("a"
4 5)
"file" 1: ("a"; 4 5)
After each of the 3 commands, we examine file.K using a different Linux terminal session:
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000 ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000 ................
00000060: 0400 0000 0000 0000 0500 0000 0000 0000 ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000 ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000 ................
00000060: 0400 0000 0000 0000 0500 0000 0000 0000 ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
$
After the 2nd execution of the first command, file.K is corrupt.
At first, it looks like the dyadic function f 1: x is the cause of the problem.
However, consider executing it 3 times with different x
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
"file" 1: ("b"; 6 7)
"file" 1: ("c"; 8 9)
and (again) checking the file after each execution
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6100 0000 0000 0000 ........a.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000 ................
00000060: 0400 0000 0000 0000 0500 0000 0000 0000 ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6200 0000 0000 0000 ........b.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000 ................
00000060: 0600 0000 0000 0000 0700 0000 0000 0000 ................
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: 0300 0000 0000 0000 6300 0000 0000 0000 ........c.......
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000050: ffff ffff ffff ffff 0200 0000 0000 0000 ................
00000060: 0800 0000 0000 0000 0900 0000 0000 0000 ................
$
The file does not become corrupt ...
It now appears that the problem with the dyadic function f 1: x is caused by prior execution of the monadic function 1: f, although the contents did get loaded properly by the monadic function.
Some further analysis. I made some code changes to document what's happening. The results are:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5) //command 1
_1d _1d_write
** bef open sd(y): 0x7f7e1665c680 0x7f7e1665c698 2-6 0 2
("a"
4 5)
** aft open sd(y): 0x7f7e1665c680 0x7f7e1665c698 2-6 0 2
("a"
4 5)
mmap close munmap
:data: 1: `"file" //command 2
_1m open mmap _1m_r _1m_r _2m_r rrep _1m_r mmap close munmap
("a"
4 5)
"file" 1: data //command 3
_1d _1d_write
** bef open sd(y): 0x7f7e1665c880 0x7f7e1665c898 3-6 0 2
("a"
4 5)
** aft open sd(y): 0x7f7e1665c880 0x7f7e1665c898 3-6 0 2
Bus error (core dumped)
$
So ... what does it show?
- Command 1 calls open, then mmap, then close, then munmap (all on file).
- Command 2 calls open, then mmap, then mmap (again), then close, then munmap.
- In command 3, the Bus error occurs after open, but BEFORE mmap.
Note:
- Although file was unmapped (in command 2), the open of file (in command 3) corrupts a K-structure in memory. At this point, the file should not be connected (i.e., mapped) to any structure in computer memory. It appears as though the mapping done in command 2 was not fully released.
Summary:
- mmap does not appear to be working properly in Linux, MacOS or Windows.
- On the face of it, mmap appears to be an inappropriate strategy to implement 1:
However, it does seem a bit strange that:
- Command 2 calls mmap twice, but calls munmap only once.
Further confirmation that the "call mmap twice, call munmap once" pattern in command 2 may be related to the problem:
It works for 2 scalars:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4) //command 1
_1d _1d_write
** bef open sd(y): 0x7f76ca0a1680 0x7f76ca0a1698 2-6 0 2
("a";4)
** aft open sd(y): 0x7f76ca0a1680 0x7f76ca0a1698 2-6 0 2
("a";4)
mmap close munmap
:data: 1: `"file" //command 2
_1m open mmap _1m_r _1m_r _2m_r rrep _1m_r _2m_r rrep close munmap
("a";4)
"file" 1: data //command 3
_1d _1d_write
** bef open sd(y): 0x7f76ca0a1880 0x7f76ca0a1898 3-6 0 2
("a";4)
** aft open sd(y): 0x7f76ca0a1880 0x7f76ca0a1898 3-6 0 2
("a";4)
mmap close munmap
Note that in command 2, both mmap and munmap are called only once.
If we try 2 vectors ("ab";4 5), command 2 calls mmap 3 times and calls munmap only once.
It also fails.
Here is a "plausible" theory on the cause of the problem.
Consider the case ("a"; 4 5) where mmap is called twice, and munmap is called once.
The first time mmap is called, in _1m, it creates map v of length s, with address hint 0
if(MAP_FAILED==(v=mmap(0,s,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,0)))R SE;
The second time mmap is called, in _1m_r, it creates map u of length length, with address hint 0
if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;}
However, when munmap is called in _1m, it is only for map v with length s
r=munmap(v,s);
Map u is never unmapped!
Note that this gets complex very quickly.
In the case of ("ab"; 4 5) the current code calls mmap 3 times, and calls munmap only once, for the first mapping.
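The pairing problem described above can be sketched in plain C. This is only an illustration under stated assumptions, not kona's code: map_tracked, unmap_all, the maps table and demo_two_maps are hypothetical names, and the 112-byte file and the offset 96 are just values chosen to resemble the ("a"; 4 5) case. Every successful mmap is recorded so that each one can later be released, rather than only the first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MAPS 64

/* Hypothetical bookkeeping: one entry per successful mmap call. */
struct map_entry { void *base; size_t len; };
static struct map_entry maps[MAX_MAPS];
static int n_maps = 0;

/* Map len bytes of fd starting at byte offset off and remember the mapping.
   Returns a pointer to byte off, or NULL on failure. */
static void *map_tracked(int fd, off_t off, size_t len) {
    assert(len > 0);
    long pg = sysconf(_SC_PAGESIZE);
    size_t mod = (size_t)(off % pg);   /* mmap offsets must be page-aligned */
    if (n_maps == MAX_MAPS) return NULL;
    void *p = mmap(NULL, len + mod, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   fd, off - (off_t)mod);
    if (p == MAP_FAILED) return NULL;
    maps[n_maps].base = p;
    maps[n_maps].len = len + mod;
    n_maps++;
    return (char *)p + mod;
}

/* Release every recorded mapping; returns how many munmap calls succeeded. */
static int unmap_all(void) {
    int released = 0;
    while (n_maps > 0) {
        n_maps--;
        if (munmap(maps[n_maps].base, maps[n_maps].len) == 0) released++;
    }
    return released;
}

/* Two mappings of one 112-byte temporary file (like v and u), both released. */
static int demo_two_maps(void) {
    char path[] = "/tmp/mapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, 112) != 0) { close(fd); return -1; }
    void *v = map_tracked(fd, 0, 112);   /* whole file */
    void *u = map_tracked(fd, 96, 16);   /* a sub-range, like the vector data */
    close(fd);
    if (!v || !u) { unmap_all(); return -1; }
    return unmap_all();
}
```

Calling demo_two_maps() maps the file twice and returns the number of mappings released, which should equal the number created.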
Remember that the initial impetus for this issue was the Scrabble-Puzzle.
It has hundreds of members and each member is of the form ("abcelpy"; 172 84 75 131 157 102 79)
Scrabble-puzzle would appear to call mmap thousands of times, and munmap only once for the very first map.
I used strace(1) to see what syscalls are used for this case. I just did \l scrabble-puz and quit.
scrabble-puz.k file. None. After opening this file it makes 3 mmap calls but that seems to be for some other reason -- probably to allocate space to hold the puz array.
My conclusion is that the whole memory allocation scheme should be reviewed. In general minimize calls to malloc/mmap by using better defaults and guesstimates.
Thanks! I agree with your conclusion.
These 4 simple code changes (as a test), do the following:
- comment out the mmap of u in _1m_r (which causes the multitude of calls to mmap)
- use the mmap v created in _1m for retrieving the contents of file
- display data in memory before and after munmap
$ git diff
diff --git a/src/0.c b/src/0.c
index 6ba3663..4eb272d 100644
--- a/src/0.c
+++ b/src/0.c
@@ -508,7 +508,9 @@ K _1m(K x) { //Keeps binary files mapped
I b=0;
K z = _1m_r(f,v,v,v+s,&b);
r=close(f); if(r)R FE;
@@ -539,11 +541,11 @@ Z K _1m_r(I f,V fixed, V v,V aft,I*b) { //File descriptor, moving * into mmap,
length+=mod; offset-=mod;
if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;}
//if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;} mMap+=length; mUsed+=length;if(mUsed>mMax)mMax=mUsed;
z=(K)(((V)u+mod)-3*sizeof(I)); //3*sizeof(I) for c,t,n
z=(K)(((V)v+32)-3*sizeof(I)); //3*sizeof(I) for c,t,n
//ref count should be reset to 1 after mapping
mrc((K)z,1);
$
The result shows:
- putting ("a"; 4 5) into file works fine
- retrieving the contents of file works fine
- munmap obliterates the data (in memory) that was obtained using mmap.
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
:data: 1: `"file"
bef munmap: sd_(z,2) 0x7f1162d5d880 0x7f1162d5d898 1-6 0 2
("a"
4 5)
0x7f1162d5d898 0x7f1162d5d6c0 0x7f1162d5d6d8 1-6 3 1 "a"
0x7f1162d5d8a0 0x7f1162d27048 0x7f1162d27060 1-6 -1 2 4 5
aft munmap: sd_(z,2) 0x7f1162d5d880 0x7f1162d5d898 1-6 0 2
Segmentation fault (core dumped)
$
Note that the Linux manual page for mmap(2) states
munmap() The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references.
The observed behavior in our test is consistent with the description in the Linux manual page.
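One way to keep data usable after munmap is to copy it out of the mapping first. The sketch below shows that idea in plain C; it is not what kona or k2.8 does, and load_copy and demo_copy_survives are hypothetical names introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy the first len bytes of fd into heap memory, then drop the mapping.
   The caller owns the returned buffer and must free() it. */
static unsigned char *load_copy(int fd, size_t len) {
    assert(len > 0);
    void *v = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (v == MAP_FAILED) return NULL;
    unsigned char *copy = malloc(len);
    if (copy) memcpy(copy, v, len);
    munmap(v, len);   /* v must not be dereferenced from here on; copy is fine */
    return copy;
}

/* Write 8 known bytes to a temporary file and read them back via load_copy.
   Returns 1 if the copy matches, 0 if not, -1 on error. */
static int demo_copy_survives(void) {
    char path[] = "/tmp/copydemoXXXXXX";
    const unsigned char data[8] = {4, 0, 0, 0, 5, 0, 0, 0};
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (write(fd, data, sizeof data) != (ssize_t)sizeof data) {
        close(fd);
        return -1;
    }
    unsigned char *copy = load_copy(fd, sizeof data);
    close(fd);
    if (!copy) return -1;
    int same = memcmp(copy, data, sizeof data) == 0;
    free(copy);
    return same;
}
```

The trade-off is an extra copy per load. The alternative is to leave the mapping in place for as long as any K object points into it.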
It's not clear how k2.8 and k3.2 got around this.
Using strace on k2.8, we find some differences.
k2.8:
- uses O_RDONLY on open
- uses mmap2 instead of mmap
- uses MAP_PRIVATE instead of MAP_PRIVATE|MAP_NORESERVE
- does not call munmap at all
- also calls alarm, stat64, llseek and pselect6
data: 1: `"file"
) = 1 (in [0])
alarm(1) = 0
alarm(0) = 1
read(0, "data: 1: `\"file\"\n", 1024) = 17
alarm(1) = 0
stat64("file.l", {st_mode=S_IFREG|0664, st_size=56, ...}) = 0
openat(AT_FDCWD, "file.l", O_RDONLY) = 3
_llseek(3, 0, [56], SEEK_END) = 0
mmap2(NULL, 56, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xf7f8e000
close(3) = 0
write(2, " ", 2 ) = 2
alarm(0) = 1
pselect6(1000, [0], [], NULL, NULL, NULL
kona:
- also calls clock_gettime and newfstatat
data: 1: `"file"
) = 1 (in [0])
read(0, "data: 1: `\"file\"\n", 1024) = 17
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=6827485}) = 0
openat(AT_FDCWD, "file.K", O_RDWR) = 3
newfstatat(AT_FDCWD, "file.K", {st_mode=S_IFREG|S_ISVTX|0775, st_size=112, ...}, 0) = 0
mmap(NULL, 112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0x7fc58d425000
close(3) = 0
munmap(0x7fc58d425000, 112) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fc58d425050} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
$
Opening file.K with O_RDWR is obviously wrong. IMHO this still doesn't get at the real problem. See my next message.
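The k2.8 trace above opens the file O_RDONLY and still maps it with PROT_READ|PROT_WRITE and MAP_PRIVATE. A minimal C sketch of that combination follows, assuming a POSIX system; demo_private_write is a hypothetical name and the one-byte file is made up for the demonstration. With MAP_PRIVATE, writes go to a private copy-on-write page, so a read-only descriptor is enough and the file on disk is not modified.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through a private mapping of a read-only descriptor
   was visible in memory but left the file unchanged, 0 if not, -1 on error. */
static int demo_private_write(void) {
    char path[] = "/tmp/privdemoXXXXXX";
    char back = 0;
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    if (write(fd, "a", 1) != 1) { close(fd); unlink(path); return -1; }
    close(fd);

    fd = open(path, O_RDONLY);             /* read-only, as in the k2.8 trace */
    if (fd < 0) { unlink(path); return -1; }
    char *v = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                             /* the mapping outlives the descriptor */
    if (v == MAP_FAILED) { unlink(path); return -1; }
    v[0] = 'z';                            /* touches only the private copy */
    int seen = (v[0] == 'z');
    munmap(v, 1);

    fd = open(path, O_RDONLY);
    if (fd < 0) { unlink(path); return -1; }
    ssize_t got = read(fd, &back, 1);
    close(fd);
    unlink(path);
    if (got != 1) return -1;
    return seen && back == 'a';
}
```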
mmap/munmap behave as expected. The problem may be in how they are used in kona. k2.8/3.2 don't "get around" their semantics. There is certainly a memory leak of some sort. That is easy to see if you do this:
- Create a file like scrabble-puz.k but containing only a few rows. Let us call it x.k -- this may be what you did with file.K
- Run valgrind ./k, do \l x and exit. Valgrind will show you a leak of a few hundred bytes (for example 405 bytes).
- Run valgrind k again. Do \l x and when it is loaded, reload it with \l x and exit again. Valgrind will show you a leak of twice the size (for example 810 bytes).
Since it is the exact same array we are overwriting, there should be a fixed size leak at most, but here the leak size increases every time you load the same file! Another indication that allocation is not quite right. This may be different from the slowness issue which seems to be triggered by doing far too many small mmaps.
It appears that we may have found the culprit.
Make the following single line change to print the args for function strdupn.
Note: there was already a warning in the comment that strdupn can overallocate
$ git diff
diff --git a/src/ks.c b/src/ks.c
index 9810042..8e5f42d 100644
--- a/src/ks.c
+++ b/src/ks.c
@@ -8,7 +8,7 @@
Z I ns=0,sdd=0;
// Z S sdup(S s){R strdupn(s,strlen(s));} //using this because "strdup" uses [used] dynamically linked malloc which fails with our static free
Z S sdupI(S s){I k;S d=alloc(NSLOTS*sizeof(I)+(k=strlen(s))+1);if(!d)R 0;ns++;sdd=1;d+=NSLOTS*sizeof(I);d[k]=0;R memcpy(d,s,k);}
-S strdupn (S s,I k) {S d=alloc(k+1);if(!d)R 0;d[k]=0;R memcpy(d,s,k);} // mm/o (note: this can overallocate)
+S strdupn (S s,I k) {O("s:%s k:%lld\n",s,k); S d=alloc(k+1);if(!d)R 0;d[k]=0;R memcpy(d,s,k);} // mm/o (note: this can overallocate)
//I SC0N(S a,S b,I n) {I x=memcmp(a,b,n); R x<0?-1:x>0?1:a[n]?1:0; }// non-standard way to compare aaa\0 vs aaa
I strlenn(S s,I k){S t=memchr(s,'\0',k); R t?t-s:k;}
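For readers unfamiliar with kona's macro style, here is a plain-C sketch of what strdupn appears to do: copy k bytes and NUL-terminate. It swaps in malloc/free for kona's own alloc, so it is an assumption-laden illustration rather than the real allocator path, and dup_n is a hypothetical name. It is meant only to show the ownership rule: each returned buffer needs exactly one matching release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first k bytes of s into a fresh buffer and NUL-terminate it.
   The caller owns the result and must free() it exactly once. */
static char *dup_n(const char *s, size_t k) {
    assert(s != NULL);
    char *d = malloc(k + 1);
    if (!d) return NULL;
    memcpy(d, s, k);
    d[k] = '\0';
    return d;
}
```

For example, dup_n("puz:((\"abc\"", 3) yields the string "puz", which is the shape of the s:puz k:3 calls in the trace.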
If we run valgrind on this simple file
$ cat sp1.k
puz:(("abcdeht"
176 79 106 111 184 125 143))
$
We get 17 executions of strdupn and 17 bytes definitely lost
kona \ for help. \\ to exit.
\l sp1.k
s:\l sp1.k
k:9
s:k k:1
s:puz:(("abcdeht"
k:16
s: 176 79 106 111 184 125 143))
k:31
s:k k:1
s:puz:(("abcdeht"
176 79 106 111 184 125 143)) k:3
s:puz k:3
s:puz k:3
s:176 79 106 111 184 125 143 k:4
s:79 106 111 184 125 143 k:3
s:106 111 184 125 143 k:4
s:111 184 125 143 k:4
s:184 125 143 k:4
s:125 143 k:4
s:143 k:3
\\
s:\\
k:3
s:k k:1
==54559== LEAK SUMMARY:
==54559== definitely lost: 17 bytes in 1 blocks
==54559== indirectly lost: 0 bytes in 0 blocks
==54559== possibly lost: 184 bytes in 9 blocks
==54559== still reachable: 9 bytes in 3 blocks
==54559== suppressed: 0 bytes in 0 blocks
If we use an even simpler file
$ cat sp1a.k
puz:(("abc"
4 5 6))
$
We get 13 executions of strdupn, and 13 bytes definitely lost
kona \ for help. \\ to exit.
\l sp1a.k
s:\l sp1a.k
k:10
s:k k:1
s:puz:(("abc"
k:12
s: 4 5 6))
k:10
s:k k:1
s:puz:(("abc"
4 5 6)) k:3
s:puz k:3
s:puz k:3
s:4 5 6 k:2
s:5 6 k:2
s:6 k:1
\\
s:\\
k:3
s:k k:1
==54679== LEAK SUMMARY:
==54679== definitely lost: 13 bytes in 1 blocks
==54679== indirectly lost: 0 bytes in 0 blocks
==54679== possibly lost: 184 bytes in 9 blocks
==54679== still reachable: 9 bytes in 3 blocks
==54679== suppressed: 0 bytes in 0 blocks
Try this simple test:
file1.k:
foo:(("abc";))
Run k under valgrind. Just load file1 and exit and check for leaks.
file2.k:
foo:(("abc"
))
Run k under valgrind. Just load file2 and exit and check for leaks. They create identical objects but in the second case there is a leak!
I suspect this is a separate bug, not related to mmap but should be fixed.
Thanks !
Your examples provide further evidence that strdupn may be the culprit.
When loading file1.k under valgrind, strdupn is not called at all.
When loading file2.k under valgrind, strdupn is called 8 times.
I agree with your suspicion that this bug is probably a separate issue, and probably not related to the mmap problem.
Revising the last comment: I was looking at "possibly lost" and "definitely lost" for file1.k.
strdupn is never associated with any bytes "possibly lost" or "definitely lost" for file1.k.
However, strdupn is called:
- strdupn is called 9 times for file1.k (with NO "possibly lost" and NO "definitely lost" bytes)
- strdupn is called 10 times for file2.k (with 13 bytes "definitely lost" associated with its use)
I'm putting the memory leak when using \l file into its own issue.
The \l file command does not appear to invoke mmap, and is probably not related to the mmap problem.
The focus of the mmap problem should be the commands "file" 1: ("a"; 4 5) and :data: 1: `"file", i.e., 1: dyad and 1: monad
FWIW, this doesn't crash any more on OS X (m1 and x86-64), FreeBSD & Linux. It still does far too many mmap calls, which should probably be tracked under a separate issue.
It's not a crash that we are attempting to fix at this point ... rather, it's memory corruption ...
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5) //command 1
:data: 1: `"file" //command 2
("a"
4 5)
data //display
("a"
4 5)
"file" 1: data //command 3
data //display
("a";())
I'll try to check (later today) if this corruption still occurs on OS X, and on Windows.
No data corruption (in this case) on OS X
$ ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
:data: 1: "file"
("a"
4 5)
"file" 1: data
data
("a"
4 5)
Got a surprise in Windows:
tavme@DESKTOP-FVKENU9 MINGW64 ~/kona
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
:data: 1: "file"
("a"
4 5)
"file" 1: data
Invalid argument error
"file" 1: data
^
> \
data = ("a"; 4 5)
(1
1 1)
"file" 1: ("a"; 4 5)
Invalid argument error
"file" 1: ("a"; 4 5)
^
>
This problem in Windows is sufficiently different that I will open a new issue.
Progress in Linux.
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
data: 1: `"file"
"file" 1: data
data
("a"
4 5)
data: 1: `"file"
"file" 1: data
data
("a"
4 5)
Making the following changes (for the test case) eliminates both:
- the multiple calls to mmap
- the memory corruption
$ git diff
diff --git a/src/0.c b/src/0.c
index 6ba3663..76de931 100644
--- a/src/0.c
+++ b/src/0.c
@@ -492,7 +492,7 @@ K _1m(K x) { //Keeps binary files mapped
I f=open(e,O_RDWR); //Try the extended version of the filename first
if(f>=0) stat(e,&c);
else {f=open(m,O_RDONLY); stat(m,&c);} //Then try the plain version
free(e);
P(f<0,DOE)
@@ -502,13 +502,12 @@ K _1m(K x) { //Keeps binary files mapped
S v; //These mmap arguments are present in Arthur's code. WRITE+PRIVATE lets reference count be modified without affecting file
if(MAP_FAILED==(v=mmap(0,s,PROT_READ|PROT_WRITE,MAP_PRIVATE,f,0)))R SE;
//TODO: verify that the file is valid K data. For -1,-2,-3 types (at least) you can avoid scanning the whole thing and check size
I b=0;
K z = _1m_r(f,v,v,v+s,&b);
r=close(f); if(r)R FE;
@@ -531,7 +530,6 @@ Z K _1m_r(I f,V fixed, V v,V aft,I*b) { //File descriptor, moving * into mmap,
K z,x;
if(0==t||5==t){z=newK(t,n); DO(n,x=_1m_r(f,fixed,v+r,aft,&r); if(!x){cd(z);R 0;} kK(z)[i]=x; ) }
else { //map lists to file. atoms are allocated not mapped
S u;
I length=r;
I offset=v-fixed+(t>0?3:4)*sizeof(I);
//I mod = offset&(PG-1); //offset must be a multiple of the pagesize
@@ -539,11 +537,10 @@ Z K _1m_r(I f,V fixed, V v,V aft,I*b) { //File descriptor, moving * into mmap,
length+=mod; offset-=mod;
if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;}
mMap+=length; mUsed+=length;if(mUsed>mMax)mMax=mUsed;
z=(K)(((V)u+mod)-3*sizeof(I)); //3*sizeof(I) for c,t,n
z=(K)(((V)v+32)-3*sizeof(I)); //3*sizeof(I) for c,t,n
//ref count should be reset to 1 after mapping
mrc((K)z,1);
@@ -575,7 +572,7 @@ Z K _1d_write(K x,K y,I dosync) {
U(e)
//Largely copy-pasted from 6:dyadic
I f=open(e,O_RDWR|O_CREAT|O_TRUNC,07777);
I f=open(e,O_RDWR|O_CREAT,06666);
free(e);
P(f<0,SE)
(END)
Next step:
The above only works for the test case.
It needs to be generalized by using ```mod``` again in
Before we "generalize" the process of reading a file that has been serialized, we need to fix an additional problem in the creation of the serialized file.
In k2.8, the command
"file" 1: ("ab"; 4 5)
creates the file (in 32-bit representation):
$ xxd file.l
00000000: fdff ffff 0100 0000 0000 0000 0200 0000 ................
00000010: fdff ffff 0100 0000 fdff ffff 0200 0000 ................
00000020: 6162 0000 0000 0000 fdff ffff 0100 0000 ab..............
00000030: ffff ffff 0200 0000 0400 0000 0500 0000 ................
The 32-bit binary representation can be translated to
-3 1 0 2
-3 1 -3 2
ab 0 -3 1
-1 2 4 5
In kona, the same command creates the file (in 64-bit representation):
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000040: 6162 00fd ffff ffff ffff ff01 0000 0000 ab..............
00000050: 0000 00ff ffff ffff ffff ff02 0000 0000 ................
00000060: 0000 0004 0000 0000 0000 0005 0000 0000 ................
00000070: 0000 00 ...
The 64-bit binary representation should be translatable to the same result, but it's not. We get the start correctly ...
-3 1
0 2
-3 1
-3 2
but then there is a problem in line 00000040.
To get ab 0, line 00000040 should be
00000040: 6162 0000 0000 0000 0000 0000 0000 0000 ab..............
In kona, the file creation process does not include correct padding at the end of a character array. The k2.8 (32-bit) file is 64 bytes. The kona (64-bit) file should be 128 bytes. It is only 115 bytes.
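The three sizes quoted above (64, 128 and 115 bytes) can be checked with a little arithmetic. The sketch below assumes a padding rule inferred only from the k2.8 dump of ("ab"; 4 5), namely that character data is rounded up to two machine words; round_up and size_ab_45 are hypothetical helpers, not kona functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of unit. */
static size_t round_up(size_t n, size_t unit) {
    return (n + unit - 1) / unit * unit;
}

/* Serialized size in bytes of ("ab"; 4 5) for a word size of w bytes.
   pad_chars != 0 applies the assumed rule: character data is padded
   to a multiple of two words. */
static size_t size_ab_45(size_t w, int pad_chars) {
    size_t chars = 3;                     /* 'a', 'b' and the trailing NUL */
    if (pad_chars) chars = round_up(chars, 2 * w);
    size_t header = 4 * w;                /* words: -3 1, then 0 2 */
    size_t elem1  = 4 * w + chars;        /* words: -3 1, -3 2, then the characters */
    size_t elem2  = 4 * w + 2 * w;        /* words: -3 1, -1 2, then 4 5 */
    return header + elem1 + elem2;
}
```

With w=4 and padding this gives the k2.8 size, with w=8 and padding the expected kona size, and with w=8 and no padding the size kona actually wrote.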
However, the "padding" at the end of a character array in k2.8 seems inconsistent: Consider the command
"file" 1: ("ab")
k2.8:
$ xxd file.l
00000000: fdff ffff 0100 0000 fdff ffff 0200 0000 ................
00000010: 6162 00 ab.
kona:
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000020: 6162 00 ab.
No "padding" in either case. But ... the k2.8 file is 19 bytes. The kona file is only 35 bytes. I would have expected 38 bytes. 35 may be OK in this case, as the start of each element is aligned on a double word (k2.8) or a quad word (kona) boundary.
This suggests that a character array is not automatically "padded". "Padding" is added at the beginning of the next element (if there is one) for proper alignment.
k2.8 seems even more inconsistent. Consider
"file" 1: ("ab";"cd")
k2.8
$ xxd file.l
00000000: fdff ffff 0100 0000 0000 0000 0200 0000 ................
00000010: fdff ffff 0100 0000 fdff ffff 0200 0000 ................
00000020: 6162 0000 0000 0000 fdff ffff 0100 0000 ab..............
00000030: fdff ffff 0200 0000 6364 0000 0000 0000 ........cd......
kona:
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0200 0000 0000 0000 ................
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000030: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000040: 6162 00fd ffff ffff ffff ff01 0000 0000 ab..............
00000050: 0000 00fd ffff ffff ffff ff02 0000 0000 ................
00000060: 0000 0063 6400 ...cd.
k2.8: Full (even extra double word) "padding" at the end of the first and the final character array. kona: Minimal "padding" for both. Quad word misalignment for all elements that follow the first character array.
It might be better for kona to always add "padding" at the end of a character array.
More importantly, kona uses "mmap" to write ("ab","cd") to "file". Since in k2.8 we have "6162 0000 0000 0000" then either
Using the last commit of kona to github (May 3, 2022) below is a comparison of file.K (created by kona) and file.l (created by k2.8)
"file" 1: (1;1.0;"c";`d;1 2;3.0 4.0;"ef";`g`h;();(1;`z))
$ xxd file.K xxd file.l
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 00000000: fdff ffff 0100 0000
00000010: 0000 0000 0000 0000 0b00 0000 0000 0000 0000 0000 0b00 0000
00000020: fdff ffff ffff ffff 0100 0000 0000 0000 00000010: fdff ffff 0100 0000
00000030: 0100 0000 0000 0000 0100 0000 0000 0000 0100 0000 0100 0000 1
00000040: fdff ffff ffff ffff 0100 0000 0000 0000 00000020: fdff ffff 0100 0000
00000050: 0200 0000 0000 0000 0000 0000 0000 f03f 0200 0000 0100 0000
00000030: 0000 0000 0000 f03f 1.0
00000060: fdff ffff ffff ffff 0100 0000 0000 0000 fdff ffff 0100 0000
00000070: 0300 0000 0000 0000 6300 0000 0000 0000 00000040: 0300 0000 6300 0000 “c”
00000080: fdff ffff ffff ffff 0100 0000 0000 0000 fdff ffff 0100 0000
00000090: 0400 0000 0000 0000 6400 0000 0000 0000 00000050: 0400 0000 6400 9ff7 `d
000000a0: fdff ffff ffff ffff 0100 0000 0000 0000 fdff ffff 0100 0000
000000b0: ffff ffff ffff ffff 0200 0000 0000 0000 00000060: ffff ffff 0200 0000
000000c0: 0100 0000 0000 0000 0200 0000 0000 0000 0100 0000 0200 0000 1 2
000000d0: fdff ffff ffff ffff 0100 0000 0000 0000 00000070: fdff ffff 0100 0000
000000e0: feff ffff ffff ffff 0200 0000 0000 0000 feff ffff 0200 0000
000000f0: 0000 0000 0000 0840 0000 0000 0000 1040 00000080: 0000 0000 0000 0840 0000 0000 0000 1040 3.0 4.0
00000100: fdff ffff ffff ffff 0100 0000 0000 0000 00000090: fdff ffff 0100 0000
00000110: fdff ffff ffff ffff 0200 0000 0000 0000 fdff ffff 0200 0000
00000120: 6566 00fd ffff ffff ffff ff01 0000 0000 000000a0: 6566 0000 0000 0000 fdff ffff 0100 0000 “ef”
00000130: 0000 00fc ffff ffff ffff ff02 0000 0000 000000b0: fcff ffff 0200 0000
00000140: 0000 0067 0068 00fd ffff ffff ffff ff01 6700 6800 0000 0000 `g`h
00000150: 0000 0000 0000 0000 0000 0000 0000 0000 000000c0: fdff ffff 0100 0000
00000160: 0000 0000 0000 00fd ffff ffff ffff ff01 0000 0000 0000 0000 ()
00000170: 0000 0000 0000 0000 0000 0000 0000 0002 000000d0: fdff ffff 0100 0000 0000 0000 0200 0000
00000180: 0000 0000 0000 00fd ffff ffff ffff ff01 000000e0: fdff ffff 0100 0000
00000190: 0000 0000 0000 0001 0000 0000 0000 0001 0100 0000 0100 0000
000001a0: 0000 0000 0000 00fd ffff ffff ffff ff01 000000f0: fdff ffff 0100 0000 (1;
000001b0: 0000 0000 0000 0004 0000 0000 0000 007a 0400 0000 7a00 9ef7 `z)
000001c0: 0000 0000 0000 00
There are 2 apparent problems:
1. In k2.8 (in shortened form) the entry for 1.0 is fdff 0100 0200 0100 f03f
In kona on line 00000040 (in shortened form) it is only fdff 0100 0200 f03f
2. In kona, every entry starting with fdff or feff or ffff begins on a new line, until line 00000120. After that, it gets totally messed up.
In my opinion, in kona, the first may not need to be fixed, but the second must be fixed.
After fixing issues #629 and #630, this issue now appears resolved in Linux. I haven't yet tried it in OSX, nor in Windows, so I'm keeping this issue open till I do.
In Linux:
\l scrabble-puz.k
#puz
8473
`"scrabblepuz" 1: puz
puz2: 1: `"scrabblepuz"
#puz2
8473
&/puz=puz2
(1 1 1 1 1 1 1
1 1 1 1 1 1 1)
It works in OSX.
It fails in Windows:
MINGW64 ~/kona
$ ./k
kona \ for help. \\ to exit.
\l scrabble-puz.k
#puz
8473
`"scrabblepuz" 1: puz
puz2: 1: `"scrabblepuz"
Segmentation fault
However, both #629 and #630 work in Windows.
There is something else that is strange (problematic) in Windows.
In Linux, the 1st command creates file.K and the 2nd command leaves file.K untouched.
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a";4 5)
\\
$ ls -l file.K
-rwxr-xr-t. 1 tom tom 112 Aug 25 23:12 file.K
$ date
Thu Aug 25 11:13:01 PM EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
("a"
4 5)
\\
$ ls -l file.K
-rwxr-xr-t. 1 tom tom 112 Aug 25 23:12 file.K
$
In Windows, the 1st command creates file.l and the 2nd command rewrites file.l (with a different size).
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a";4 5)
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 112 Aug 25 23:17 file.l
$ date
Thu Aug 25 23:18:17 EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
("a"
4 5)
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 144 Aug 25 23:18 file.l
$
In OSX (like Linux), the file is created by the 1st command and not rewritten by the 2nd command:
$ ./k
kona \ for help. \\ to exit.
"file" 1: ("a"; 4 5)
\\
$ ls -l file.l
-rwxr-xr-x 1 thomasszczesny staff 112 Aug 26 15:43 file.l
$ date
Fri Aug 26 15:44:46 EDT 2022
$ ./k
kona \ for help. \\ to exit.
1: "file"
("a"
4 5)
\\
$ ls -l file.l
-rwxr-xr-x 1 thomasszczesny staff 112 Aug 26 15:43 file.l
$
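The sessions above compare `ls -l` output by eye before and after the read. A minimal sketch of the same check in C, assuming a POSIX-style `stat()` is available; the names `FileSnap`, `snap_file`, and `snap_changed` are hypothetical helpers for illustration and are not part of kona:

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Size and modification time of a file at one point in time. */
typedef struct {
    long long size;
    time_t mtime;
} FileSnap;

/* Fill *out for path; returns 0 on success, -1 if stat() fails. */
int snap_file(const char *path, FileSnap *out) {
    struct stat st;
    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0) return -1;
    out->size = (long long)st.st_size;
    out->mtime = st.st_mtime;
    return 0;
}

/* Nonzero when the two snapshots differ in size or modification time. */
int snap_changed(const FileSnap *a, const FileSnap *b) {
    return a->size != b->size || a->mtime != b->mtime;
}
```

Taking one snapshot after the `"file" 1: ...` session and another after the `1: "file"` session would flag the Windows case (112 bytes, then 144 bytes) and stay quiet for the Linux and OSX cases.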
I wanted to check that the problems in Windows for #615 were not the result of recent fixes for Linux and OSX over the last year. So, I checked out the commit made on Oct 26, 2020. The problem already existed back then.
$ git checkout 046e3a780cc3d43a109607a5730cad26c5ad3b2d
HEAD is now at 046e3a7 these 'fixes' no longer seem necessary
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a";4 5)
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 112 Aug 27 22:38 file.l
$ date
Sat Aug 27 22:39:07 EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
("a"
4 5)
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 144 Aug 27 22:39 file.l
Interesting ... the problem does not exist for this simpler case:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("a")
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 22:55 file.l
$ date
Sat Aug 27 22:56:12 EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
"a"
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 22:55 file.l
The Windows problem does not exist for this simple case:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: (4)
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 23:05 file.l
$ date
Sat Aug 27 23:06:08 EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
4
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 32 Aug 27 23:05 file.l
The Windows problem does exist for this simple case:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: (4 5)
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 48 Aug 27 23:03 file.l
$ date
Sat Aug 27 23:04:09 EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
4 5
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 80 Aug 27 23:04 file.l
And, the Windows problem does exist for this simple case:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("ab")
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 40 Aug 27 23:14 file.l
$ date
Sat Aug 27 23:15:01 EDT 2022
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
"ab"
\\
$ ls -l file.l
-rw-r--r-- 1 tavme tavme 72 Aug 27 23:15 file.l
The problem (in Windows) begins at the "mmap" call. Making only the following addition:
--- a/src/0.c
+++ b/src/0.c
@@ -540,6 +540,7 @@ Z K _1m_r(I f,V fixed, V v,V aft,I*b) { //File descriptor, moving * into mmap,
offset-=mod;
if(MAP_FAILED==(u=mmap(0,length,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_NORESERVE,f,offset))){R SE;}
+ exit(0);
mMap+=length;
mUsed+=length;if(mUsed>mMax)mMax=mUsed;
We get this result
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("ab")
\\
$ xxd file.l
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000020: 6162 0000 0000 0000 ab......
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
$ xxd file.l
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000020: 6162 0000 0000 0000 0000 0000 0000 0000 ab..............
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 ........
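Read as little-endian 64-bit words, the 40-byte file above is -3, 1, -3, 2, followed by the bytes "ab" padded to 8 bytes; the 72-byte file after the read is the same content with four zero words (32 bytes) appended. A minimal sketch for decoding such a dump, assuming 8-byte little-endian words as the xxd output suggests; `le64` and `decode_words` are hypothetical helper names, not kona functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one little-endian signed 64-bit word from buf. */
int64_t le64(const unsigned char *buf) {
    uint64_t v = 0;
    assert(buf != NULL);
    /* Start from the most significant byte (index 7) and shift down to index 0. */
    for (int i = 7; i >= 0; i--) v = (v << 8) | buf[i];
    return (int64_t)v; /* assumes two's complement, as on all common platforms */
}

/* Decode up to max_words words from the first len bytes; returns the count decoded. */
size_t decode_words(const unsigned char *buf, size_t len,
                    int64_t *out, size_t max_words) {
    size_t n = len / 8;
    if (n > max_words) n = max_words;
    for (size_t i = 0; i < n; i++) out[i] = le64(buf + 8 * i);
    return n;
}
```

In K, -3 is the type of a character vector, which fits the third and fourth words (-3, 2) describing "ab". The role of the leading -3, 1 pair is not stated in this thread.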
In Linux (with the same modification) we get:
$ rlwrap -n ./k
kona \ for help. \\ to exit.
"file" 1: ("ab")
\\
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000020: 6162 0000 0000 0000 ab......
$ rlwrap -n ./k
kona \ for help. \\ to exit.
1: "file"
$ xxd file.K
00000000: fdff ffff ffff ffff 0100 0000 0000 0000 ................
00000010: fdff ffff ffff ffff 0200 0000 0000 0000 ................
00000020: 6162 0000 0000 0000 ab......
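With the `exit(0)` placed right after `mmap`, the Linux file stays at 40 bytes while the Windows file grows to 72, so the growth happens during the mapping itself. The sketch below is a POSIX-only check of the Linux side of that observation: it maps a file privately with a length larger than the file and reports the file size afterwards. It is an illustration under assumptions, not kona's code: the function name `size_after_private_map` is hypothetical, the offset is fixed at 0, and `MAP_NORESERVE` from the original call is left out for portability.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open path, map `length` bytes with PROT_READ|PROT_WRITE and MAP_PRIVATE,
 * unmap again, and return the file size afterwards.
 * Returns -1 if the file cannot be opened, mapped, or stat'ed.
 */
long long size_after_private_map(const char *path, size_t length) {
    struct stat st;
    long long size = -1;
    int fd;
    void *u;

    assert(path != NULL && length > 0);
    fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    /* The mapping is never touched, so a length past end-of-file is harmless here. */
    u = mmap(0, length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (u != MAP_FAILED) {
        munmap(u, length);
        if (fstat(fd, &st) == 0) size = (long long)st.st_size;
    }
    close(fd);
    return size;
}
```

On Linux, calling this on a 40-byte file with a length of 72 should return 40, matching the unchanged `file.K` above. On Windows, `CreateFileMapping` is documented to extend the file on disk when the requested mapping size exceeds the file size and the page protection allows writes; whether the mmap layer used by the Windows build takes that path is an assumption, not something established in this thread.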
Windows problem with mmap fixed in commit of Aug 28, 2022
This is awesome Tom! Thanks very much.
Thanks for identifying the issue. I wasn't confident that we were done here till I just tested it (on Linux).
\l scrabble-puz.k
#puz
8473
"scrabblepuz" 1: puz
puz2: 1: "scrabblepuz"
#puz2
8473
^/^/^/puz=puz2
1.0
Tested it in OSX ... works. Tested in Windows ... works.
If I rename scrabble-puz.txt to scrabble-puz.k, I can load it as a script. If I try to serialize the data with 1:, it seems to work just fine, but crashes when I attempt to deserialize it.
But the behavior seems inconsistent. During another run it seemed to make it most of the way through but the data appeared corrupted (floats where there should be ints) and it eventually crashed. Here's a script of the session: crash.script.txt
Here's just one of the lines seemingly corrupted:
This is what that line looks like in scrabble-puz.k: