kevinlawler / kona

Open-source implementation of the K programming language
ISC License
1.36k stars 138 forks source link

Data load takes longer in kona than in k2.8 (mmap?) #632

Open tavmem opened 1 year ago

tavmem commented 1 year ago

Using scrabble-puz.k from issue #615 in kona:

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  \t \l scrabble-puz.k
8263

In k2.8

$ rlwrap -n ./k
K 2.8 2000-10-10 Copyright (C) 1993-2000 Kx Systems 
\ for help. \\ to exit.
  \t \l scrabble-puz.k
32
tavmem commented 1 year ago

On a related note, the second step in issue #615 takes more than 40 times longer in kona than in k2.8 In kona it took 177 milliseconds

 rlwrap -n ./k
kona      \ for help. \\ to exit.

  \l scrabble-puz.k
  #puz
8473
  "scrabblepuz" 1: puz
  \t puz2: 1: `"scrabblepuz"
177
  #puz2
8473
  ^/^/^/puz=puz2
1.0

In k2.8, it took 4 milliseconds

$ rlwrap -n ./k
K 2.8 2000-10-10 Copyright (C) 1993-2000 Kx Systems
\ for help. \\ to exit.

  \l scrabble-puz.k
  #puz
8473
  "scrabblepuz" 1: puz
  \t puz2: 1: "scrabblepuz"
4
  #puz2
8473
  ^/^/^/puz=puz2
1.0

Bakul mentioned that kona appears to be using mmap many more times than necessry. That may be the reason for the slowness in one, or in both of these cases.

bakul commented 1 year ago

See https://github.com/kevinlawler/kona/issues/615#issuecomment-1119669496 You can try running each k under strace (on linux) with a much smaller list.

tavmem commented 1 year ago

Thanks!
Looks like you are correct about the number of mmap calls. On the simple command "file" 1: (4 5) kona calls mmap 3 times k2.8 calls mmap2 once

kona:

"file" 1: (4 5)
) = 1 (in [0])
newfstatat(0, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
read(0, "\"file\" 1: (4 5)\n", 1024)    = 16
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=2017642}) = 0
mmap(NULL, 256, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8aefd81000
mmap(NULL, 128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8aefd80000
openat(AT_FDCWD, "file.K", O_RDWR|O_CREAT|O_TRUNC, 07777) = 3
ftruncate(3, 48)                        = 0
mmap(NULL, 48, PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f8aefd7f000
close(3)                                = 0
munmap(0x7f8aefd7f000, 48)              = 0
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=2805125}) = 0
write(1, "  ", 2  )                       = 2
pselect6(1, [0], NULL, NULL, NULL, NULL

k2.8

"file" 1: (4 5)
) = 1 (in [0])
alarm(1)                                = 0
alarm(0)                                = 1
statx(0, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0620, stx_size=0, ...}) = 0
read(0, "\"file\" 1: (4 5)\n", 1024)    = 16
alarm(1)                                = 0
stat64("file.l", {st_mode=S_IFREG|0644, st_size=56, ...}) = 0
unlink("file.l")                        = 0
openat(AT_FDCWD, "file.l", O_RDWR|O_CREAT, 0666) = 3
ftruncate(3, 24)                        = 0
mmap2(NULL, 24, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xf7f2c000
close(3)                                = 0
msync(0xf7f2c000, 24, MS_SYNC)          = 0
munmap(0xf7f2c000, 24)                  = 0
write(2, "  ", 2  )                       = 2
alarm(0)                                = 1
pselect6(1000, [0], [], NULL, NULL, NULL
tavmem commented 1 year ago

For the associated command 1: "file", that follows the command "file" 1: (4 5) kona calls mmap twice. k2.8 calls mmap2 only once.

kona:

1:"file"
) = 1 (in [0])
read(0, "1:\"file\"\n", 1024)           = 9
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=3016329}) = 0
openat(AT_FDCWD, "file.K", O_RDWR)      = 3
newfstatat(AT_FDCWD, "file.K", {st_mode=S_IFREG|S_ISVTX|0755, st_size=48, ...}, 0) = 0
mmap(NULL, 48, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0x7f8aefd7f000
newfstatat(3, "", {st_mode=S_IFREG|S_ISVTX|0755, st_size=48, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 48, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0x7f8aefd7e000
close(3)                                = 0
munmap(0x7f8aefd7f000, 48)              = 0
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=3776268}) = 0
write(1, "4 5\n", 44 5
)                    = 4
munmap(0x7f8aefd7e000, 48)              = 0
write(1, "  ", 2  )                       = 2
pselect6(1, [0], NULL, NULL, NULL, NULL

k2.8

1: "file"
) = 1 (in [0])
alarm(1)                                = 0
alarm(0)                                = 1
read(0, "1: \"file\"\n", 1024)          = 10
alarm(1)                                = 0
stat64("file.l", {st_mode=S_IFREG|0644, st_size=24, ...}) = 0
openat(AT_FDCWD, "file.l", O_RDONLY)    = 3
_llseek(3, 0, [24], SEEK_END)           = 0
mmap2(NULL, 24, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xf7f2c000
close(3)                                = 0
statx(1, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0620, stx_size=0, ...}) = 0
write(1, "4 5", 34 5)                      = 3
write(1, "\n", 1
)                       = 1
munmap(0xf7f2c000, 24)                  = 0
write(2, "  ", 2  )                       = 2
alarm(0)                                = 1
pselect6(1000, [0], [], NULL, NULL, NULL
tavmem commented 1 year ago

I just realized that this issue replicates the problem identified in issue #626 This issue does identify a 3rd culprit for the slowness (too many calls to mmap). However, I'm closing this issue, since it is redundant.

tavmem commented 1 year ago

As a reminder (see issue #626): after fixing the placement of syntaxChk the kona result is

$ rlwrap -n ./k
kona      \ for help. \\ to exit.
  \t \l scrabble-puz.k
175

The result with k2.8

$ rlwrap -n ./k
K 2.8 2000-10-10 Copyright (C) 1993-2000 Kx Systems 
\ for help. \\ to exit.
  \t \l scrabble-puz.k
26

The most likely explanation is that kona uses mmap more than it needs to.