Use mmap

lks9 commented 1 year ago

Currently, the trace is stored in a 4K buffer and written out when the buffer is full. So we need an if clause to check whether it is full every time we write to the trace. See _TRACE_PUT(c) in src_tracer.h.
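For readers without the source at hand, the pattern looks roughly like the sketch below. This is not the actual _TRACE_PUT(c) macro from src_tracer.h; the names trace_buf, trace_pos, trace_fd, trace_flush and trace_put are made up for illustration, and the real macro may check the buffer after the store instead of before.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define TRACE_BUF_SIZE 4096  /* the 4K buffer mentioned above */

/* Hypothetical names, not taken from src_tracer.h. */
static unsigned char trace_buf[TRACE_BUF_SIZE];
static size_t trace_pos = 0;
static int trace_fd = -1;

/* Write out whatever is buffered. Returns 0 on success, -1 on error.
 * The buffer is emptied in both cases so the caller can keep going. */
static int trace_flush(void) {
    size_t done = 0;
    int rc = 0;
    while (done < trace_pos) {
        ssize_t n = write(trace_fd, trace_buf + done, trace_pos - done);
        if (n < 0) { rc = -1; break; }
        done += (size_t)n;
    }
    trace_pos = 0;
    return rc;
}

/* Append one byte. The branch is the per-byte check discussed here. */
static inline void trace_put(unsigned char c) {
    if (trace_pos == TRACE_BUF_SIZE) {
        trace_flush();
    }
    trace_buf[trace_pos++] = c;
}
```

A caller would set trace_fd once and call trace_flush() a final time before exit to write the partially filled buffer.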

Instead, we could use ftruncate and mmap to map one page of memory directly to the trace output file. Then the OS writes the file automatically whenever we assign to a mapped memory address. Error handling (when the file is too small for the trace) should be done via signal handling (SIGBUS and SIGSEGV).
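A minimal sketch of the ftruncate plus mmap idea, assuming a POSIX system. The helper names trace_map_file and trace_unmap_file are hypothetical, and for simplicity the sketch maps a caller-chosen size in one go rather than a single page; the SIGBUS/SIGSEGV handling for a too-small file is left out.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size the file, then map it for writing.
 * Returns the start of the mapping, or NULL on failure. */
static unsigned char *trace_map_file(const char *path, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return NULL;
    /* The file must cover the mapping; touching a page that lies
     * entirely beyond the end of the file raises SIGBUS. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Hypothetical helper: flush dirty pages to the file and drop the mapping. */
static int trace_unmap_file(unsigned char *base, size_t size) {
    if (msync(base, size, MS_SYNC) != 0) return -1;
    return munmap(base, size);
}
```

With such a mapping, putting a trace byte becomes a plain store like `base[pos++] = c;` with no buffer-full branch.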

However, it is still unclear whether this would really make tracing faster. We save one if clause on the one hand, but writing byte by byte to a mapped region might still be slower than the status quo, a write syscall every 4K bytes.

lks9 commented 1 year ago

Ok, a different approach, because mmap might not actually be that fast and write is a lot easier to handle:

Reserve memory for the full trace but allocate only the first page for the trace buffer and forbid write access to all following pages. Then whenever the trace position reaches the next page, we get a SIGSEGV. Within the signal handler, we write the page to the filesystem, free the page and allocate the next page. Then the next page becomes the current page. We can continue to write the trace to the current page, until we reach the next page and so on...
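The steps above could be sketched as follows for Linux. All names (region_init, region_put, region_finish, on_segv and the globals) are hypothetical. The sketch also assumes that calling mprotect and madvise inside a signal handler is acceptable, which POSIX does not guarantee, and that each page goes out with a single full write.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS, MAP_NORESERVE, madvise */
#endif
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* All names here are hypothetical, not taken from src_tracer.h. */
static unsigned char *region;  /* reserved address range for the full trace */
static size_t region_size;
static size_t page_size;
static size_t cur_page;        /* index of the only writable page */
static size_t region_pos;      /* next byte to write, relative to region */
static int out_fd = -1;

/* Runs when the trace position crosses into the protected next page. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    int saved_errno = errno;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t end = (uintptr_t)region + region_size;
    unsigned char *cur = region + cur_page * page_size;
    unsigned char *next = cur + page_size;
    int ours = (uintptr_t)next < end && addr >= (uintptr_t)next
               && addr < (uintptr_t)next + page_size;
    if (!ours
        || write(out_fd, cur, page_size) != (ssize_t)page_size
        || madvise(cur, page_size, MADV_DONTNEED) != 0  /* free the page */
        || mprotect(cur, page_size, PROT_NONE) != 0
        || mprotect(next, page_size, PROT_READ | PROT_WRITE) != 0) {
        signal(sig, SIG_DFL);  /* unrelated fault or error: crash as usual */
    } else {
        cur_page++;            /* the next page becomes the current page */
    }
    errno = saved_errno;
}

/* Reserve max_bytes (rounded up to pages), make only page 0 writable. */
static int region_init(int fd, size_t max_bytes) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_size = (max_bytes + page_size - 1) / page_size * page_size;
    void *p = mmap(NULL, region_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return -1;
    region = p;
    if (mprotect(region, page_size, PROT_READ | PROT_WRITE) != 0) return -1;
    out_fd = fd;
    cur_page = 0;
    region_pos = 0;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Append one byte with no bounds check; the volatile store keeps the
 * compiler from merging or reordering the byte-wise writes. */
static inline void region_put(unsigned char c) {
    *(volatile unsigned char *)(region + region_pos) = c;
    region_pos++;
}

/* Write the partially filled current page and release everything. */
static int region_finish(void) {
    size_t off = cur_page * page_size;
    size_t rest = region_pos - off;
    signal(SIGSEGV, SIG_DFL);
    if (rest > 0 && write(out_fd, region + off, rest) != (ssize_t)rest)
        return -1;
    return munmap(region, region_size);
}
```

A trace that outgrows the reserved range faults outside it, so the handler falls back to the default action and the process dies with an ordinary SIGSEGV.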

lks9 commented 1 year ago

I ran a few tests (writing a file of size 3 GB):

Current approach

real    0m7,435s
user    0m5,865s
sys 0m1,383s

ANON mmap with writing in SIGSEGV trap is slower than the current approach
```
real    0m9,659s
user    0m5,330s
sys 0m4,141s
```
mmap with SIGSEGV trap is slower than the current approach
mmap with SIGBUS trap is slower than the current approach
```
real    0m15,917s
user    0m5,184s
sys 0m10,347s
```
mmap with a big ftruncate (possible when we know the trace size beforehand!) and without a trap is about as fast as the current approach
```
real    0m8,660s
user    0m6,792s
sys 0m1,712s
```
The only caveat is that it was not a very hard computation... But still, I think we should stay with the current approach.

Source: heavy.zip

Side note: The problem seems to be the overhead of the signal trap. There is a Linux kernel patch that makes it work without a trap: https://lwn.net/Articles/860419/. If it works with MAP_SHARED, there is still a chance that it makes writing faster. But I won't try patching the kernel just for that.

lks9 commented 1 year ago

Ok, I found out that the last approach (with big ftruncate) was actually faster for real software.

lks9 / src-tracer

Use mmap #33