markbt / streampager

A pager for command output or large files
MIT License

Implement new file reader #16

Closed markbt closed 4 years ago

markbt commented 4 years ago

Implement a new reader for on-disk files that reads the data out of the file rather than mmapping it.

It's a bit simple right now, so it reloads each line from the file whenever it wants to display it. It would be better to go through some kind of LRU block cache for loading chunks of the file. The cache can be flushed whenever we detect a reload is necessary.
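As a rough illustration of that idea (not the code in this PR), an LRU block cache could look something like the sketch below. `BlockCache`, `BLOCK_SIZE`, and `MAX_BLOCKS` are made-up names for the example, and short reads are not handled.

```rust
use std::collections::{HashMap, VecDeque};
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

const BLOCK_SIZE: u64 = 4096; // illustrative block size
const MAX_BLOCKS: usize = 64; // illustrative cache capacity

/// A minimal LRU cache of fixed-size file blocks, keyed by block index.
struct BlockCache {
    blocks: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // least recently used block index at the front
}

impl BlockCache {
    fn new() -> Self {
        BlockCache {
            blocks: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Return the block containing `offset`, reading it from `file` on a miss.
    /// A real implementation would loop to fill short reads.
    fn block(&mut self, file: &mut File, offset: u64) -> io::Result<&[u8]> {
        let index = offset / BLOCK_SIZE;
        if !self.blocks.contains_key(&index) {
            // Evict the least recently used block when the cache is full.
            if self.blocks.len() >= MAX_BLOCKS {
                if let Some(evicted) = self.order.pop_front() {
                    self.blocks.remove(&evicted);
                }
            }
            let mut buf = vec![0u8; BLOCK_SIZE as usize];
            file.seek(SeekFrom::Start(index * BLOCK_SIZE))?;
            let n = file.read(&mut buf)?;
            buf.truncate(n);
            self.blocks.insert(index, buf);
        }
        // Mark this block as most recently used.
        self.order.retain(|&i| i != index);
        self.order.push_back(index);
        Ok(self.blocks[&index].as_slice())
    }

    /// Drop all cached blocks, e.g. when a full reload of the file is detected.
    fn flush(&mut self) {
        self.blocks.clear();
        self.order.clear();
    }
}
```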

The heuristic for append-vs-reload is whether the last 4k of the file has changed or not. Also, any time we try to parse a line and find a newline in the middle of the line, we trigger a full reload.
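For illustration only (not the PR's actual implementation), the tail-comparison part of the heuristic could be expressed roughly like this, where `old_tail` is the last 4 KiB of the file as previously loaded:

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

const TAIL_SIZE: u64 = 4096; // compare the last 4 KiB

/// What to do after the watched file changes on disk.
enum Action {
    Append, // old tail unchanged: just load the newly appended data
    Reload, // file shrank or old tail changed: reparse from scratch
}

/// Decide between appending and reloading by re-reading the region that used
/// to be the last 4 KiB of the file and comparing it with what we saw before.
fn append_or_reload(
    file: &mut File,
    old_len: u64,
    old_tail: &[u8],
) -> io::Result<Action> {
    let new_len = file.metadata()?.len();
    if new_len < old_len {
        // Truncated: the previously loaded contents can no longer be valid.
        return Ok(Action::Reload);
    }
    let tail_start = old_len.saturating_sub(TAIL_SIZE);
    let mut current = vec![0u8; (old_len - tail_start) as usize];
    file.seek(SeekFrom::Start(tail_start))?;
    file.read_exact(&mut current)?;
    if current == old_tail {
        Ok(Action::Append)
    } else {
        Ok(Action::Reload)
    }
}
```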

This should help with #8 and #9, as we now watch the file and load new contents if the file is appended to, or reload the file if the contents change or the file is truncated.
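A minimal sketch of the watching side, assuming the notify crate's 4.x debounced API (the actual wiring in this PR may differ):

```rust
use notify::{watcher, DebouncedEvent, RecursiveMode, Watcher};
use std::path::Path;
use std::sync::mpsc::channel;
use std::time::Duration;

fn watch_file(path: &Path) -> notify::Result<()> {
    let (tx, rx) = channel();
    // Debounce events so a burst of appends becomes a single notification.
    let mut file_watcher = watcher(tx, Duration::from_millis(200))?;
    file_watcher.watch(path, RecursiveMode::NonRecursive)?;
    loop {
        match rx.recv() {
            Ok(DebouncedEvent::Write(_)) | Ok(DebouncedEvent::Create(_)) => {
                // File changed: run the append-vs-reload check sketched above.
            }
            Ok(DebouncedEvent::Remove(_)) | Ok(DebouncedEvent::Rename(_, _)) => {
                // File went away or was replaced: a full reload is needed.
            }
            Ok(_) => {}
            Err(_) => break, // channel closed: stop watching
        }
    }
    Ok(())
}
```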

The mmap implementation is retained in case it proves useful, but it is unused for now.

quark-zju commented 4 years ago

I wasn't aware of this. I did something similar at https://github.com/quark-zju/streampager/commit/305b3ac7873a4fadf17c80a073f38d25a822c64a. It has some caching support.

jsgf commented 4 years ago

How much does/would caching help? The kernel is already doing all that for you, so all you're saving is the cost of the syscalls themselves. There can't be that many of them since you'd be limited by the user's reading rate (and bulk operations like search can amortize the cost with large reads).

quark-zju commented 4 years ago

I'm not sure. Each line will trigger a read call. It could be 100+ lines. Search also seems to trigger one read per line due to the current API design.
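To illustrate the trade-off being discussed (purely illustrative, not streampager's actual API): a per-line read pattern issues one seek-plus-read syscall pair per line, while a large buffered read serves many lines per syscall.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// One read per line: seek to the line's offset, then read its bytes.
/// Searching N lines this way costs roughly N read() calls.
fn read_line_at(file: &mut File, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    file.seek(SeekFrom::Start(offset))?;
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Amortized version: each 1 MiB read serves every line that falls inside the
/// buffer, so scanning the whole file costs roughly file_size / capacity reads.
/// Assumes a non-empty `needle`.
fn search(file: File, needle: &[u8]) -> io::Result<Vec<u64>> {
    let mut matches = Vec::new();
    let mut reader = BufReader::with_capacity(1 << 20, file);
    let mut offset = 0u64;
    let mut line = Vec::new();
    loop {
        line.clear();
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break; // end of file
        }
        if line.windows(needle.len()).any(|w| w == needle) {
            matches.push(offset);
        }
        offset += n as u64;
    }
    Ok(matches)
}
```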

markbt commented 4 years ago

I added cache support, so now this is probably ready to go. Any input welcome before I merge it.

The notify behaviour seems a bit unreliable, but when it works, it's quite nice.