Improve concurrent sequential access speed by per-file file pointer.

tmhannes commented 2 years ago

This PR addresses #181 by adding a separate "fptr" for each open file handle, so that sequential reads never have to restart from the beginning of the file.

In light testing on an i7 CPU with an external SSD, the PR allows concurrent readers at different positions in a large file to achieve the same total read throughput and CPU usage as a single reader sequentially processing the whole file.

Any feedback would be gratefully received.

relan commented 2 years ago

I like the idea, multi-thread operations are indeed suboptimal. Thanks for your proposal!

I have a few concerns about the implementation:

Why exfat_fptrs need to be chained?
The optionality of exfat_fptr argument makes the code prone to errors: if we pass it to read/write functions but forget to pass to truncate, the fptr will become invalid and can corrupt data.

relan commented 2 years ago

I like the idea, multi-thread operations are indeed suboptimal. Thanks for your proposal!

I have a few concerns about the implementation:

Why exfat_fptrs need to be chained?
The optionality of exfat_fptr argument makes the code prone to errors: if we pass it to read/write functions but forget to pass to truncate, the fptr will become invalid and can corrupt data.

tmhannes commented 2 years ago

Why exfat_ptrs need to be chained

The exfat_ptrs are chained so that shrink_file and grow_file (both called from exfat_truncate) can find all of the exfat_ptrs that point to the file they are working with (by following the chain from the fptr member of exfat_node), so that they can adjust them when necessary.

My impression is that this is required when, for example, a process A has a file open and then a process B truncates it, so that process A is not left holding an invalid exfat_ptr.

The optionality of exfat_fptr argument makes the code prone to errors: if we pass it to read/write functions but forget to pass to truncate, the fptr will become invalid and can corrupt data.

It's not necessary to pass the exfat_fptr to truncate at all, because truncate will find all the exfat_ptrs that exist by itself (see above).

I agree that the optional exfat_ptr argument to exfat_advance_cluster is weird though. Perhaps it would be cleaner to require every caller to provide an exfat_ptr (or an exfat_fh (which contains both exfat_ptr and exfat_node))?

I didn't pursue that option because I wanted to keep the patch as small as possible. Requiring every caller of exfat_advance_cluster to provide an exfat_ptr would cause changes rippling out into a few more sections of code.

relan / exfat

Improve concurrent sequential access speed by per-file file pointer. #182