mirage / irmin

Irmin is a distributed database that follows the same design principles as Git
https://irmin.org
ISC License

irmin-pack.unix: Use asynchronous unlink when cleaning up chunks in Gc. #2221

Closed: adatario closed this pull request 1 year ago

adatario commented 1 year ago

First try to tackle #2220.

This adds Io.unlink_async which calls Lwt_unix.unlink.

Lwt maintains a pool of system threads on which the unlink calls run, so they do not block the main application thread.

I'm not sure whether this will provide any real benefit, as the Gc is still waited on by Async.await.
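As a rough sketch (the exact code in the diff may differ), the new primitive is essentially a thin wrapper over Lwt_unix.unlink:

```ocaml
(* Sketch of the Io.unlink_async described above; the PR's actual code
   may differ. Lwt_unix.unlink hands the blocking unlink(2) syscall to
   one of Lwt's worker system threads, and the returned promise
   resolves once the syscall completes. *)
let unlink_async path : unit Lwt.t = Lwt_unix.unlink path
```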

art-w commented 1 year ago

In case this is not enough, we don't actually need to wait for the unlinks to terminate (we also don't need the files to be deleted in any specific order, so perhaps Lwt.async would do, assuming the unlinking doesn't crash?). Even if it does fail, the store will attempt to clean up any residual files when re-opened, so it shouldn't be an issue.
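A minimal sketch of this suggestion (the helper name is hypothetical):

```ocaml
(* Fire-and-forget unlink via Lwt.async, as suggested above; the helper
   name is hypothetical. Note that if the promise raises, Lwt.async
   forwards the exception to Lwt.async_exception_hook, whose default
   behaviour is to print the exception and exit the program. *)
let unlink_and_forget path =
  Lwt.async (fun () -> Lwt_unix.unlink path)
```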

adatario commented 1 year ago

I've run a benchmark to see how this performs (https://github.com/tarides/irmin-tezos-benchmarking/tree/async-unlink/benchmarks/2023-03-async-unlink).

Tail latencies (compared to 3.7-mvp):

tail latency (1)    3.7-mvp: 1.430 s    async-unlink: 1.156 s (81%)

Slightly better. I think we can improve further by issuing multiple unlinks at the same time; currently all old chunks are unlinked sequentially in a Lwt_list.fold_s.

Slightly surprising, though, is the disk usage:

[disk usage plot ("stats") omitted]

Maybe Lwt is doing something funky?

adatario commented 1 year ago

> In case this is not enough, we don't actually need to wait for the unlinks to terminate (we also don't need the files to be deleted in a specific order, so perhaps with Lwt.async assuming the unlinking doesn't crash?)

Absolutely, I'll run another benchmark where the unlinks just happen "fire and forget".

Maybe using Lwt.dont_wait would be safer and closer to the current semantics (just log a warning if the unlink fails).

adatario commented 1 year ago

> Maybe Lwt is doing something funky?

It seems the disk-usage watch script is trying to compute the size of store.control.tmp while it is being removed:

du: cannot access 'stats/store/store.control.tmp': No such file or directory

When du fails, the disk usage is reported as 0, which explains the weird spikes.

Seems to be related to #2206.

codecov-commenter commented 1 year ago

Codecov Report

Merging #2221 (21405bd) into main (8eeffae) will decrease coverage by 0.02%. The diff coverage is 80.95%.

Note: the current head 21405bd differs from the pull request's most recent head 2504c40. Consider uploading reports for commit 2504c40 to get more accurate results.


@@            Coverage Diff             @@
##             main    #2221      +/-   ##
==========================================
- Coverage   68.09%   68.07%   -0.02%     
==========================================
  Files         135      135              
  Lines       16480    16492      +12     
==========================================
+ Hits        11222    11227       +5     
- Misses       5258     5265       +7     
Impacted Files               Coverage Δ
src/irmin-pack/unix/io.ml    65.25% <66.66%> (+0.03%) ↑
src/irmin-pack/unix/gc.ml    74.76% <82.05%> (-1.77%) ↓

... and 1 file with indirect coverage changes


art-w commented 1 year ago

Regarding du, perhaps the script should ignore the exit code? du will still print the observed size (and some warnings on stderr); we only have to default to 0 if its output was completely empty.

metanivek commented 1 year ago

Thanks for tackling this! I think the key will be to not bind the promise of Io.unlink_async since the bind will wait for the system thread to finish.

Perhaps this can be wrapped up inside of the Io module with something like this (or with a custom exn handler so callers can do whatever they like):

let unlink_async path =
  let log_exn exn = (* log exn :) *) ignore exn in
  let lwt_unlink () = Lwt_unix.unlink path in
  Lwt.dont_wait lwt_unlink log_exn

adatario commented 1 year ago

And the results are in for the unlink-and-forget version:

                             3.7-mvp    async-unlink-and-forget    async-unlink
-- main metrics --
CPU time elapsed             106m26s    109m09s   103%             111m17s   105%
Wall time elapsed            106m26s    109m13s   103%             111m43s   105%
TZ-transactions per sec      720.394    702.471    98%             688.916    96%
TZ-operations per sec        4705.452   4588.380   98%             4499.845   96%
Context.add per sec          13815.712  13471.978  98%             13212.028  96%
tail latency (1)             1.430 s    0.833 s    58%             1.156 s    81%
-- resource usage --
disk IO (total)
  IOPS (op/sec)              176_654    172_261    98%             168_937    96%
  throughput (bytes/sec)     17.635 M   17.196 M   98%             16.864 M   96%
  total (bytes)              112.609 G  112.610 G 100%             112.610 G 100%
disk IO (read)
  IOPS (op/sec)              176_559    172_168    98%             168_846    96%
  throughput (bytes/sec)     10.375 M   10.117 M   98%             9.922 M    96%
  total (bytes)              66.250 G   66.251 G  100%             66.251 G  100%
disk IO (write)
  IOPS (op/sec)              95         93         98%             91         96%
  throughput (bytes/sec)     7.260 M    7.079 M   98%              6.943 M    96%
  total (bytes)              46.359 G   46.359 G  100%             46.359 G  100%
max memory usage (bytes)     0.376 G    0.381 G   101%             0.380 G   101%
mean CPU usage               100%       100%                       100%

Seems to work.

> Perhaps this can be wrapped up inside of the Io module with something like (or with a custom exn handler so callers can do whatever they like):
>
>     let unlink_async path =
>       let log_exn exn = (* log exn :) *) ignore exn in
>       let lwt_unlink () = Lwt_unix.unlink path in
>       Lwt.dont_wait lwt_unlink log_exn

That's pretty much the logic we have now. Wrapping it in the Io module is nice and would also make the refactor lighter, as the signatures of unlink and unlink_async are the same. I'll push an updated commit.

> Regarding du, perhaps this script should ignore the exit code? (du will still print the observed size and some warnings on stderr, but we only have to default to 0 if its output was totally empty)

Yes, good point! We are a bit too conservative with du failures. I'll update the script for the next run.

metanivek commented 1 year ago

I guess my main thought is that moving it into Io lets us get rid of the promise binds (like https://github.com/mirage/irmin/pull/2221/files#diff-cbf0c1b5f48f95727e34a0ca9b187a8690d8d435166cd1c56de9c0d42515753cR157) that block the main thread while waiting for an unlink to finish. It should also clean up the loop that unlinks chunks since no promises are involved.

Once we do that, I expect the tail latency to be around the 200ms or so that you observed in the benchmark that did not unlink files.
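The distinction can be sketched as follows (hypothetical call sites, not the PR's actual code):

```ocaml
(* Binding the promise makes the caller wait for the worker thread: *)
let cleanup_waiting path =
  Lwt.bind (Lwt_unix.unlink path) (fun () ->
      (* ...we continue only after unlink(2) has completed... *)
      Lwt.return_unit)

(* Lwt.dont_wait returns immediately; the unlink proceeds in the
   background and any failure goes to the exception handler: *)
let cleanup_detached path =
  Lwt.dont_wait
    (fun () -> Lwt_unix.unlink path)
    (fun exn -> ignore exn (* a real caller would log this *))
```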

adatario commented 1 year ago

> Once we do that, I expect the tail latency to be around the 200ms or so that you observed in the benchmark that did not unlink files.

You're right:

                             3.7-mvp    unlink-dont-wait   async-unlink-and-forget   async-unlink
-- main metrics --
CPU time elapsed             106m26s    108m43s   102%     109m09s   103%            111m17s   105%
Wall time elapsed            106m26s    108m41s   102%     109m13s   103%            111m43s   105%
TZ-transactions per sec      720.394    705.257    98%     702.471    98%            688.916    96%
TZ-operations per sec        4705.452   4606.581   98%     4588.380   98%            4499.845   96%
Context.add per sec          13815.712  13525.416  98%     13471.978  98%            13212.028  96%
tail latency (1)             1.430 s    0.234 s    16%     0.833 s    58%            1.156 s    81%
-- resource usage --
disk IO (total)
  IOPS (op/sec)              176_654    172_948    98%     172_261    98%            168_937    96%
  throughput (bytes/sec)     17.635 M   17.265 M   98%     17.196 M   98%            16.864 M   96%
  total (bytes)              112.609 G  112.612 G 100%     112.610 G 100%            112.610 G 100%
disk IO (read)
  IOPS (op/sec)              176_559    172_855    98%     172_168    98%            168_846    96%
  throughput (bytes/sec)     10.375 M   10.157 M   98%     10.117 M   98%            9.922 M    96%
  total (bytes)              66.250 G   66.252 G  100%     66.251 G  100%            66.251 G  100%
disk IO (write)
  IOPS (op/sec)              95         93         98%     93         98%            91         96%
  throughput (bytes/sec)     7.260 M    7.107 M    98%     7.079 M    98%            6.943 M    96%
  total (bytes)              46.359 G   46.359 G  100%     46.359 G  100%            46.359 G  100%
max memory usage (bytes)     0.376 G    0.387 G   103%     0.381 G   101%            0.380 G   101%
mean CPU usage               100%       100%               100%                      100%

The unlink-dont-wait run uses the most recently pushed version, which adds an unlink_dont_wait to the Io module:

  let unlink_dont_wait ~on_error path =
    Lwt.dont_wait (fun () -> Lwt_unix.unlink path) on_error

And uses this from Gc.
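As an illustration of how a Gc call site could look once no promises are involved (hypothetical names, not the PR's actual diff):

```ocaml
(* Hypothetical Gc call site: since unlink_dont_wait returns unit,
   the old sequential Lwt_list.fold_s over chunks can become a plain
   iteration, and Gc finalization no longer waits on any unlink. *)
let remove_old_chunks ~on_error chunk_paths =
  List.iter (fun path -> Io.unlink_dont_wait ~on_error path) chunk_paths
```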