ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml

Add symlink support #715

Closed · patricoferris closed 2 months ago

patricoferris commented 3 months ago

As part of #510, this PR adds symlink support to Eio_posix and Eio_linux, with a stubbed-out implementation on Windows. As a test, I updated all the uses of Unix.symlink in the FS tests to use Eio.Path.symlink -- definitely worth checking that I didn't mess up the arguments, as I always get the ln argument order wrong!
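
For reference, the existing OCaml primitive is target-first (a quick reminder, not part of this PR):

(* Unix.symlink src dst creates dst as a symbolic link pointing at src,
   matching the shell's ln -s TARGET LINK_NAME. *)
let () = Unix.symlink "target.txt" "link.txt"   (* link.txt -> target.txt *)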

patricoferris commented 3 months ago

Ah! Glad I added the caveat to check that -- thanks! ^^"

I think I've updated everything to be correct now. I went for link_to as the label name; happy to change that if there are better suggestions:

val symlink : ?exists_ok:bool -> link_to:string -> _ t -> unit
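
A minimal usage sketch, assuming the signature above (dir here is a hypothetical Eio.Path.t for a writable directory):

(* Create dir/link.txt as a symbolic link pointing at "target.txt". *)
let make_link dir =
  Eio.Path.symlink ~link_to:"target.txt" Eio.Path.(dir / "link.txt")
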
patricoferris commented 3 months ago

Removing exists_ok seems fine to me. However, whilst testing I moved machines and now the Linux tests are hanging for me. In particular, in test_statx the second copy_string causes the process to enter some uninterruptible sleep. The machine is using ZFS... maybe that's making it unhappy... but I just thought I'd check whether this sounds familiar (kernel version problem, etc.).

let test_statx () =
  let module X = Uring.Statx in
  Eio_main.run @@ fun env ->
  let ( / ) = Eio.Path.( / ) in
  let path = env#cwd / "test3.data" in
  Eio.Path.with_open_out path ~create:(`Exclusive 0o600) @@ fun file ->
  Eio.Flow.copy_string "hello" file;
  Eio.Flow.copy_string "+" file

strace: https://gist.github.com/patricoferris/ace0a69433f299ee4ac94368028bb96b

6.5.0-14-generic #14~22.04.1-Ubuntu

talex5 commented 3 months ago

By uninterruptible, you mean Ctrl-C doesn't kill it? Does the machine have to be rebooted to remove the stuck process? dmesg or /proc/PID/stack might show something.

Does it happen if you just run that single test as a complete program? e.g. this program works for me:

let () =
  Eio_main.run @@ fun env ->
  let ( / ) = Eio.Path.( / ) in
  let path = env#cwd / "test3.data" in
  Eio.Path.with_open_out path ~create:(`Exclusive 0o600) @@ fun file ->
  Eio.Flow.copy_string "hello" file;
  Eio.Flow.copy_string "+" file

If it still hangs without this PR, we should move it to another issue.

@koonwen: can your tools help debug this?

koonwen commented 3 months ago

The tool still needs some work. In the meantime though, I can recommend using bpftrace along with this gist to get a "strace-like" real-time view on the io-uring tracepoints being hit. This might give you a clue on what's happening under the hood.

sudo apt-get install bpftrace
wget https://gist.githubusercontent.com/koonwen/b69867422bc49bd10cf61932e41da362/raw/2f673e1aaf928e1531789c27f8d67869f878cba5/bpftrace_uring_trace.bt
sudo bpftrace bpftrace_uring_trace.bt

talex5 commented 3 months ago

@koonwen thanks - that prints a lot of info!

I had a go this morning at getting bpftrace to show probes called by uring workers, and this worked a bit for me:

// Start tracing when a process creates a new io_uring instance.
tracepoint:io_uring:io_uring_create {
  printf("Tracing new uring %d\n", pid);
  @traced[pid] = 1;
}

// Log every io_uring tracepoint hit by a traced process.
tracepoint:io_uring:* /@traced[pid]/ { printf("%s\n", probe); }

tracepoint:io_uring:io_uring_task_add /@traced[pid]/ {
  printf("%d adds new task: %p\n", tid, args->ctx);
}

// When a traced process spawns an io-wq worker thread, trace that too.
kretprobe:create_io_thread / @traced[pid] / {
  $task = (struct task_struct *) retval;
  printf("%s: %d:%d create_io_thread: %d\n", comm, pid, tid, $task->pid);
  @traced[$task->pid] = 1;
}

// Log VFS activity performed on behalf of traced threads.
kprobe:vfs* / @traced[pid] / {
  printf("%s: %s\n", comm, probe);
}

kprobe:vfs_open / @traced[pid] / {
  printf("%s: open(%s)\n", comm, str(((struct path *)arg0)->dentry->d_name.name));
}

kprobe:__fsnotify_parent / @traced[pid] / {
  $dentry = (struct dentry *) arg0;
  printf("%s: __fsnotify_parent(%s)\n", comm, str($dentry->d_name.name));
}
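
To use it, save the script as e.g. uring_workers.bt (the filename is arbitrary) and run it as root while the hanging test executes:

sudo bpftrace uring_workers.bt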

For me, @patricoferris's test case gives:

Attaching 98 probes...
Tracing new uring 12889
tracepoint:io_uring:io_uring_create
tracepoint:io_uring:io_uring_register
tracepoint:io_uring:io_uring_register
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_file_get
tracepoint:io_uring:io_uring_poll_arm
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_queue_async_work
main.exe: 12889:12889 create_io_thread: 12900
tracepoint:io_uring:io_uring_cqring_wait
iou-wrk-12889: kprobe:vfs_open
iou-wrk-12889: open(test3.data)
iou-wrk-12889: __fsnotify_parent(test3.data)
iou-wrk-12889: __fsnotify_parent(test3.data)
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_task_work_run
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_file_get
tracepoint:io_uring:io_uring_queue_async_work
tracepoint:io_uring:io_uring_cqring_wait
iou-wrk-12889: __fsnotify_parent(test3.data)
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_task_work_run
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_file_get
tracepoint:io_uring:io_uring_queue_async_work
tracepoint:io_uring:io_uring_cqring_wait
iou-wrk-12889: __fsnotify_parent(test3.data)
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_task_work_run
main.exe: __fsnotify_parent(test3.data)
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_task_add
12889 adds new task: 0xffff90cbe90e7800
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_task_work_run
main.exe: __fsnotify_parent(main.exe)
main.exe: __fsnotify_parent(ld-linux-x86-64.so.2)
main.exe: __fsnotify_parent(libm.so.6)
main.exe: __fsnotify_parent(libc.so.6)

Some combination of these might help track it down!

patricoferris commented 2 months ago

Quick update: @talex5's trace file gives me:

Attaching 97 probes...
Tracing new uring 1203664
tracepoint:io_uring:io_uring_create
tracepoint:io_uring:io_uring_register
tracepoint:io_uring:io_uring_register
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_file_get
tracepoint:io_uring:io_uring_poll_arm
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_queue_async_work
test.exe: 1203664:1203664 create_io_thread: 1203665
tracepoint:io_uring:io_uring_cqring_wait
iou-wrk-1203664: kprobe:vfs_open
iou-wrk-1203664: open(test3.data)
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_task_work_run
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_file_get
tracepoint:io_uring:io_uring_queue_async_work
tracepoint:io_uring:io_uring_cqring_wait
tracepoint:io_uring:io_uring_complete
tracepoint:io_uring:io_uring_task_work_run
tracepoint:io_uring:io_uring_submit_req
tracepoint:io_uring:io_uring_file_get
tracepoint:io_uring:io_uring_queue_async_work
tracepoint:io_uring:io_uring_cqring_wait

so presumably the equivalent of iou-wrk-12889: __fsnotify_parent(test3.data) never arrives in my trace... or something

talex5 commented 2 months ago

OK, I installed Ubuntu 22.04.4 and set up a ZFS partition, and got the same problem. The Eio tests pass on ext4, but hang on ZFS.

test.exe is using 100% CPU (I was trying to run apt at the same time and that was taking ages due to this too):

user@ubuntu:~$ pidstat 1
Linux 5.15.0-102-generic (ubuntu)   18/04/24    _x86_64_    (1 CPU)

10:50:36      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:50:37     1000     26476    0.00   97.03    0.00    0.00   97.03     0  test.exe
10:50:37        0     27178    1.98    0.00    0.00  113.86    1.98     0  apt-check

The process itself is just waiting for io_uring_enter to return:

user@ubuntu:/proc/26476$ sudo cat stack 
[<0>] io_cqring_wait+0x374/0x430
[<0>] __do_sys_io_uring_enter+0x1f6/0x540
[<0>] __x64_sys_io_uring_enter+0x22/0x30
[<0>] do_syscall_64+0x5c/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x62/0xcc

The CPU use is coming from a uring worker thread:

user@ubuntu:/proc/26476$ pidstat -t 1
10:51:43      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:51:44     1000     26476         -    0.00   98.00    0.00    0.00   98.00     0  test.exe
10:51:44     1000         -     26477    0.00   99.00    0.00    2.00   99.00     0  |__iou-wrk-26476

The worker is doing various things:

user@ubuntu:/proc/26477$ sudo cat stack 
[<0>] dsl_dir_tempreserve_impl.constprop.0+0x129/0x370 [zfs]
[<0>] dsl_dir_tempreserve_space+0xed/0x150 [zfs]
[<0>] dmu_tx_try_assign+0x266/0x2f0 [zfs]
[<0>] dmu_tx_assign+0x54/0x80 [zfs]
[<0>] zfs_write+0x45c/0xdb0 [zfs]
[<0>] zpl_iter_write+0xe7/0x130 [zfs]
[<0>] io_write+0x109/0x330
[<0>] io_issue_sqe+0x338/0x13f0
[<0>] io_wq_submit_work+0x8d/0xf0
[<0>] io_worker_handle_work+0x100/0x2b0
[<0>] io_wqe_worker+0x2c3/0x320
[<0>] ret_from_fork+0x22/0x30

perf gives some samples:

   - zpl_iter_write
      - 97.27% zfs_write
         + 26.55% dmu_tx_assign
         + 25.26% dmu_tx_commit
         + 19.62% dmu_write_uio_dbuf
         + 11.91% dmu_tx_hold_write_by_dnode
         + 5.27% dmu_tx_create
         + 3.45% dmu_tx_hold_sa
           0.87% zfs_clear_setid_bits_if_necessary
           0.59% zfs_id_overblockquota
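
One way to collect comparable samples from the spinning worker thread (thread id 26477 from the pidstat output above; an illustrative invocation, not necessarily the exact one used):

sudo perf record -g -t 26477 -- sleep 10
sudo perf report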

After rebooting the VM, I tried with 14ae3cfee3 (i.e. without the symlink commits) and that hangs too, so I don't think it needs to block this PR. This is a ZFS (or Linux) bug.

avsm commented 2 months ago

I can reproduce this too (only on a ZFS partition), using Linux 6.8:

Linux comanche 6.8.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 22:30:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Works fine on an ext4 partition with the same kernel. We should extract a standalone test case here for reporting upstream.