tonyyxliu / CUHKSZ-CSC4005

Project Materials for CUHK(SZ) Course CSC4005/MDS6108: Parallel Programming
MIT License

Slow Performance of 'rm' Command in Cluster Environment #65

Open chenshi3 opened 3 days ago

chenshi3 commented 3 days ago

Running 'rm -r build' in our cluster environment takes considerably longer than expected (10 minutes or more).

tonyyxliu commented 2 days ago

The compilation process is also slow. I suspect it is an NFS issue, and we will try to fix it ASAP.

EnderturtleOrz commented 2 days ago

I ran nfsstat and found that the most frequent operations are:

  1. open_noat (43%)
  2. lock (24%)
  3. locku (24%)

However, I checked iostat and found that the I/O load was not high. A possible reason is that we have too many vscode-server threads, which could lock many files.
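One way to check the vscode-server hypothesis is to count matching processes per user on the login node. The following is only a minimal sketch: `count_procs_by_user` is a made-up helper name, and it assumes a procps-style `ps` that accepts `-eo user:32,args`. The pattern is passed through the environment so that the awk process does not match its own command line.

```shell
# Count processes whose command line contains a pattern, grouped by user.
# Reads "USER ARGS..." lines on stdin and prints "user count", busiest first.
# (Hypothetical helper, not part of any existing tool.)
count_procs_by_user() {
    PATTERN="$1" awk '
        index($0, ENVIRON["PATTERN"]) > 0 { n[$1]++ }   # first field is the user
        END { for (u in n) print u, n[u] }
    ' | sort -k2,2nr -k1,1
}

# Usage on the node (not run here):
#   ps -eo user:32,args | count_procs_by_user vscode-server
```

If a few users account for most of the processes, that would support the locking explanation; if the counts are small, the cause is probably elsewhere.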

chenshi3 commented 2 days ago

I'm unable to start VSCode on the cluster, so I'm using SFTP to update code.

chenshi3 commented 1 day ago

Still super slow, for both compiling and using Vim.

EnderturtleOrz commented 1 day ago

Here are some measurements that may be useful for tracing the problem.

  1. Test results on reading and writing using dd on different machines

    The client machine (10.26.200.1)

    [121090184@node01 ~]$ dd if=/dev/zero of=~/testfile bs=1M count=512 oflag=direct
    512+0 records in
    512+0 records out
    536870912 bytes (537 MB) copied, 7.33619 s, 73.2 MB/s
    [121090184@node01 ~]$ dd if=~/testfile of=/dev/null bs=1M count=512 iflag=direct
    512+0 records in
    512+0 records out
    536870912 bytes (537 MB) copied, 5.07156 s, 106 MB/s

    Another cluster machine, not used for the course (10.26.200.13)

    $ dd if=~/testfile of=/dev/null bs=1M count=512 iflag=direct
    512+0 records in
    512+0 records out
    536870912 bytes (537 MB) copied, 1.04301 s, 515 MB/s
    $ dd if=/dev/zero of=~/testfile bs=1M count=512 oflag=direct
    512+0 records in
    512+0 records out
    536870912 bytes (537 MB) copied, 1.20259 s, 446 MB/s

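    The speeds differ a lot: node01 reads at 106 MB/s and writes at 73.2 MB/s, while the other machine reads at 515 MB/s and writes at 446 MB/s (roughly 5x and 6x faster).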

  2. Network latency

    Latency to the NFS server is fine (<1 ms).

  3. Client communication

    op/s     rpc bklog
    107.39    0.00
    read:       ops/s     kB/s    kB/op    retrans    avg RTT (ms)    avg exe (ms)
                0.330   10.835   32.861   0 (0.0%)           2.309          26.689
    write:      ops/s     kB/s    kB/op    retrans    avg RTT (ms)    avg exe (ms)
                0.450   77.069  171.390   0 (0.0%)           5.744        1455.707

    avg exe for writes is about 1455 ms, which is very large compared with the 5.7 ms avg RTT.

  4. I/O performance on client

    
    [121090184@node01 ~]$ iostat -x 1
    Linux 3.10.0-862.el7.x86_64 (node01)    09/18/2024  _x86_64_    (40 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.82    0.00    0.33    0.01    0.00   97.84

    Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
    sda        0.00    0.26  0.01  7.72   0.67  339.25    87.93     0.03   3.28    0.45    3.29   0.11   0.09
    dm-0       0.00    0.00  0.01  6.96   0.67  339.25    97.49     0.03   3.70    0.46    3.70   0.13   0.09
    dm-1       0.00    0.00  0.00  0.00   0.00    0.00     8.18     0.00  47.80    0.22   54.49   2.49   0.00

    This shows that local disk I/O on the client seems to work well.

  5. Log traced on `rm build` (`/nfsmnt/121090184/CUHKSZ-CSC4005/project1/rm_trace.log`), line 197

    30229 unlinkat(8, "cmake_clean.cmake", 0) = 0
    30229 unlinkat(8, "build.make", 0) = 0

    `unlinkat(8, "build.make", 0)` took about 10 min to finish.
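The trace lines quoted in item 5 do not carry per-call timings. If the trace is repeated with `strace -f -T`, each line ends with the time spent in the call as `<seconds>`, and the slow calls can be pulled out automatically. A rough sketch, assuming that log format; `slow_syscalls` is a hypothetical helper name and the 1-second threshold in the usage line is arbitrary.

```shell
# Print the lines of an `strace -T` log whose syscall took at least MIN seconds.
# Relies on the trailing "<seconds>" that -T appends to each completed call.
# (Hypothetical helper, not part of strace.)
slow_syscalls() {
    min=$1
    log=$2
    awk -v min="$min" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # strip the angle brackets and convert to a number
            t = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (t >= min) print $0
        }
    ' "$log"
}

# Usage on the cluster (not run here):
#   strace -f -T -o rm_trace.log rm -r build
#   slow_syscalls 1 rm_trace.log
```

This would show whether the delay sits in a single `unlinkat` call or is spread over many of them.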

I suggest checking on the NFS server for details.
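A note on the nfsiostat figures in item 3: as far as I understand the nfsiostat man page, avg exe covers the whole RPC as seen by the client and includes avg RTT, so the difference between the two is time spent outside the network round trip. The sketch below computes that difference per operation. `exe_vs_rtt` is a made-up helper name, and the field positions are an assumption based on the output layout quoted in item 3.

```shell
# For each "read:" / "write:" section of nfsiostat-style output on stdin,
# print avg RTT, avg exe, and the part of avg exe not covered by avg RTT.
# (Hypothetical helper; assumes a header line followed by one line of values.)
exe_vs_rtt() {
    awk '
        /^[ \t]*(read|write):/ { op = $1; sub(/:$/, "", op); next }
        op != "" && NF >= 7 {
            # value fields: ops/s kB/s kB/op retrans (pct) avgRTT avgexe
            rtt = $6
            exe = $7
            printf "%s rtt=%.3f exe=%.3f outside_rtt=%.3f\n", op, rtt, exe, exe - rtt
            op = ""
        }
    '
}

# Usage on the client (not run here):
#   nfsiostat 5 2 | exe_vs_rtt
```

With the write figures from item 3, almost all of the 1455.707 ms falls outside the 5.744 ms round trip, which may be worth comparing against the same measurement on 10.26.200.13.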