ut-osa / assise

GNU General Public License v2.0

Removing files doesn't work #7

Open zvikfir opened 3 years ago

zvikfir commented 3 years ago

Hi,

I have come across an issue where files can't be deleted from the /mlfs directory.

Here is an example of performing ls, followed by rm, and then ls again.

# ~kfirzv/bin/run_with_assise.sh ls -l /mlfs/

dev-dax engine is initialized: dev_path /dev/dax5.0 size 512000 MB
fetching node's IP address..
Process pid is 34257
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:34257, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:34257, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
[Local-Client] Creating connection (pid:34257, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
In thread
SEND --> MSG_INIT [pid 2|34257]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x7bf2a0
start shmem_poll_loop for sockfd 2
SEND --> MSG_INIT [pid 1|34257]
SEND --> MSG_INIT [pid 0|34257]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x7bf2a0
start shmem_poll_loop for sockfd 1
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x7bf2a0
start shmem_poll_loop for sockfd 0
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
init log dev 1 start_blk 125564929 end 125827072
total 9216
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_0
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_1
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_2
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_3
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_4
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_5
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_6
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_7
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_8

# -----------------------------

# ~kfirzv/bin/run_with_assise.sh rm -rf /mlfs/mpi_hello_*

dev-dax engine is initialized: dev_path /dev/dax5.0 size 512000 MB
fetching node's IP address..
Process pid is 34287
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:34287, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:34287, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
In thread
[Local-Client] Creating connection (pid:34287, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
SEND --> MSG_INIT [pid 1|34287]
SEND --> MSG_INIT [pid 0|34287]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x21ac2a0
start shmem_poll_loop for sockfd 1
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x21ac2a0
start shmem_poll_loop for sockfd 0
SEND --> MSG_INIT [pid 2|34287]
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x21ac2a0
start shmem_poll_loop for sockfd 2
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
init log dev 1 start_blk 125564929 end 125827072

# --------------------------

# ~kfirzv/bin/run_with_assise.sh ls -l /mlfs/

dev-dax engine is initialized: dev_path /dev/dax5.0 size 512000 MB
fetching node's IP address..
Process pid is 34306
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:34306, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:34306, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
[Local-Client] Creating connection (pid:34306, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
In thread
SEND --> MSG_INIT [pid 2|34306]
SEND --> MSG_INIT [pid 0|34306]
SEND --> MSG_INIT [pid 1|34306]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0xac22a0
start shmem_poll_loop for sockfd 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0xac22a0
start shmem_poll_loop for sockfd 2
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0xac22a0
start shmem_poll_loop for sockfd 1
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
init log dev 1 start_blk 125564929 end 125827072
total 9216
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_0
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_1
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_2
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_3
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_4
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_5
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_6
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_7
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_8

In addition, there seem to be files that are not deleted even after switching the NVRAM between app-direct and memory mode, re-creating the namespace, and re-running mkfs. For instance, running rm -rf /mlfs/* reports that some of these files can't be deleted, even though they should no longer exist.

Is there any way to clean the cache of Assise?

Thanks, Kfir

wreda commented 3 years ago

The rm command uses syscalls that are not supported, which is why it has no effect. You may want to write a custom program that lists the desired directory (see libfs/tests/statdir_test.c) and then calls unlink on each entry.
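As a rough illustration of that workaround, here is a minimal sketch in C that lists a directory and calls unlink on every entry whose name starts with a given prefix. The function name `remove_matching` is hypothetical, and the sketch assumes that `opendir`, `readdir`, and `unlink` are among the calls LibFS handles; libfs/tests/statdir_test.c remains the reference for how directory listing is actually done under Assise.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of Assise): unlink every entry in `dir`
 * whose name starts with `prefix`.
 * Returns the number of entries removed, or -1 if `dir` cannot be opened.
 * Assumption: opendir/readdir/unlink are supported by the file system. */
int remove_matching(const char *dir, const char *prefix)
{
    assert(dir != NULL && prefix != NULL);

    DIR *dp = opendir(dir);
    if (dp == NULL) {
        perror("opendir");
        return -1;
    }

    size_t prefix_len = strlen(prefix);
    int removed = 0;
    struct dirent *entry;

    while ((entry = readdir(dp)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        /* Skip names that do not start with the requested prefix. */
        if (strncmp(entry->d_name, prefix, prefix_len) != 0)
            continue;

        /* Build "<dir>/<name>" and refuse to use a truncated path. */
        char path[4096];
        int n = snprintf(path, sizeof(path), "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof(path)) {
            fprintf(stderr, "path too long: %s/%s\n", dir, entry->d_name);
            continue;
        }

        /* unlink() fails on directories; report the error and move on. */
        if (unlink(path) == 0)
            removed++;
        else
            fprintf(stderr, "unlink %s: %s\n", path, strerror(errno));
    }

    closedir(dp);
    return removed;
}
```

Called from a small `main` as `remove_matching("/mlfs", "mpi_hello_")` and launched through the same wrapper script used for `ls` above, this would target the files shown in the listing. If removing entries while iterating turns out to confuse the directory scan, collecting the names first and unlinking them after `closedir` is a straightforward variation.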

The mkfs.sh script should wipe all filesystem data and metadata. If it isn't working as expected, I'd double-check that your binaries are up-to-date by cleaning and re-compiling LibFS.