Open ericpre opened 1 year ago
It looks like there are some files that were not created by NOMAD (i.e. by the nomad user, Linux uid 1000). When NOMAD tries to delete the directory, it is either not removing the extra files or not allowed to remove them. From the file name, I would assume that your NFS implementation is creating extra files in the upload folders, or that NFS is in the middle of an operation on the file, or something like this.
Could you check the owning user id and permissions of this file, '.nfs000000010fa5315e00000001', for us? This might help to find a solution. You have to imagine that NOMAD acts as a normal user with id 1000 that tries to "rm -r" a directory.
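If it is easier to script the check than to read `ls -l` output, something like the following should give the same information. It is only a minimal sketch using the Python standard library; the function name `describe_owner_and_mode` is made up for illustration and is not part of NOMAD.

```python
import os
import stat


def describe_owner_and_mode(path):
    """Return (uid, gid, mode string) for a path, like the first columns of `ls -l`.

    Hypothetical helper for illustration only; not a NOMAD function.
    """
    # Do not follow symlinks, so a link is reported as the link itself.
    info = os.stat(path, follow_symlinks=False)
    return info.st_uid, info.st_gid, stat.filemode(info.st_mode)
```

Calling it on the `.nfs...` file and comparing the returned uid with 1000 shows whether the file belongs to the user NOMAD runs as.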
Thanks @markus1978 for the quick reply, please find below the information for this file:
-rw-r--r--. 1 localadmin localadmin 184907 Feb 14 14:52 .nfs000000010fa5315e00000001
The UID of localadmin is 1000.
After restarting the container, the .nfsxxxx... file disappeared and I was able to delete the entry. Two things differ from the "standard" configuration (i.e. following https://nomad-lab.eu/prod/v1/staging/docs/oasis.html#quick-start):
I have tried to change the path of the .volumes folder in the docker-compose.yaml file to point directly to the folder (and not use the symlink) and there is the same error and a .nfsxxxx file is created.
I classify this as a bug for now. From what you are saying, NOMAD should have been able to delete the file itself and consequently should be able to delete the folder. And even if not, NOMAD should expect these situations, because we want to enable clients to integrate the NOMAD directories into existing storage solutions, as you are doing.
@mohammadnakhaee Can you have a look at this, please? You could experiment with externally created extra "secret" files (starting with .) in the .volumes upload folders. It is more likely that this happens with such files in the upload folder or the upload archive folder. Just check whether you can reproduce it.
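For such an experiment it may help to delete a test folder in a way that reports exactly which entries could not be removed, instead of stopping at the first error. The following is only a rough sketch with the standard library; `remove_tree_collect_errors` is a made-up name and this is not how NOMAD deletes uploads.

```python
import os


def remove_tree_collect_errors(root):
    """Try to remove everything below root, then root itself.

    Returns a list of (path, OSError) for every entry that could not be
    removed. Hypothetical helper for experiments, not NOMAD code.
    """
    failures = []
    # Walk bottom-up so that directories are already emptied when we reach them.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.remove(path)
            except OSError as exc:
                failures.append((path, exc))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    os.remove(path)  # a symlink to a directory is removed like a file
                else:
                    os.rmdir(path)
            except OSError as exc:
                failures.append((path, exc))
    try:
        os.rmdir(root)
    except OSError as exc:
        failures.append((root, exc))
    return failures
```

On a local file system with ordinary hidden files this should return an empty list. On NFS, a file that is deleted while some process still has it open is renamed by the client to a .nfsXXXX file, so I would expect that file and its parent folders to show up in the returned list.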
For completeness, the full path is:
/oasis-data/.volumes/fs/staging/ZX/ZXm1pqqmQ1eewxe6HCB7vw/archive/.nfs000000010638e2dc00000007
The file name is different because it is from a different upload/delete test. This file seems to be created when attempting to delete the data entry.
It deletes the upload successfully when following these steps:
Could it be that something keeps a file open (I can see only a *.msg file in the archive folder) when it shouldn't, and that this causes the error?
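One way to check for an open file handle is to look at the file descriptors listed under /proc. The sketch below assumes Linux and enough permissions to read /proc/<pid>/fd of the relevant processes; inside a container it only sees processes of that container. `find_open_files` is a made-up name for illustration.

```python
import os


def find_open_files(directory):
    """Return [(pid, path)] for open file descriptors below directory.

    Linux only (reads /proc); returns an empty list where /proc is missing.
    Hypothetical helper for illustration.
    """
    prefix = os.path.realpath(directory) + os.sep
    hits = []
    if not os.path.isdir("/proc"):
        return hits
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            # Already-unlinked files appear with a " (deleted)" suffix and still match.
            if target.startswith(prefix):
                hits.append((int(pid), target))
    return hits
```

Running this on the archive folder right before deleting the entry should show whether some process still holds the *.msg file open.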
I could reproduce it by making an extra file immutable: sudo chattr +i .test
I have set up a nomad-oasis server with a symbolic link from the ./volumes folder to a separate folder. Uploading the data works fine and I see the data being created in the right place, but there are errors when deleting the entry from the "your existing upload" section:
at first attempt:
at second attempt: