hpc / charliecloud

Now hosted on GitLab.
https://gitlab.com/charliecloud/main
Apache License 2.0
313 stars, 61 forks

ch-image gives error: can’t unlink: part_(...).tar.gz #1925

Open nschan opened 2 weeks ago

nschan commented 2 weeks ago

Hello,

I ran into an issue where ch-image pull fails with the following error whenever I run it:

error: can’t unlink: part_a91d96a12e98e16c570e2fc88a976f3a26752c3934df068a2e3ee6ffc72b43c2.tar.gz: No such file or directory

This seems to be related to my CH_IMAGE_STORAGE: if I set it to a different directory, it works fine. So I assume I somehow broke my CH_IMAGE_STORAGE. Is there an easy way to fix this error, other than removing the directory at CH_IMAGE_STORAGE and recreating it?

Edit: It seems that removing the directory is not sufficient; the error occurs even after the directory has been removed.

Thanks,
Niklas

reidpr commented 2 weeks ago

OK that’s interesting.

Can you post a transcript of your terminal session (i.e., commands entered and the output), starting from rm -Rf $CH_IMAGE_STORAGE and continuing through the error? If that’s not appropriate to put in this public bug report, you could e-mail it to me at reidpr@lanl.gov.

One thing that catches my eye is the part_ prefix. I didn’t think we did that, though maybe I’m misremembering.

nschan commented 2 weeks ago

Turns out I did not actually delete the broken storage directory, but a different storage directory. My Charliecloud storage setup is getting a bit too complex for me, it seems. Sorry for the confusion.

Anyway, here is what happened when that directory was set:

$ ch-image pull quay.io/biocontainers/agat:1.1.0--pl5321hdfd78af_1
error: can’t unlink: part_a91d96a12e98e16c570e2fc88a976f3a26752c3934df068a2e3ee6ffc72b43c2.tar.gz: No such file or directory

I tried pulling a couple of different images, with the same error.

Now that I have set $CH_IMAGE_STORAGE to the problematic directory, deleted it, and retried, it seems OK:

$ ch-image pull quay.io/biocontainers/agat:1.1.0--pl5321hdfd78af_1
error: can’t unlink: part_a91d96a12e98e16c570e2fc88a976f3a26752c3934df068a2e3ee6ffc72b43c2.tar.gz: No such file or directory
$ rm -rf $CH_IMAGE_STORAGE
$ ch-image pull quay.io/biocontainers/agat:1.1.0--pl5321hdfd78af_1
initializing storage directory: v7 [...]

Do you know what may have caused this? Would there have been a way to solve this without deleting the storage?

reidpr commented 2 weeks ago

I was wrong; we do use the part_ prefix. I think the failing line of code is probably filesystem.py:1148. This is trying to clean up partially downloaded files, which have the part_ prefix.

As for why it’s failing, I’m not sure. It seems strange because we just got the list of files from glob(). If it happens again, can you keep the bad directory so we can try to debug it?
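For illustration, a cleanup loop that tolerates a file vanishing between the glob() listing and the unlink could look like the sketch below. This is not Charliecloud's code at filesystem.py:1148; the function name `remove_partial_downloads` and its parameters are hypothetical, and only the part_ prefix comes from this thread. It covers just one possible cause (the file is already gone when unlink runs), which may or may not be what happened here.

```python
from pathlib import Path


def remove_partial_downloads(directory, prefix="part_"):
    """Delete entries in `directory` whose names start with `prefix`.

    Returns the names that were actually removed. Hypothetical helper,
    not part of Charliecloud.
    """
    removed = []
    for path in sorted(Path(directory).glob(prefix + "*")):
        try:
            path.unlink()
        except FileNotFoundError:
            # Listed by glob() but already gone when we tried to unlink:
            # treat as "nothing to clean up" instead of a fatal error.
            continue
        removed.append(path.name)
    return removed
```

Calling the function twice on the same directory is harmless in this sketch: the second call finds nothing to remove and returns an empty list.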

The short answer is likely yes, we could have cleaned up the storage directory, but I’m not sure exactly what the procedure would be without knowing more about the bad directory.
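If a bad directory turns up again, a read-only inspection along the following lines might help narrow things down before anything is deleted. It is only a sketch: `describe_partial_files` is a hypothetical helper, it searches the whole storage directory recursively because it makes no assumption about where the part_ files live, and the categories it reports are guesses at what could be relevant (for example, a dangling symlink).

```python
import os
from pathlib import Path


def describe_partial_files(storage_dir, prefix="part_"):
    """List entries under `storage_dir` whose names start with `prefix`.

    Returns (path relative to storage_dir, kind) tuples. Nothing is
    modified or deleted. Hypothetical helper, not part of Charliecloud.
    """
    root = Path(storage_dir)
    report = []
    for path in sorted(root.rglob(prefix + "*")):
        if path.is_symlink():
            # exists() follows the link, so False means the target is missing.
            kind = "symlink" if path.exists() else "dangling symlink"
        elif path.is_dir():
            kind = "directory"
        elif path.is_file():
            kind = "file"
        else:
            kind = "other or vanished"
        report.append((str(path.relative_to(root)), kind))
    return report


if __name__ == "__main__":
    # Uses the same environment variable as in the transcript above.
    storage = os.environ.get("CH_IMAGE_STORAGE")
    if storage:
        for rel, kind in describe_partial_files(storage):
            print(f"{kind:18} {rel}")
```

Attaching the printed listing to the bug report, together with the ch-image transcript, would preserve some information about the directory even if it later has to be removed.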