The storage volume is needed precisely so that it can be used as temporary space when using scratch = true. Setting scratch = false will cause the task to work directly in the bucket via gcsfuse, resulting in the error you are experiencing. Setting scratch = false is only supported when using the Fusion file system; see here for details.
By storage volume, I meant GCSFuse. The mounting solution is called "Cloud Storage Volume" in Google Cloud Batch.
The work directory bucket is already mounted through GCSFuse, so I assumed it would be OK for Nextflow to work directly in the mounted directory, and I was surprised that it did not work.
I don't see why Fusion and GCSFuse need to behave differently. They are both Fuse file systems. The Fusion documentation also says that it lets the work directory be the mounted cloud directory, removing the need for scratch space.
They are both Fuse file systems
That's like saying all cars are the same because they all have four wheels.
I thought gcsfuse always worked without scratch storage, but I now see that the Google Batch executor sets scratch to true by default:
I wonder if this error is the same as #4845.
Potentially it's related. The problem I am seeing is that the symlink is pointing to itself. I don't know if this bug is coming from Nextflow or from GCSFuse.
% gcloud storage ls --full gs://[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh
gs://[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh:
Creation Time: 2024-07-17T18:39:45Z
Update Time: 2024-07-17T18:39:45Z
Storage Class Update Time: 2024-07-17T18:39:45Z
Storage Class: STANDARD
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Additional Properties:
{
"gcsfuse_symlink_target": "/mnt/disks/[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh"
}
Hash (CRC32C): AAAAAA==
Hash (MD5): 1B2M2Y8AsgTpgAmY7PhCfg==
ETag: CMaAmsrcrocDEAE=
Generation: 1721241585483846
Metageneration: 1
ACL: []
TOTAL: 1 objects, 0 bytes (0B)
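The listing above makes the loop visible: the object lives at gs://[REDACTED]-scratch/nextflow-work/.../.command.sh, and its gcsfuse_symlink_target is the same key under /mnt/disks/[REDACTED]-scratch/, which is where the bucket is mounted. The sketch below checks for exactly that condition. It assumes a bucket is mounted at /mnt/disks/<bucket> (inferred from the paths in this thread, not from Nextflow or gcsfuse documentation), and the helper names are hypothetical.

```python
# Hypothetical helpers for illustration only; they are not part of Nextflow or gcsfuse.
# Assumption: a bucket is mounted at /mnt/disks/<bucket>, as the paths above suggest.
MOUNT_ROOT = "/mnt/disks"


def gcs_uri_to_mount_path(uri: str, mount_root: str = MOUNT_ROOT) -> str:
    """Translate gs://<bucket>/<key> into the path where that object appears on the mount."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return mount_root.rstrip("/") + "/" + uri[len(prefix):]


def is_self_referential(object_uri: str, symlink_target: str,
                        mount_root: str = MOUNT_ROOT) -> bool:
    """Return True if a gcsfuse symlink object points back at its own mounted path."""
    return gcs_uri_to_mount_path(object_uri, mount_root) == symlink_target


if __name__ == "__main__":
    # Placeholder bucket and task hash, shaped like the listing above.
    uri = "gs://example-bucket/nextflow-work/c1/e8b3/.command.sh"
    target = "/mnt/disks/example-bucket/nextflow-work/c1/e8b3/.command.sh"
    print(is_self_referential(uri, target))  # True
```

If the check returns True, the object's symlink can only ever resolve back to itself, whichever component created it.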
I tried to look into what scratch means in the context of Google Batch. It seems that the staging step simply symlinks input files from the gcsfuse mount, which is effectively the same as the scratchless behavior. On exit, the unstaging step copies files from the current directory to the gcsfuse paths. So I suppose the main difference is that with scratch enabled, output files only start being written out at the end of the whole process, whereas with scratch disabled, each output file starts being written out as soon as it is closed.
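The difference described above, when outputs become visible in the work directory, can be modeled in a few lines. This is a toy sketch of the behavior as described in this comment, not Nextflow's actual staging code: run_task, the scratch- temp directory, and result.txt are hypothetical, and a local directory stands in for the gcsfuse-mounted work directory.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_task(work_dir: Path, inputs: List[Path], scratch: bool) -> Tuple[bool, bool]:
    """Toy model of the staging behavior described above (not Nextflow's actual code).

    Returns (output visible in work_dir while the task runs,
             output visible in work_dir after the task exits).
    """
    # With scratch, the task runs in a separate local directory; without it,
    # it runs directly in the (mounted) work directory.
    run_dir = Path(tempfile.mkdtemp(prefix="scratch-")) if scratch else work_dir

    # Stage: symlink each input into the run directory.
    for src in inputs:
        os.symlink(src, run_dir / src.name)

    # "Run" the task: write and close an output file.
    out = run_dir / "result.txt"
    out.write_text("done\n")
    visible_during_run = (work_dir / out.name).exists()

    if scratch:
        # Unstage: copy outputs to the work directory only once the task has finished.
        shutil.copy2(out, work_dir / out.name)
        shutil.rmtree(run_dir)

    visible_after_exit = (work_dir / out.name).exists()
    return visible_during_run, visible_after_exit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        data = base / "input.txt"
        data.write_text("x\n")
        for scratch in (True, False):
            work_dir = base / f"work-scratch-{scratch}"
            work_dir.mkdir()
            print(scratch, run_task(work_dir, [data], scratch))
    # True (False, True)
    # False (True, True)
```

In both modes the output ends up in the work directory; only the moment it appears there changes.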
A major difference with Fusion would also be the automatic use of local SSDs for /tmp. And of course, Fusion could be more optimized than gcsfuse.
@bentsherman I sent #5256 for the error I encountered. I included some commentary on what it means to have a scratch dir versus not when using Google Batch.
Bug report
Expected behavior and actual behavior
Because Nextflow uses Cloud Storage Volumes by default on Google Cloud Batch, one might assume that scratch is no longer needed, since Cloud Storage Volumes already take care of staging files in scratch space and then moving them to Cloud Storage. However, when I set process.scratch = false, all my processes fail with messages like:
/bin/bash: /mnt/disks/[workdir-bucket]/[workdir-prefix]/[task-id]/.command.sh: Too many levels of symbolic links
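"Too many levels of symbolic links" is the operating system's message for the ELOOP error: a symlink whose target is its own path can never be resolved, so the kernel gives up and bash prints the corresponding error text. The minimal sketch below reproduces the same errno locally without gcsfuse. The file name is borrowed from the report; the function name is hypothetical.

```python
import errno
import os
import tempfile
from typing import Optional


def errno_for_self_symlink() -> Optional[int]:
    """Create a symlink that points at its own path, try to open it, and return the errno."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, ".command.sh")
        # os.symlink(target, link_name): target and link_name are the same path here,
        # mirroring a gcsfuse_symlink_target that names the object's own mounted path.
        os.symlink(path, path)
        try:
            with open(path):
                return None  # not reached: resolving the link loops forever
        except OSError as e:
            return e.errno


if __name__ == "__main__":
    code = errno_for_self_symlink()
    # On Linux this prints: ELOOP Too many levels of symbolic links
    print(errno.errorcode[code], os.strerror(code))
```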
Steps to reproduce the problem
nextflow.config:
main.nf (same as in tutorial):
Run with:
Program output
Environment