Closed: biologghe closed this issue 7 years ago
Unfortunately the stack trace does not contain the underlying exception. Would you be able to replicate the error if I upload an updated version?
Yes, I'll check with our cluster admin. It's a matter of remounting the file system, but a workflow is currently still running. I'll be back ASAP.
Nice. In that case, please download a new snapshot with:
NXF_VER=0.24.0-SNAPSHOT CAPSULE_RESET=1 nextflow info
Then
NXF_VER=0.24.0-SNAPSHOT nextflow run .. etc
Hi Paolo,
This is the content of .nextflow.log when Lustre is mounted without the -o flock option:
Mar-14 13:43:34.542 [main] DEBUG nextflow.cli.CmdRun -
Version: 0.24.0-SNAPSHOT build 4204
Modified: 13-03-2017 16:05 UTC (17:05 CEST)
System: Linux 3.10.0-327.3.1.el7.x86_64
Runtime: Groovy 2.4.9 on OpenJDK 64-Bit Server VM 1.8.0_91-b14
Encoding: UTF-8 (UTF-8)
Process: 126426@disc-xeon-1 [x.x.x.x]
CPUs: 48 - Mem: 125.7 GB (118.2 GB) - Swap: 4 GB (4 GB)
Mar-14 13:43:34.592 [main] DEBUG nextflow.Session - Work-dir: /mnt/lfs_dev/home/mlogghe/working_dir/NGS_Gerald/ngsnanoxtractor/work [lustre]
Mar-14 13:43:34.706 [main] ERROR nextflow.cli.Launcher - Can't create cache DB: /mnt/lfs_dev/home/mlogghe/working_dir/NGS_Gerald/ngsnanoxtractor/.nextflow/cache/c633f115-82b7-4f6d-9bbb-d7a032908e5f/db
Even less informative, sorry this was my fault :(
If it doesn't bother you, could you refresh your snapshot and relaunch it as before? I mean:
NXF_VER=0.24.0-SNAPSHOT CAPSULE_RESET=1 nextflow info
Then
NXF_VER=0.24.0-SNAPSHOT nextflow run .. etc
Uhm, sorry Paolo. It looks like this is not an issue after all. I have now run this as root and the Can't create cache DB exception is not thrown at all. I guess it had to do with the .m2 directory in my home folder (also on Lustre). Root's .m2 is not on Lustre, and therefore all fresh dependencies could be retrieved. It seems the up-to-date code has no issues related to Lustre file locking (or the lack thereof).
OK, not a big problem. Thanks.
I have this problem too, with version 24.3, on a Lustre filesystem. There is no .m2 directory in my home directory. I've had file-lock issues with Ruffus in the past too. Their approach was to allow the DB to be stored in a different location, presumably on a filesystem that didn't have issues with file locks.
NF creates both the cache DB and the pipeline work directory in the current execution folder. The cache DB cannot be relocated and requires a file system that supports file locks. The pipeline work directory must be on a shared file system (provided you are using a grid scheduler executor), and it can be relocated to a path different from the current one with the -w command line option.

Thus you can execute NF from a local path and specify a shared work directory by using the -w option.

(The .m2 directory is completely unrelated.)
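As a hedged sketch of the suggestion above (the paths and the main.nf script name are hypothetical, not taken from this thread), one could launch the pipeline from a local directory, so the cache DB lands on a lock-capable filesystem, while -w points the work directory at the shared mount:

```shell
#!/bin/sh
# Hypothetical layout: a local run directory (its filesystem supports file
# locks) and a shared work directory visible to all cluster nodes.
LOCAL_RUN_DIR=/tmp/nxf-run
SHARED_WORK_DIR=/mnt/lfs_dev/home/mlogghe/working_dir/work   # example path

mkdir -p "$LOCAL_RUN_DIR"
cd "$LOCAL_RUN_DIR"

# The cache DB is created under $PWD/.nextflow; task directories go to -w.
# Guarded so the sketch is a no-op where Nextflow is not installed.
if command -v nextflow >/dev/null 2>&1; then
    nextflow run main.nf -w "$SHARED_WORK_DIR"
fi
```

The point of the split is that only the .nextflow cache directory needs flock support; the much larger task work tree only needs to be shared.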
When an IOException is thrown due to a file locking issue in the cache DB folder, it currently looks like this:
It might improve debugging if the exception indicated that the failure is due to file locking.
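Independently of the exception message, a quick way to check whether a given directory's filesystem supports the kind of advisory lock the cache DB needs is simply to try taking one. This is a sketch using flock(1) from util-linux, not anything Nextflow itself provides:

```shell
#!/bin/sh
# check_flock DIR: prints "supported" if an exclusive advisory lock can be
# taken on a probe file in DIR, "unsupported" otherwise. Point it at a
# Lustre mount to see whether it was mounted with -o flock.
check_flock() {
    f="${1:-.}/.flock_probe_$$"
    : > "$f"                              # create an empty probe file
    if flock -n "$f" true 2>/dev/null; then
        echo supported
    else
        echo unsupported
    fi
    rm -f "$f"
}

check_flock /tmp
```

Running this against the work directory before launching a pipeline would distinguish a missing -o flock mount option from other causes of the "Can't create cache DB" error.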
Thanks Paolo for the help.