nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Improve explanation of cache DB IOException #303

Closed biologghe closed 7 years ago

biologghe commented 7 years ago

When an IOException is thrown due to a file locking issue in the cache DB folder, it currently looks like this:

java.io.IOException: Can't create cache DB: /mnt/lfs_dev/home/mlogghe/working_dir/NGS_Gerald/ngsnanoxtractor/.nextflow/cache/f5df41a1-a9f8-4fd5-9b01-b2ae6639e650/db                        
        at nextflow.CacheDB.openDb(CacheDB.groovy:103)                                                                                                                                      
        at nextflow.CacheDB.open(CacheDB.groovy:113)                                                                                                                                        
        at nextflow.Session.init(Session.groovy:290)                                                                                                                                        
        at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:148)                                                                                                                    
        at nextflow.cli.CmdRun.run(CmdRun.groovy:207)                                                                                                                                       
        at nextflow.cli.Launcher.run(Launcher.groovy:406)                                                                                                                                   
        at nextflow.cli.Launcher.main(Launcher.groovy:554) 

It might improve debugging if the exception indicates it has to do with file locking.
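The requested improvement could be sketched as probing for a lock up front and re-throwing the `IOException` with a file-locking hint. This is a hedged illustration, not the actual `CacheDB.openDb` code: the class name `CacheDbSketch`, the `LOCK` file name, and the message wording are all made up for the example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;

public class CacheDbSketch {

    // Hypothetical stand-in for CacheDB.openDb: acquire a lock up front so a
    // locking failure is reported explicitly instead of as a bare IOException
    static void openDb(Path dbDir) throws IOException {
        try {
            Files.createDirectories(dbDir);
            Path lockFile = dbDir.resolve("LOCK");
            try (FileChannel ch = FileChannel.open(lockFile, CREATE, WRITE)) {
                // tryLock throws IOException when the file system has no lock support
                FileLock lock = ch.tryLock();
                if (lock == null)
                    throw new IOException("cache DB is locked by another process");
                lock.release();
            }
        }
        catch (IOException e) {
            // Surface the file-locking hint in the message, keeping the original cause
            throw new IOException("Can't create cache DB: " + dbDir
                    + " -- unable to acquire a file lock; check that the file system"
                    + " supports locking (e.g. mount Lustre with the -o flock option)", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dbDir = Path.of(args.length > 0 ? args[0] : "tmp-cache/db");
        openDb(dbDir);
        System.out.println("lock acquired on " + dbDir + " -- file locking is supported");
    }
}
```

On a Lustre mount without `-o flock`, the `tryLock` call would fail and the user would see the locking hint instead of only the bare "Can't create cache DB" message.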

Thanks Paolo for the help.

pditommaso commented 7 years ago

Unfortunately the stack trace does not contain the underlying exception. Would you be able to replicate the error if I upload an updated version?

biologghe commented 7 years ago

Yes, I'll check with our cluster admin; it's a matter of remounting the file system. Currently a workflow is still running, so I'll be back ASAP.

pditommaso commented 7 years ago

Nice. In that case, please download a new snapshot with:

NXF_VER=0.24.0-SNAPSHOT CAPSULE_RESET=1  nextflow info

Then:

NXF_VER=0.24.0-SNAPSHOT nextflow run .. etc

biologghe commented 7 years ago

Hi Paolo, This is the content of .nextflow.log when lustre is mounted without the -o flock option:

Mar-14 13:43:34.542 [main] DEBUG nextflow.cli.CmdRun -                                                                                                                                                   
  Version: 0.24.0-SNAPSHOT build 4204                                                                                                                                                                    
  Modified: 13-03-2017 16:05 UTC (17:05 CEST)                                                                                                                                                            
  System: Linux 3.10.0-327.3.1.el7.x86_64                                                                                                                                                                
  Runtime: Groovy 2.4.9 on OpenJDK 64-Bit Server VM 1.8.0_91-b14                                                                                                                                         
  Encoding: UTF-8 (UTF-8)                                                                                                                                                                                
  Process: 126426@disc-xeon-1 [x.x.x.x]                                                                                                                                               
  CPUs: 48 - Mem: 125.7 GB (118.2 GB) - Swap: 4 GB (4 GB)                                                                                                                                                
Mar-14 13:43:34.592 [main] DEBUG nextflow.Session - Work-dir: /mnt/lfs_dev/home/mlogghe/working_dir/NGS_Gerald/ngsnanoxtractor/work [lustre]                                                             
Mar-14 13:43:34.706 [main] ERROR nextflow.cli.Launcher - Can't create cache DB: /mnt/lfs_dev/home/mlogghe/working_dir/NGS_Gerald/ngsnanoxtractor/.nextflow/cache/c633f115-82b7-4f6d-9bbb-d7a032908e5f/db 
pditommaso commented 7 years ago

Even less informative; sorry, this was my fault :(

If it doesn't bother you, could you refresh your snapshot and relaunch it as before? I mean:

NXF_VER=0.24.0-SNAPSHOT CAPSULE_RESET=1  nextflow info

Then:

NXF_VER=0.24.0-SNAPSHOT nextflow run .. etc
biologghe commented 7 years ago

Uhm, sorry Paolo, it looks like this is not an issue after all. I have now run this as root and the Can't create cache DB exception is not thrown at all. I guess it had to do with my .m2 in my home folder (also on Lustre). Root's .m2 is not on Lustre, and therefore all fresh dependencies could be retrieved. It seems the up-to-date code has no issues related to Lustre file locking (or the lack of it).

pditommaso commented 7 years ago

OK, not a big problem. Thanks.

funnell commented 7 years ago

I have this problem too, with version 24.3, on a Lustre filesystem. There is no .m2 directory in my home directory. I've had file-lock issues with Ruffus in the past too. Their approach was to allow the DB to be stored in a different location, presumably on a filesystem that didn't have issues with file locks.

pditommaso commented 7 years ago

NF creates both the cache DB and the pipeline work directory in the current execution folder. The cache DB cannot be relocated and requires a file system that supports file locks. The pipeline work directory must be on a shared file system (provided you are using a grid scheduler) and can be relocated to a path different from the current one with the -w command line option.

Thus you can execute NF from a local path and specify a shared work directory by using the -w option.
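Before choosing where to launch from, a user can check whether a candidate directory's file system supports locks at all. The sketch below is a user-side diagnostic, not part of Nextflow; the class name `LockProbe` and the `.lock-probe` file name are invented for the example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;

public class LockProbe {

    // Returns true when an exclusive lock can be taken on a file in the given directory
    public static boolean supportsLocks(Path dir) {
        try {
            Files.createDirectories(dir);
            Path probe = dir.resolve(".lock-probe");
            try (FileChannel ch = FileChannel.open(probe, CREATE, WRITE)) {
                // tryLock throws IOException on e.g. a Lustre mount without -o flock
                FileLock lock = ch.tryLock();
                if (lock != null) lock.release();
                return lock != null;
            }
            finally {
                Files.deleteIfExists(probe);
            }
        }
        catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(dir + " supports file locks: " + supportsLocks(dir));
    }
}
```

Running it against the intended launch directory tells you whether the cache DB can be created there; the -w work directory only needs to be shared, not lockable.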

(the directory .m2 is completely unrelated)