mycobactopia-org / MTBseq-nf

MTBSeq made simple and easy using Nextflow and nf-core standard.
https://doi.org/10.5281/zenodo.5498063
MIT License

[BUG] cannot create regular file ‘GenomeAnalysisTK.jar’ File exists - conda executor #54

Open Mxrcon opened 2 years ago

Mxrcon commented 2 years ago

Hey, while testing the latest version I ran into this error. I'm providing a GATK jar as requested and using a fresh conda env.

  ENV_PREFIX /data/mariliaconceicao/Davi/mtbseq-nf-master/work/conda/env-9bc281415c98bcad7a1079004336496d
  Processing GenomeAnalysisTK.jar as *.jar
  jar file specified matches expected version
  Copying GenomeAnalysisTK.jar to /data/mariliaconceicao/Davi/mtbseq-nf-master/work/conda/env-9bc281415c98bcad7a1079004336496d/opt/gatk-3.8

Command error:
  cp: cannot create regular file ‘/data/mariliaconceicao/Davi/mtbseq-nf-master/work/conda/env-9bc281415c98bcad7a1079004336496d/opt/gatk-3.8/GenomeAnalysisTK.jar’: File exists

I'll test some solutions and reply here if I find anything new.

Kindly, Davi

Mxrcon commented 2 years ago

Removing the file with this command worked:

rm -f /data/mariliaconceicao/Davi/mtbseq-nf-master/work/conda/env-9bc281415c98bcad7a1079004336496d/opt/gatk-3.8/GenomeAnalysisTK.jar

abhi18av commented 2 years ago

Hmm, interesting.

Could you please outline how exactly I can reproduce this on my end?

abhi18av commented 2 years ago

In any case, the following line gives a hint:

cp: cannot create regular file ‘/data/mariliaconceicao/Davi/mtbseq-nf-master/work/conda/env-9bc281415c98bcad7a1079004336496d/opt/gatk-3.8/GenomeAnalysisTK.jar’: File exists

The problem is that the gatk-register command, used in each of the MTBSEQ modules, copies the jar into the location where MTBseq is installed, which in the case of conda is .../work/conda/env..., so a later copy into the same env can find the jar already there.

NOTE: This problem will likely occur only with conda, since it makes persistent changes to the filesystem. With docker, all changes are discarded as soon as the container is shut down.

There are a few possible solutions:

  1. Figure out whether the gatk-register command offers a -force (or similar) option to overwrite the jar if it already exists.
  2. Remove the .../work/conda/ENV_FOR_MTBSEQ... directory to force conda to recreate the env.
  3. Use some bash scripting to handle this situation gracefully for the conda profile, as discussed here: https://stackoverflow.com/questions/22009364/is-there-a-try-catch-command-in-bash
  4. Introduce a variable called copy_gatk38_jar and modify the scripts to use something like
     ${params.copy_gatk38_jar ? "gatk-register ..." : ""}
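
Options 3 and 4 both come down to skipping the copy when the jar is already in place. Below is a minimal bash sketch of that guard. It assumes a hypothetical helper named register_jar_once (not part of MTBseq-nf or gatk-register) that takes the target directory followed by the command to run. The cp in the demo is only a stand-in for the real gatk-register call, whose arguments are not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of MTBseq-nf or gatk-register).
# $1      directory the jar ends up in, e.g. <conda env>/opt/gatk-3.8
# $2...   command that performs the registration/copy
register_jar_once() {
    local target_dir="$1"
    shift
    local jar="${target_dir}/GenomeAnalysisTK.jar"
    # -L also catches a dangling symlink, which -e alone reports as missing.
    if [ -e "$jar" ] || [ -L "$jar" ]; then
        echo "GenomeAnalysisTK.jar already present, skipping"
        return 0
    fi
    "$@"
}

# Demo in a throwaway directory; cp stands in for the registration command.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/opt/gatk-3.8"
echo "placeholder" > "${demo_dir}/GenomeAnalysisTK.jar"

for attempt in 1 2; do
    register_jar_once "${demo_dir}/opt/gatk-3.8" \
        cp "${demo_dir}/GenomeAnalysisTK.jar" "${demo_dir}/opt/gatk-3.8/"
done
```

The second attempt prints the skip message instead of failing. The check is not atomic, so two tasks that start at the same moment could still both attempt the copy.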

What do you think?

Mxrcon commented 2 years ago

> Hmm, interesting.
>
> Could you please outline, how exactly I can reproduce this on my end?

I just cloned the latest version and ran this command with a fresh copy of the pipeline and some genomes:

nextflow run main.nf -profile conda -params-file params/params.yml

I'm not sure it can be reproduced so easily.

abhi18av commented 2 years ago

In that case, I think we can just track this as an issue, but if you run across it again, please try out the suggestions here: https://github.com/mtb-bioinformatics/mtbseq-nf/issues/54#issuecomment-975910260

Mxrcon commented 2 years ago

I think the best option would be 4, introducing code to check the situation. In multi-threaded executions, I think a force option or instructing users to remove the env would create new issues. I'll work on this solution soon.
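
For the multi-threaded case, one way to keep such a check safe is to serialize it behind a lock. The following is a sketch only, not what the pipeline currently does. with_lock and copy_jar_if_missing are hypothetical helper names, the lock is a directory created with mkdir (which is atomic on a local filesystem), and cp stands in for the actual registration step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command while holding a directory-based lock.
# $1 = lock directory, remaining arguments = command to run.
with_lock() {
    local lock_dir="$1"
    shift
    # mkdir either creates the directory or fails, so only one caller
    # at a time gets past this loop.
    until mkdir "$lock_dir" 2>/dev/null; do
        sleep 1
    done
    local status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Hypothetical helper: copy the jar only if the target does not have it yet.
copy_jar_if_missing() {
    local jar="$1" target_dir="$2"
    if [ ! -e "${target_dir}/$(basename "$jar")" ]; then
        cp "$jar" "${target_dir}/"
    fi
}

# Demo: four concurrent tasks share one target directory.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/opt/gatk-3.8"
echo "placeholder" > "${demo_dir}/GenomeAnalysisTK.jar"

pids=()
for task in 1 2 3 4; do
    with_lock "${demo_dir}/jar.lock" \
        copy_jar_if_missing "${demo_dir}/GenomeAnalysisTK.jar" "${demo_dir}/opt/gatk-3.8" &
    pids+=("$!")
done

# wait returns each task's exit status; set -e aborts if any task failed.
for pid in "${pids[@]}"; do
    wait "$pid"
done
```

If a task is killed while holding the lock, the lock directory is left behind and has to be removed by hand. Where util-linux is available, flock is a sturdier alternative.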