nf-core / circdna

Pipeline for the identification of extrachromosomal circular DNA (ecDNA) from Circle-seq, WGS, and ATAC-seq data that were generated from cancer and other eukaryotic cells.
https://nf-co.re/circdna
MIT License

CNVKIT_BATCH fails #39

Closed: paulemnah closed this issue 1 year ago

paulemnah commented 1 year ago

Description of the bug

The pipeline runs without problems on the HPC cluster until the CNVkit step (CNVKIT_BATCH), where it crashes with an error saying that the file 'GRCh37_cnvkit_filtered_ref.cnn' does not exist. The file does exist, though.

Could you help me figure out what might be causing this issue? Thanks a lot!

Command used and terminal output

nextflow run nf-core/circdna -work-dir /path/to/wdir --outdir /path/to/results --genome GRCh37 -profile singularity --circle_identifier ampliconarchitect --aa_data_repo /path/to/nfcore_circdna --mosek_license_dir /path/to/nfcore_circdna/mosek --reference_build GRCh37 --max_cpus 64 --max_memory 200.GB --max_time 500.h -c /path/to/circdna.config -with-timeline /path/to/wdir/timeline.html --input path/to/samplesheet.csv

Output:

Command error:
 WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
 CNVkit 0.9.9
 Traceback (most recent call last):
   File "/usr/local/bin/cnvkit.py", line 9, in <module>
     args.func(args)
   File "/usr/local/lib/python3.9/site-packages/cnvlib/commands.py", line 118, in _cmd_batch
     ref_arr = read_cna(args.reference)
   File "/usr/local/lib/python3.9/site-packages/cnvlib/cmdutil.py", line 12, in read_cna
     return tabio.read(infile, into=CNA, sample_id=sample_id, meta=meta)
   File "/usr/local/lib/python3.9/site-packages/skgenome/tabio/__init__.py", line 74, in read
     dframe = reader(infile, **kwargs)
   File "/usr/local/lib/python3.9/site-packages/skgenome/tabio/tab.py", line 17, in read_tab
     dframe = pd.read_csv(infile, sep='\t', dtype={'chromosome': 'str'})
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 610, in read_csv
     return _read(filepath_or_buffer, kwds)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 462, in _read
     parser = TextFileReader(filepath_or_buffer, **kwds)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 819, in __init__
     self._engine = self._make_engine(self.engine)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 1050, in _make_engine
     return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 1867, in __init__
     self._open_handles(src, kwds)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 1362, in _open_handles
     self.handles = get_handle(
   File "/usr/local/lib/python3.9/site-packages/pandas/io/common.py", line 642, in get_handle
     handle = open(
 FileNotFoundError: [Errno 2] No such file or directory: '/path/to/aa_data_repo/GRCh37/GRCh37_cnvkit_filtered_ref.cnn'

Relevant files

No response

System information

Nextflow version: 21.10.6
Hardware: HPC cluster
Container engine: Singularity (Apptainer version 1.1.3-1.el7)
OS: Linux odcf-worker01 3.10.0-1160.76.1.el7.x86_64
Version of nf-core/circdna: 1.0.1

DSchreyer commented 1 year ago

Hi, I've noticed this error before; it is an issue with the Singularity container. I need to pass the cnn file to the process as an input. Otherwise, CNVkit inside the Singularity container tries to open a file path that is not available in the container.

I wrote a short fix and am currently testing it in dev. Would you be willing to test it by pulling the newest version of the dev branch?
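The failure mode described in this comment can be illustrated with a simplified model. It assumes that a process inside the container can only open host paths that lie under a bind-mounted directory, and that declaring a file as a process input causes its parent directory to be mounted, roughly what Nextflow's `singularity.autoMounts` setting does for staged input files. The function names, the work-directory path and the BAM path below are hypothetical and not part of nf-core/circdna.

```python
from pathlib import PurePosixPath


def visible_in_container(path, mounts):
    """Return True if `path` is a mounted directory or lies under one.

    Simplified model (assumption): the container sees only bind-mounted
    host directories; any other host path raises FileNotFoundError inside it.
    """
    p = PurePosixPath(path)
    return any(p == PurePosixPath(m) or PurePosixPath(m) in p.parents for m in mounts)


def mounts_for_task(work_dir, declared_inputs):
    """Hypothetical model: the task work dir is always mounted, and each
    declared input file gets its parent directory mounted as well."""
    return [work_dir] + [str(PurePosixPath(f).parent) for f in declared_inputs]


cnn = "/path/to/aa_data_repo/GRCh37/GRCh37_cnvkit_filtered_ref.cnn"
work = "/path/to/wdir/ab/123456"

# Before the fix: the .cnn path reaches the script only as a plain string.
before = mounts_for_task(work, declared_inputs=["/path/to/data/culture1.bam"])
# After the fix: the .cnn file is declared as a process input.
after = mounts_for_task(work, declared_inputs=["/path/to/data/culture1.bam", cnn])

print(visible_in_container(cnn, before))  # False
print(visible_in_container(cnn, after))   # True
```

In this model the file exists on the host in both cases; only the set of mounted directories changes, which matches the symptom in the report (the file exists, yet CNVkit cannot find it).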

paulemnah commented 1 year ago

Hi, thanks for your message! I tried it, but have not managed to get it to work yet.

First, directly at launch, the pipeline warns about an unexpected parameter:

WARN: Found unexpected parameters:

Then, at the CNVKIT_BATCH step, it fails with:

Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:CNVKIT_BATCH (culture1)'

Caused by:
  No such variable: reference -- Check script '/path/to/.nextflow/assets/nf-core/circdna/./workflows/../modules/local/cnvkit/batch/main.nf' at line: 27

Source block:
  def args = task.ext.args ?: ''
  def fasta_args = reference ? "" : "--fasta $fasta"
  def reference_args = reference ? "--reference $cnn" : ""
  """
  cnvkit.py \
      batch \
      $bam \
      $fasta_args \
      $reference_args \
      --processes $task.cpus \
      $args

  cat <<-END_VERSIONS > versions.yml
  "${task.process}":
      cnvkit: \$(cnvkit.py version | sed -e "s/cnvkit v//g")
  END_VERSIONS
  """
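The `No such variable: reference` error means the script block tests a name, `reference`, that is not defined in the process, while the reference file itself is interpolated as `$cnn`. The same argument-building logic can be sketched in Python with the condition tested on the value that is actually passed in. This is an illustrative translation, not the pipeline's code; the function name `cnvkit_batch_command` and the convention of `None` meaning "no prebuilt reference" are assumptions.

```python
import shlex


def cnvkit_batch_command(bam, fasta, cnn=None, cpus=1, extra_args=""):
    """Build a `cnvkit.py batch` argument list (illustrative sketch).

    Mirrors the Groovy source block: with a prebuilt reference (.cnn),
    pass --reference and skip --fasta; otherwise pass --fasta.
    The condition tests `cnn` itself, so no undefined name is referenced.
    """
    cmd = ["cnvkit.py", "batch", bam]
    if cnn:
        cmd += ["--reference", cnn]
    else:
        cmd += ["--fasta", fasta]
    cmd += ["--processes", str(cpus)]
    # Extra options (task.ext.args in the module) are split shell-style.
    cmd += shlex.split(extra_args)
    return cmd


print(" ".join(cnvkit_batch_command(
    "culture1.bam", "genome.fa", cnn="GRCh37_cnvkit_filtered_ref.cnn", cpus=64)))
# cnvkit.py batch culture1.bam --reference GRCh37_cnvkit_filtered_ref.cnn --processes 64
```

Testing the input value directly keeps the two branches in sync: whichever file is bound, the command refers only to names that exist.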

DSchreyer commented 1 year ago

Hi, I've updated it again and ran it with Singularity. Could you try again? It should work now.

paulemnah commented 1 year ago

It failed again, this time at the AMPLIFIED_INTERVALS step, with an error saying it couldn't open the coverage.stats file. However, I got it to work by putting the entire AA_data_repo into my home directory, so that it is mounted in the Singularity container. Thanks a lot for your help and for developing this great tool!
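This workaround relies on Singularity/Apptainer binding the user's home directory into the container by default, so files under `$HOME` are readable at the same path inside it. A quick pre-flight check along these lines could flag data directories that fall outside the mounted locations before a long run. It is a sketch under stated assumptions: the function name `outside_default_mounts` is hypothetical, it treats only `$HOME` plus any explicitly passed directories as mounted, and it ignores other default binds and site-specific bind paths configured by an HPC admin.

```python
import os
from pathlib import Path


def outside_default_mounts(paths, extra_mounts=()):
    """Return the entries of `paths` that are NOT under $HOME or `extra_mounts`.

    Assumption: only the home directory (plus whatever is passed in
    `extra_mounts`) is bind-mounted into the container.
    """
    mounts = [Path(os.path.expanduser("~")).resolve()]
    mounts += [Path(m).resolve() for m in extra_mounts]
    missing = []
    for p in paths:
        rp = Path(p).resolve()  # non-strict: works for paths that do not exist yet
        if not any(rp == m or m in rp.parents for m in mounts):
            missing.append(p)
    return missing


# Example: a data repository outside $HOME is reported unless it is
# listed as an extra mounted directory.
print(outside_default_mounts(["/path/to/aa_data_repo"]))
print(outside_default_mounts(["/path/to/aa_data_repo"], extra_mounts=["/path/to"]))
```

An empty list means every path is expected to be visible inside the container under this model.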