Fails to load AmpliconArchitect files

alexmascension commented 8 months ago

Description of the bug

Hi!

I'm running the pipeline with the following command:

nextflow run nf-core/circdna \
-r 1.0.4 \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 21.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_mouse/samplesheets/CIRCDNA.csv \
--outdir results/test_mouse/CIRCDNA \
--genome GRCm38 \
--reference_build mm10 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCm38/genome.fasta \
--aa_data_repo database/indexes/GRCm38/aa_data_repo

In the process NFCORE_CIRCDNA:CIRCDNA:AMPLICONCLASSIFIER_AMPLICONSIMILARITY I get the following error:

Command executed:

  REF=mm10
  export AA_DATA_REPO=/data/Proyectos/NGS_pipeline/database/indexes/GRCm38/aa_data_repo
  export AA_SRC=/home/nanoneuro/.nextflow/assets/nf-core/circdna/bin

  amplicon_similarity.py \
      --ref $REF \
       \
      --input ampliconclassifier.input

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:AMPLICONCLASSIFIER_AMPLICONSIMILARITY":
      AmpliconClassifier: $(echo $(amplicon_classifier.py --version | sed 's/amplicon_classifier //g' | sed 's/ .*//g'))
  END_VERSIONS

Command exit status:
  1

Command output:
  Required classifications set to
  set()

Command error:
  Required classifications set to
  set()
  Traceback (most recent call last):
    File "/usr/local/bin/amplicon_similarity.py", line 456, in <module>
      lcD, cg5D = set_lcd(AA_DATA_REPO, args.no_LC_filter)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/bin/amplicon_similarity.py", line 360, in set_lcd
      with open(AA_DATA_REPO + "file_list.txt") as infile:
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  FileNotFoundError: [Errno 2] No such file or directory: 'database/indexes/GRCm38/aa_data_repo/mm10/file_list.txt'

Work dir:
  /data/Proyectos/NGS_pipeline/work/26/406c0d8f5f873a353686003b4eaaee

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

At first I thought the problem could be caused by the relative path, but using

--aa_data_repo /data/Proyectos/NGS_pipeline/database/indexes/GRCm38/aa_data_repo

yields the same error, just with the full path.

Command used and terminal output

No response

Relevant files

No response

System information

No response

jluebeck commented 8 months ago

Hi Alex,

Yes, please use a full path (not relative).

What are the contents of your data repo directory /data/Proyectos/NGS_pipeline/database/indexes/GRCm38/aa_data_repo ?

If the data repo files haven't been downloaded into that directory (presumably for GRCm38), then this may cause a bug. You can download the relevant ones here if you haven't obtained them already. It is also possible that there is a bug with using GRCm38 and the Nextflow version of AmpliconSuite.
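A quick way to check the two points above (absolute path, files present) is a small script like the following. It is only a sketch: the layout `<aa_data_repo>/<ref>/file_list.txt` is inferred from the path in the error message, and the function name `check_aa_data_repo` is made up for illustration. Note that running it on the host only shows what the host sees; the Docker container may see a different filesystem.

```python
from pathlib import Path


def check_aa_data_repo(repo, ref):
    """Return a list of problems with the data repo layout (empty if none).

    Assumes the layout <repo>/<ref>/file_list.txt, as suggested by the
    FileNotFoundError in the traceback. This is a hypothetical helper,
    not part of AmpliconSuite or nf-core/circdna.
    """
    problems = []
    repo_path = Path(repo)
    if not repo_path.is_absolute():
        problems.append(f"{repo} is a relative path; use an absolute path")
    ref_dir = repo_path / ref
    if not ref_dir.is_dir():
        problems.append(f"missing reference directory: {ref_dir}")
    elif not (ref_dir / "file_list.txt").is_file():
        problems.append(f"missing file: {ref_dir / 'file_list.txt'}")
    return problems


if __name__ == "__main__":
    # Path and reference name taken from the report above.
    repo = "/data/Proyectos/NGS_pipeline/database/indexes/GRCm38/aa_data_repo"
    for problem in check_aa_data_repo(repo, "mm10"):
        print(problem)
```

If this prints nothing on the host but the pipeline still fails, the directory is probably not visible at the same path inside the container.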

Thanks, Jens

alexmascension commented 8 months ago

Hi! My list of files is this:

database/indexes/GRCm38/aa_data_repo/mm10:
annotations       file_list.txt     mm10-blacklist.v2.bed         mm10_conserved_gain5.bed                mm10.fa.amb  mm10.fa.fai  mm10.Hardison.Excludable.full.bed             mm10_noAlt.fa.fai
cancer            file_sources.txt  mm10_centromere.bed           mm10_conserved_gain5_onco_subtract.bed  mm10.fa.ann  mm10.fa.pac  mm10_k35.mappability.bedgraph                 onco_bed.bed
dummy_ploidy.vcf  last_updated.txt  mm10_cnvkit_filtered_ref.cnn  mm10.fa                                 mm10.fa.bwt  mm10.fa.sa   mm10_merged_centromeres_conserved_sorted.bed

database/indexes/GRCm38/aa_data_repo/mm10/annotations:
gencode.vM10.basic.annotation_genes.gff  mm10GenomicSuperDup.tab

database/indexes/GRCm38/aa_data_repo/mm10/cancer:
oncogene_list.txt  oncogenes

database/indexes/GRCm38/aa_data_repo/mm10/cancer/oncogenes:
AC_oncogene_set_mm10.gff  mm10_consensus_oncogenes_list_from_hg19.gff

I downloaded it from the repository you mentioned.

The strangest thing is that the file /data/Proyectos/NGS_pipeline/database/indexes/GRCh38/aa_data_repo/mm10/file_sources.txt does exist.

jluebeck commented 8 months ago

Thanks Alex,

This is very strange - can you try the 1.0.5 beta version of circdna?

If you need a hold-over solution until this is resolved, there are Docker & Singularity images for AmpliconSuite available here.

jluebeck commented 8 months ago

If you are able to download this small hg19-aligned BAM file from SRA, it would also be a good test of your installation (provided you download the appropriate data repo for this sample too, e.g. create /data/Proyectos/NGS_pipeline/database/indexes/GRCh38/aa_data_repo/hg19/).

alexmascension commented 8 months ago

Hi! Now it seems to work, but I get the following error:

ERROR ~ Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:AMPLICONSUITE (CDNA_3)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:AMPLICONSUITE (CDNA_3)` terminated with an error exit status (1)

Command executed:

  export AA_DATA_REPO=$(echo aa_data_repo)
  export MOSEKLM_LICENSE_FILE=$(echo others)
  # Define Variables AA_SRC and AC_SRC
  export AA_SRC=$(dirname $(python -c "import ampliconarchitectlib; print(ampliconarchitectlib.__file__)"))
  export AC_SRC=$(dirname $(which amplicon_classifier.py))
  REF=GRCh38

  AmpliconSuite-pipeline.py \
       \
      -s CDNA_3 \
      -t 2 \
      --bam CDNA_3.md.bam \
      --ref GRCh38 \
      --run_AA --run_AC \

  # Move Files to base work directory
  find CDNA_3_cnvkit_output/ -type f -print0 | xargs -0 mv -t ./
  find CDNA_3_AA_results/ -type f -print0 | xargs -0 mv -t ./
  find CDNA_3_classification/ -type f -print0 | xargs -0 mv -t ./

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:AMPLICONSUITE":
      AmpliconSuite-pipeline.py: $(AmpliconSuite-pipeline.py --version | sed 's/AmpliconSuite-pipeline version //')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Running AmpliconSuite-pipeline on sample: CDNA_3
  CDNA_3.md.bam index not found, calling samtools index
  Finished indexing

  CDNA_3.md.bam: 3371866 + 0 properly paired (86.08% : N/A)
  WARNING: BAM FILE PROPERLY PAIRED RATE IS BELOW 95%.
  Quality of data may be insufficient for AA analysis. Poorly controlled insert size distribution during sample prep can cause high fractions of read pairs to be marked as discordant during alignment. Artifactual short SVs and long runtimes may occur!

  Running CNVKit batch
  python3 /opt/conda/bin/cnvkit.py batch -m wgs -r aa_data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 2 -d CDNA_3_cnvkit_output/ CDNA_3.md.bam
  Matplotlib created a temporary cache directory at /tmp/matplotlib-4hlj9weh because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  Fontconfig error: No writable cache directories
  CNVkit 0.9.10
  Wrote CDNA_3_cnvkit_output/GRCh38_cnvkit_filtered_ref.target-tmp.bed with 568558 regions
  Wrote CDNA_3_cnvkit_output/GRCh38_cnvkit_filtered_ref.antitarget-tmp.bed with 0 regions
  Running 1 samples in 2 processes (that's 2 processes per bam)
  Running the CNVkit pipeline on CDNA_3.md.bam ...
  Processing reads in CDNA_3.md.bam

  Running CNVKit segment
  python3 /opt/conda/bin/cnvkit.py segment CDNA_3_cnvkit_output/CDNA_3.md.cnr  -p 2 -m cbs -o CDNA_3_cnvkit_output/CDNA_3.md.cns
  Matplotlib created a temporary cache directory at /tmp/matplotlib-4xkvob46 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  Fontconfig error: No writable cache directories
  Traceback (most recent call last):
    File "/opt/conda/bin/cnvkit.py", line 10, in <module>
      sys.exit(main())
    File "/opt/conda/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
      args.func(args)
    File "/opt/conda/lib/python3.10/site-packages/cnvlib/commands.py", line 986, in _cmd_segment
      cnarr = read_cna(args.filename)
    File "/opt/conda/lib/python3.10/site-packages/cnvlib/cmdutil.py", line 12, in read_cna
      return tabio.read(infile, into=CNA, sample_id=sample_id, meta=meta)
    File "/opt/conda/lib/python3.10/site-packages/skgenome/tabio/__init__.py", line 75, in read
      dframe = reader(infile, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/skgenome/tabio/tab.py", line 17, in read_tab
      dframe = pd.read_csv(infile, sep="\t", dtype={"chromosome": "str"})
    File "/opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
      return _read(filepath_or_buffer, kwds)
    File "/opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
      parser = TextFileReader(filepath_or_buffer, **kwds)
    File "/opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
      self._engine = self._make_engine(f, self.engine)
    File "/opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
      self.handles = get_handle(
    File "/opt/conda/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
      handle = open(
  FileNotFoundError: [Errno 2] No such file or directory: 'CDNA_3_cnvkit_output/CDNA_3.md.cnr'
  CNVKit encountered a non-zero exit status. Exiting...

Work dir:
  /data/Proyectos/NGS_pipeline/work/4c/72c545492771caf5e1c531da46300c

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

I'm not sure if I need to add any parameter to the config of nf-core. Thanks!
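The traceback shows `cnvkit.py segment` failing because `CDNA_3_cnvkit_output/CDNA_3.md.cnr` does not exist, which suggests the preceding batch step did not produce it. A minimal sketch for inspecting the process work dir is below. The helper name `summarize_cnvkit_output` is hypothetical, and the only file name it relies on is the `.cnr` path from the log.

```python
from pathlib import Path


def summarize_cnvkit_output(outdir, bam_stem):
    """List what is in the CNVkit output directory and check for the .cnr file.

    `bam_stem` is the BAM file name without the .bam suffix, e.g. "CDNA_3.md",
    so the expected file is <outdir>/<bam_stem>.cnr as seen in the log.
    Hypothetical helper for debugging; not part of CNVkit or the pipeline.
    """
    out = Path(outdir)
    present = sorted(p.name for p in out.iterdir()) if out.is_dir() else []
    cnr = out / f"{bam_stem}.cnr"
    cnr_exists = cnr.is_file()
    return {
        "present": present,
        "cnr_exists": cnr_exists,
        # An empty .cnr would also make the segment step fail.
        "cnr_nonempty": cnr_exists and cnr.stat().st_size > 0,
    }


if __name__ == "__main__":
    # Run from inside the Nextflow work dir of the failed process.
    print(summarize_cnvkit_output("CDNA_3_cnvkit_output", "CDNA_3.md"))
```

Seeing which files are present may help tell whether the batch step stopped early or wrote its results somewhere else.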

nexflow.log

jluebeck commented 8 months ago

Unfortunately, I am not able to reproduce this issue locally when I test the latest version of the tool on a GRCh38-aligned sample.

If you run `docker image ls`, which version of the prepareaa Docker image do you have?
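If the image list is long, something like the following can narrow it down. This is a sketch only: it assumes the image repository name contains the string "prepareaa", and both function names are invented for illustration. It uses the `--format` option of `docker image ls` to print `repository:tag` pairs.

```python
import subprocess


def filter_images(lines, needle):
    """Keep only the image lines whose text contains `needle` (case-insensitive)."""
    return [ln.strip() for ln in lines if needle.lower() in ln.lower()]


def list_local_images():
    """Return local images as 'repository:tag' strings via `docker image ls`."""
    result = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    try:
        # "prepareaa" is assumed to appear in the repository name.
        for image in filter_images(list_local_images(), "prepareaa"):
            print(image)
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query docker: {err}")
```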

Did it work for your mm10 sample?
