sanger-tol / genomenote

This Nextflow DSL2 pipeline takes aligned HiC reads and creates contact maps and a table of statistics.
https://pipelines.tol.sanger.ac.uk/genomenote
MIT License

Error executing process GENOME_STATISTICS:SUMMARYSEQUENCE with invalid assembly accession #109

Status: Open. Opened by mtammami 3 months ago.

mtammami commented 3 months ago

Description of the bug

I encountered an error while running the SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE process of the sanger-tol/genomenote pipeline. The process fails with an "invalid or unsupported assembly accession" error when it calls the NCBI `datasets` command to generate a sequence summary JSON file. This happens even though I followed the pipeline's usage instructions and provided valid input parameters.

-[sanger-tol/genomenote] Pipeline completed with errors-
[0b/de2c7e] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE (genome.1)
[f9/348360] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)
[81/f56f5d] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYGENOME (genome.1)
[9e/883111] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:CONTACT_MAPS:SAMTOOLS_FAIDX (genome.1.fasta)
ERROR ~ Error executing process > 'SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE (genome.1)'

Caused by:
  Process `SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE (genome.1)` terminated with an error exit status (1)

Command executed:

  datasets \
      summary \
      genome \
      accession \
      genome.1 \
      --report sequence \
      > genome.1_sequence.json

  validate_datasets_json.py genome.1_sequence.json

  cat <<-END_VERSIONS > versions.yml
  "SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE":
      ncbi-datasets-cli: $(datasets --version | sed 's/^.*datasets version: //')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: invalid or unsupported assembly accession: genome.1

  Use datasets summary genome accession <command> --help for detailed help about a command.

Work dir:
  /media/la_nube/tools/genomenotes/work/0b/de2c7e9f02030233c72ca802f6d05c

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
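The `datasets` error above arises because `genome.1` does not have the shape of an NCBI assembly accession, which is `GCA_` (GenBank) or `GCF_` (RefSeq), followed by nine digits and usually a version suffix such as `.15`. The bash sketch below checks a candidate string against that pattern before launching the pipeline. The helper name `is_assembly_accession` is hypothetical, and the check looks only at the format; it does not confirm that NCBI actually holds the assembly.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exit status 0 if "$1" has the shape of an NCBI assembly
# accession (GCA_ or GCF_, nine digits, optional ".<version>"), non-zero otherwise.
# Format check only: it does not query NCBI.
is_assembly_accession() {
  local re='^GC[AF]_[0-9]{9}(\.[0-9]+)?$'
  [[ $1 =~ $re ]]
}

for name in genome.1 GCA_000001405.15; do
  if is_assembly_accession "$name"; then
    echo "$name: looks like an assembly accession"
  else
    echo "$name: not an assembly accession; datasets is likely to reject it"
  fi
done
```

Storing the regular expression in a variable and leaving it unquoted on the right of `=~` is the usual way to keep bash from treating it as a literal string.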

Command used and terminal output

nextflow run sanger-tol/genomenote \
   -profile docker \
   -r 1.1.1 \
   --input samplesheet.csv \
   --fasta genome.1.fasta \
   --outdir genomenote_results \
   --max_cpus 20 \
   --max_memory 200GB \
   --max_time '999h' \
   -resume

Relevant files

No response

System information

Pipeline Version: 1.1.1
Nextflow Version: 23.10.1
Execution Environment: Docker
Hardware: Workstation
OS: Linux Ubuntu
Executor: Local
Container engine: Docker, Singularity

muffato commented 3 months ago

Hi @mtammami. Thank you for the bug report. I think the problem is that v1 of the pipeline has no parameter for the assembly accession number, so it assumes that the FASTA file name (without the `.fasta` extension) is the accession: here `genome.1.fasta` becomes `genome.1`, which `datasets` rejects. This will be addressed in v2, which is currently on the public_dev branch.
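To see which accession v1 will pass to `datasets`, the name derivation visible in the log (`genome.1.fasta` became `genome.1`) can be sketched in bash as follows. `accession_from_fasta` is a hypothetical helper, and it strips only a `.fasta` suffix because that is the only case the log shows; how the pipeline treats other extensions such as `.fa` or `.fasta.gz` is not covered here.

```shell
#!/usr/bin/env bash

# Hypothetical helper approximating the v1 behaviour seen in the log:
# drop any leading directories, then a trailing ".fasta" extension,
# and treat what remains as the assembly accession.
accession_from_fasta() {
  local name="${1##*/}"          # strip the directory part
  printf '%s\n' "${name%.fasta}" # strip the ".fasta" extension
}

accession_from_fasta genome.1.fasta                  # prints genome.1
accession_from_fasta /data/GCA_000001405.15.fasta    # prints GCA_000001405.15
```

Plain parameter expansion is used instead of `basename` so the sketch needs no external commands.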

In the meantime, renaming the FASTA file after the assembly accession, or creating a symbolic link with that name, may work around the problem. Once that is confirmed, I can push a change to the documentation to clarify this.
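A minimal bash sketch of the symbolic-link workaround, assuming the link is created in the same directory as the FASTA file. `link_fasta_as_accession` is a hypothetical helper, and `GCA_XXXXXXXXX.N` in the usage comment is a placeholder for the real accession of your assembly, which the issue does not give. The helper refuses to overwrite an existing path so that it cannot clobber the original FASTA.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create "<dir>/<accession>.fasta" as a symbolic link to an
# existing FASTA file in the same directory and print the link path.
# Fails instead of overwriting anything already at that path.
link_fasta_as_accession() {
  local fasta="$1" accession="$2"
  local dir="."
  if [[ $fasta == */* ]]; then
    dir="${fasta%/*}"
  fi
  local link="$dir/$accession.fasta"
  if [[ -e $link || -L $link ]]; then
    echo "refusing to overwrite existing $link" >&2
    return 1
  fi
  # Relative target: the link sits next to the FASTA file it points to.
  ln -s -- "${fasta##*/}" "$link" || return 1
  printf '%s\n' "$link"
}

# Usage (placeholder accession, replace with the real one):
#   link=$(link_fasta_as_accession genome.1.fasta GCA_XXXXXXXXX.N)
#   nextflow run sanger-tol/genomenote -profile docker -r 1.1.1 \
#     --input samplesheet.csv --fasta "$link" --outdir genomenote_results
```

Whether the link is followed correctly inside the Docker container has not been confirmed in this thread; if it is not, renaming or copying the file, as suggested above, is the fallback.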