sanger-tol / genomenote

Nextflow DSL2 pipeline to generate a Genome Note, including assembly statistics, quality metrics, and Hi-C contact maps. This workflow is part of the Tree of Life production suite.
https://pipelines.tol.sanger.ac.uk/genomenote
MIT License

meta.id confusion #79

Closed by muffato 1 month ago

muffato commented 1 year ago

Description of the bug

On the public_dev branch, the input FASTA is called GCA_946965045.1.fasta.gz and the Hi-C CRAM file is called GCA_946965045.1.unmasked.hic.uoEpiScrs1.subsampled.cram, but the assembly parameter is set to GCA_946965045.2. After a run of the test profile:

1) I get these two files in the genome_note/ directory:

2) GCA_946965045.1.csv contains:

Accession,GCA_946965045.1

3) Many more intermediate files are also named GCA_946965045.1.*, which suggests that the pipeline is confused about what meta.id is.

The input file names can differ from the accession number, etc., but I'd expect the outputs of the pipeline to be consistently named.
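
A quick way to spot the inconsistency after a run is to list the output files whose names do not start with the value of the assembly parameter. This is only a minimal sketch: the helper name list_misnamed_outputs and the path in the usage line are hypothetical, and it assumes that every file in the directory being checked is meant to carry the accession as a prefix, which may not hold for all outputs.

```shell
# Hypothetical helper, not part of the pipeline.
# $1: directory to check (e.g. the genome_note/ output directory)
# $2: expected prefix (e.g. the value of the assembly parameter)
list_misnamed_outputs() {
    # Print every regular file whose name does not start with the prefix.
    find "$1" -type f ! -name "$2*" | sort
}
```

For example, `list_misnamed_outputs path/to/outdir/genome_note GCA_946965045.2` (path is a placeholder) would print any file under that directory that is not named after GCA_946965045.2.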

Command used and terminal output

nextflow run sanger-tol/genomenote/ -profile test,singularity -r public_dev

Relevant files

No response

System information

Nextflow 23.04.1-5866 from our central installation