Nextflow DSL2 pipeline to generate a Genome Note, including assembly statistics, quality metrics, and Hi-C contact maps. This workflow is part of the Tree of Life production suite.
Description of the bug
On the public_dev branch, the input FASTA is called GCA_946965045.1.fasta.gz and the Hi-C CRAM file GCA_946965045.1.unmasked.hic.uoEpiScrs1.subsampled.cram, but the assembly parameter is set to GCA_946965045.2. After a run of the test profile:
1) I get these two files in the genome_note/ directory:
GCA_946965045.1.csv
GCA_946965045.2.docx
2) GCA_946965045.1.csv contains:
Accession,GCA_946965045.1
3) Many more intermediate files are also named GCA_946965045.1.*, indicating that the pipeline is confused about what meta.id is.
The input file names can differ from the accession number etc., but I'd expect the outputs of the pipeline to be consistently named.
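To confirm the mismatch after a run, a small check along these lines may help. It is only a sketch and not part of the pipeline: the function names and the accession regular expression are my own assumptions, and it simply groups output file names by the versioned accession they start with.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern for versioned assembly accessions such as GCA_946965045.1
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def accessions_in_names(names):
    """Group file names by the versioned accession found at their start."""
    groups = defaultdict(list)
    for name in names:
        match = ACCESSION_RE.match(name)
        if match:  # names without a leading accession are ignored
            groups[match.group(0)].append(name)
    return dict(groups)


def check_outdir(outdir):
    """Return a mapping of accession -> file names for all files under outdir."""
    names = [p.name for p in Path(outdir).rglob("*") if p.is_file()]
    return accessions_in_names(names)
```

Calling check_outdir() on the genome_note/ directory from the run above should return two keys, GCA_946965045.1 and GCA_946965045.2. A consistently named output directory would return a single key.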
Command used and terminal output
nextflow run sanger-tol/genomenote/ -profile test,singularity -r public_dev
Relevant files
No response
System information
Nextflow 23.04.1-5866 from our central installation