This PR adds `sourmash` as an additional profiler. Related to https://github.com/nf-core/taxprofiler/issues/112

NOTES:
Database

Sourmash supports several types of databases. For taxprofiler, I propose to use a single ZIP file containing signatures, but the database also requires a CSV file with taxonomic information (a gzip-compressed CSV is also supported). So the tar file with the database should contain two files: the signature ZIP file and the taxonomy CSV file. For now, the file names are hardcoded.
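As a rough illustration of the two-file layout, here is a minimal Python sketch that checks a database tar before it is handed to the pipeline. It assumes the archive holds exactly one `.zip` signature file and one `.csv` or `.csv.gz` taxonomy file and matches on extensions only, because the hardcoded file names are not reproduced here. `check_sourmash_db_tar` and `write_tar` are hypothetical helpers, not part of taxprofiler.

```python
import io
import os
import tarfile
import tempfile


def check_sourmash_db_tar(path: str) -> tuple[str, str]:
    """Return the (signature ZIP, taxonomy CSV) member names of a database tar.

    Hypothetical helper: matches on file extensions only, because the
    hardcoded file names used by the pipeline are not reproduced here.
    """
    # "r:*" lets tarfile detect plain, gzip, bz2 or xz compression itself
    with tarfile.open(path, "r:*") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]

    zips = [n for n in names if n.endswith(".zip")]
    csvs = [n for n in names if n.endswith((".csv", ".csv.gz"))]
    if len(zips) != 1 or len(csvs) != 1:
        raise ValueError(f"expected one .zip and one .csv(.gz), got {zips} and {csvs}")
    return zips[0], csvs[0]


def write_tar(path: str, names: list[str]) -> None:
    """Build a small gzip-compressed tar with dummy contents (demo only)."""
    with tarfile.open(path, "w:gz") as tar:
        for name in names:
            data = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder member names, NOT the names hardcoded in this PR
        db = os.path.join(tmp, "sourmash_db.tar.gz")
        write_tar(db, ["db/signatures.zip", "db/taxonomy.csv.gz"])
        print(check_sourmash_db_tar(db))
```

Run directly, the demo prints `('db/signatures.zip', 'db/taxonomy.csv.gz')`. An archive with two ZIP files or a missing taxonomy CSV raises `ValueError` instead.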
As a first step, sourmash creates FracMinHash sketches (signatures) for each sample. This step is independent of the database, so sketching only needs to run once per sample. Therefore, I removed the database from the input channel (`ch_input_for_profiling.sourmash.map`). Otherwise, it would sketch every sample again for each database provided, and we would end up with many duplicated sketches, right?

Sourmash can create four types of signatures: DNA, protein, protein translated from DNA, and signatures built from a CSV file listing the locations of genomes/proteomes. The `sourmash/sketch` module is written to support all of these input types, so extra arguments must be passed to the process to select one. The easiest way is to specify them via `ext.args` in the config, where the first word in `ext.args` should be `dna`, `protein`, `translate`, or `fromfile`.
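To show why sketching can run once per sample, here is a toy FracMinHash in pure Python: a sample's sketch depends only on its sequence and the parameters `k` and `scaled`, never on a database, so one sketch can be compared against any number of databases. This is a simplified stand-in, not sourmash's implementation. It hashes canonical k-mers with BLAKE2b truncated to 64 bits instead of MurmurHash3, so its hash values are not compatible with real signatures; the function names and the toy databases are hypothetical.

```python
import hashlib

HASH_SPACE = 2**64  # hashes are 64-bit unsigned integers
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hash64(kmer: str) -> int:
    # Stand-in hash: BLAKE2b truncated to 8 bytes. sourmash uses MurmurHash3,
    # so these values are NOT compatible with real sourmash signatures.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def fracminhash(seq: str, k: int = 21, scaled: int = 1000) -> set[int]:
    """Keep every canonical k-mer hash that falls below HASH_SPACE / scaled."""
    max_hash = HASH_SPACE // scaled  # roughly 1/scaled of all hashes survive
    seq = seq.upper()
    sketch = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        h = hash64(min(kmer, revcomp))  # canonical k-mer: same hash on both strands
        if h < max_hash:
            sketch.add(h)
    return sketch


def containment(query: set[int], ref: set[int]) -> float:
    """Fraction of the query sketch that is also present in the reference sketch."""
    return len(query & ref) / len(query) if query else 0.0


if __name__ == "__main__":
    sample = "ACGTTGCATGCATCGATCGGATCCGATTACAGGCT" * 4
    # The sample is sketched once; the sketch does not depend on any database.
    sample_sketch = fracminhash(sample, k=11, scaled=1)
    # Toy "databases" (hypothetical): each is just a set of reference hashes.
    databases = {
        "db_a": fracminhash(sample + "TTTTGGGGCCCCAAAA", k=11, scaled=1),
        "db_b": fracminhash("GGCATTACGATCGATGCATCGGCTAGCTAAGC" * 4, k=11, scaled=1),
    }
    for name, db in databases.items():  # the same sample sketch is reused
        print(name, round(containment(sample_sketch, db), 2))
```

The demo uses `scaled=1` (keep every hash) only because the toy sequences are tiny. `db_a` contains every k-mer of the sample, so its containment is 1.0. Adding a third database adds one comparison, not another sketching step.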
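The convention that the first word of `ext.args` selects the signature type can be sketched as a small argument check. This is a hypothetical Python helper, not code from the `sourmash/sketch` module (which is a Nextflow process). It assumes the remaining words are forwarded unchanged to `sourmash sketch <type>`. The `-p k=31,scaled=1000` parameter string in the example is only an illustration, and for `fromfile` the input would be the CSV of genome/proteome locations rather than sequence files.

```python
import shlex

SKETCH_TYPES = ("dna", "protein", "translate", "fromfile")


def build_sketch_command(ext_args: str, inputs: list[str], output: str) -> list[str]:
    """Turn an ext.args string into a `sourmash sketch` argument list.

    Hypothetical helper: the first word selects the sketch type and every
    remaining word is forwarded unchanged.
    """
    words = shlex.split(ext_args)  # respects quoting, like a shell would
    if not words or words[0] not in SKETCH_TYPES:
        raise ValueError(
            f"first word of ext.args must be one of {', '.join(SKETCH_TYPES)}; got {ext_args!r}"
        )
    sketch_type, extra = words[0], words[1:]
    return ["sourmash", "sketch", sketch_type, *extra, *inputs, "-o", output]


if __name__ == "__main__":
    cmd = build_sketch_command("dna -p k=31,scaled=1000", ["sample_1.fastq.gz"], "sample.sig")
    print(shlex.join(cmd))
```

The demo prints `sourmash sketch dna -p k=31,scaled=1000 sample_1.fastq.gz -o sample.sig`. Failing early on an unknown first word (for example `rna`) gives a clearer error than letting `sourmash sketch` reject it inside the process.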
TO DO list
PR checklist

- Make sure your code lints (`nf-core lint`).
- Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).