nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

Can't do quality control when change reference #1519

Open YangJingqii opened 1 week ago

YangJingqii commented 1 week ago

Dear,

I'm running Nextclade to analyse RSV, but I want to use a different reference, so I run:

nextclade-x86_64-unknown-linux-gnu run \
-r ../data/general_data/ref/reference_seq.fasta \
-m ../data/general_data/ref/reference_seq.gff3 \
-O ../data/raw_consensus/nextclade_rsva \
--include-reference true \
--include-nearest-node-info true \
../data/raw_consensus/RSVA.fasta 

But the QC status in the output TSV is "good" for all sequences, while some of them are marked as bad in the Nextstrain metadata. How can I get the real QC status after changing the reference?

I'm very much looking forward to your answer. Thank you so much!

ivan-aksamentov commented 1 week ago

Hi @YangJingqii,

The QC configuration can be provided in the pathogen.json file, under the qc field (docs are here, but slightly outdated). You pass the file path to the --input-pathogen-json (-p) CLI argument.
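As a minimal sketch, the command from the question could be extended with -p roughly like this. The pathogen.json path is a placeholder (not from the original post), and the wrapper function and the existence check are only there so that nothing runs if the binary or the file is missing:

```shell
# Placeholder path (assumption): a pathogen.json whose "qc" section suits your reference.
PATHOGEN_JSON=../data/general_data/ref/pathogen.json
# Binary name as used in the question.
NEXTCLADE=nextclade-x86_64-unknown-linux-gnu

# Same invocation as in the question, with -p (--input-pathogen-json) added.
run_nextclade() {
  "$NEXTCLADE" run \
    -r ../data/general_data/ref/reference_seq.fasta \
    -m ../data/general_data/ref/reference_seq.gff3 \
    -p "$PATHOGEN_JSON" \
    -O ../data/raw_consensus/nextclade_rsva \
    --include-reference true \
    --include-nearest-node-info true \
    ../data/raw_consensus/RSVA.fasta
}

# Only run when both the binary and the config file are actually present.
if command -v "$NEXTCLADE" >/dev/null 2>&1 && [ -f "$PATHOGEN_JSON" ]; then
  run_nextclade
else
  echo "nextclade binary or $PATHOGEN_JSON not found; nothing was run" >&2
fi
```

With a qc section in place, the QC columns in the output TSV should reflect the configured rules instead of defaulting to "good".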

While the format is not well documented, you can check examples of known working pathogen.json files in the existing datasets: in our data repo or by downloading them through the CLI (they are the same). For example, the current QC config for the nextstrain/rsv/a/EPI_ISL_412866 dataset is here.
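One possible way to look at that QC config locally is to download the dataset through the CLI and print the qc field. This is a sketch under a few assumptions: the output directory name is made up, the binary is assumed to be on PATH under the name used in the question, and jq is assumed to be installed:

```shell
# Dataset name as mentioned above; the output directory is a hypothetical local path.
DATASET=nextstrain/rsv/a/EPI_ISL_412866
OUT_DIR=rsv_a_dataset
NEXTCLADE=nextclade-x86_64-unknown-linux-gnu

if command -v "$NEXTCLADE" >/dev/null 2>&1; then
  # Download the official dataset; it contains a pathogen.json.
  "$NEXTCLADE" dataset get --name "$DATASET" --output-dir "$OUT_DIR"
  # Print only the "qc" section (requires jq).
  jq '.qc' "$OUT_DIR/pathogen.json"
else
  echo "$NEXTCLADE not found; nothing was downloaded" >&2
fi
```

The thresholds in that file were chosen for the dataset's own reference, so they may need adjusting before being reused with a different reference sequence.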