nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

Feature Request: More categories for '--output-columns-selection' #1545

Open ammaraziz opened 3 weeks ago

ammaraziz commented 3 weeks ago

This is a feature request to add column categories to --output-columns-selection, similar to those in the web interface:

image

The categories would be, e.g., general, muts_to_ref, muts_to_node, qc, errors. It would be great if these could be combined like so:

nextclade run \
...
--output-columns-selection general qc

Thanks!

ivan-aksamentov commented 3 weeks ago

Hi @ammaraziz,

I believe this is already supported.

However, it is currently hard to discover. One trick is to pass a non-existent value; the error message then lists all possible categories and individual columns:

$ nextclade run -d sars-cov-2 -O out/ --output-columns-selection=does-not-exist

Error: 
   0: Output columns selection: unknown column or category name 'does-not-exist'.

      Possible categories:
          all, general, ref-muts, priv-muts, clade-founder-muts, rel-muts, errs-warns, qc, primers, dynamic

      Possible individual columns:
          index, seqName, clade, qc.overallScore, qc.overallStatus, totalSubstitutions, totalDeletions, totalInsertions, totalFrameShifts, totalMissing, totalNonACGTNs, totalAminoacidSubstitutions, totalAminoacidDeletions, totalAminoacidInsertions, totalUnknownAa, alignmentScore, alignmentStart, alignmentEnd, coverage, isReverseComplement, substitutions, deletions, insertions, frameShifts, aaSubstitutions, aaDeletions, aaInsertions, privateNucMutations.reversionSubstitutions, privateNucMutations.labeledSubstitutions, privateNucMutations.unlabeledSubstitutions, privateNucMutations.totalReversionSubstitutions, privateNucMutations.totalLabeledSubstitutions, privateNucMutations.totalUnlabeledSubstitutions, privateNucMutations.totalPrivateSubstitutions, missing, unknownAaRanges, nonACGTNs, qc.overallScore, qc.overallStatus, qc.missingData.missingDataThreshold, qc.missingData.score, qc.missingData.status, qc.missingData.totalMissing, qc.mixedSites.mixedSitesThreshold, qc.mixedSites.score, qc.mixedSites.status, qc.mixedSites.totalMixedSites, qc.privateMutations.cutoff, qc.privateMutations.excess, qc.privateMutations.score, qc.privateMutations.status, qc.privateMutations.total, qc.snpClusters.clusteredSNPs, qc.snpClusters.score, qc.snpClusters.status, qc.snpClusters.totalSNPs, qc.frameShifts.frameShifts, qc.frameShifts.totalFrameShifts, qc.frameShifts.frameShiftsIgnored, qc.frameShifts.totalFrameShiftsIgnored, qc.frameShifts.score, qc.frameShifts.status, qc.stopCodons.stopCodons, qc.stopCodons.totalStopCodons, qc.stopCodons.score, qc.stopCodons.status, totalPcrPrimerChanges, pcrPrimerChanges, failedCdses, warnings, errors
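
If the category list is needed in a script, it could be pulled out of that message. The following is only a rough sketch: it works on a copy of the text above stored in a variable, and it assumes the layout of the message (a "Possible categories:" line followed by one comma-separated line) stays as shown. With a real run the text would have to be captured from nextclade's output first, which is not shown here.

```shell
# Sample text copied from the error message above (an assumption about its layout).
error_text='      Possible categories:
          all, general, ref-muts, priv-muts, clade-founder-muts, rel-muts, errs-warns, qc, primers, dynamic'

# Take the line right after "Possible categories:", drop spaces,
# and put each category on its own line.
categories=$(printf '%s\n' "$error_text" \
  | grep -A1 'Possible categories:' \
  | tail -n 1 \
  | tr -d ' ' \
  | tr ',' '\n')

printf '%s\n' "$categories"
```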

We should improve the docs.

Note that the values are expected in a comma-separated list (no spaces):

--output-columns-selection=general,qc
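
Categories and individual columns can be mixed in the same list. A minimal sketch of assembling such a value in a shell script follows; the variable names are made up for illustration, and the command is only printed, not executed:

```shell
# Categories and individual columns to select; names taken from the listing above.
selected="general qc clade"

# Join with commas and no spaces, as the option expects.
selection=$(printf '%s' "$selected" | tr ' ' ',')

# Print the resulting command instead of running it.
echo "nextclade run -d sars-cov-2 -O out/ --output-columns-selection=$selection"
```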

ammaraziz commented 3 weeks ago

You're always one step ahead of my requests! Awesome.

Yes, I think updating the docs would be very helpful. This is my suggestion:

  -C, --output-columns-selection <OUTPUT_COLUMNS_SELECTION>...
...
          Should contain a comma-separated list of individual column names and/or column category names to include in both CSV and TSV outputs. Possible categories:
          all, general, ref-muts, priv-muts, clade-founder-muts, rel-muts, errs-warns, qc, primers, dynamic
...

Thanks!