nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License

Incorrect depth from ivar variants reported in variants long table #368

Closed: wutron closed this issue 1 year ago

wutron commented 1 year ago

Description of the bug

When read depth is very high, ivar variants writes the TOTAL_DP column in scientific notation (e.g. 1.33291e+06). The variants long table then reports only the part of the number before the decimal point (e.g. DP=1), so both the depths and the allele frequencies derived from them are wrong.

I believe the issue can be resolved by changing line 123 of https://github.com/nf-core/viralrecon/blob/master/bin/ivar_variants_to_vcf.py#L123 to INFO = f"DP={int(float(line[11]))}", so that the depth is always written to the VCF as an integer.
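A minimal sketch of why the proposed change works, assuming (as the ivar header below shows) that TOTAL_DP is the twelfth tab-separated column, i.e. index 11. Python's int() rejects a string such as "1.33291e+06", whereas float() accepts the exponent; int() then drops the zero fractional part. The helper name build_dp_info is hypothetical and is not the script's actual code.

```python
# Hypothetical helper illustrating the proposed fix; not the pipeline's code.

def build_dp_info(line):
    """Return a VCF INFO string with DP written as a plain integer.

    `line` is one ivar variants row split into fields; index 11 is TOTAL_DP.
    """
    total_dp = line[11]
    # float() parses both "2000" and "1.33291e+06"; int() truncates to an integer.
    return f"DP={int(float(total_dp))}"


# int() on its own cannot parse scientific notation:
try:
    int("1.33291e+06")
except ValueError as err:
    print("int() alone fails:", err)

row = ("MN908947.3 21730 A G 1317930 667938 35 14947 7521 36 "
       "0.0112138 1.33291e+06 0 TRUE").split()
print(build_dp_info(row))  # DP=1332910
```

Note that the recovered depth is only as precise as ivar's printed value (six significant digits here), so it is an approximation of the true count rather than the exact number.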

$ grep 'e+' out/variants/ivar/ww.tsv
REGION POS REF ALT REF_DP REF_RV REF_QUAL ALT_DP ALT_RV ALT_QUAL ALT_FREQ TOTAL_DP PVAL PASS GFF_FEATURE REF_CODON REF_AA ALT_CODON ALT_AA
MN908947.3 21730 A G 1317930 667938 35 14947 7521 36 0.0112138 1.33291e+06 0 TRUE cds-QHD43416.1 TTA L TTG L
MN908947.3 22583 G +T 1474471 730754 36 79562 0 20 0.0531601 1.49665e+06 0 TRUE NA NA NA NA NA
MN908947.3 22977 T C 1910409 57949 36 87378 1674 36 0.0437341 1.99794e+06 0 TRUE cds-QHD43416.1 ATC I ACC T
MN908947.3 23052 T A 1983145 60241 36 12052 145 36 0.0060397 1.99546e+06 0 TRUE cds-QHD43416.1 TTC F TAC Y
MN908947.3 23126 G A 3716960 1840637 37 5148 2450 36 0.00138304 3.72223e+06 1.39735e-52 TRUE cds-QHD43416.1 GCA A ACA T
MN908947.3 23256 T C 2436217 1880072 36 4472 348 36 0.00183214 2.44086e+06 8.3265e-135 TRUE cds-QHD43416.1 TTT F TCT S
MN908947.3 24223 C +T 1600383 258881 36 9068 0 20 0.00566151 1.60169e+06 1 FALSE NA NA NA NA NA
MN908947.3 24419 A G 2602143 1303101 37 3394 1760 35 0.00130258 2.6056e+06 9.93939e-25 TRUE cds-QHD43416.1 AAC N GAC D
MN908947.3 24442 C A 1274046 1271015 36 36373 36310 36 0.0277556 1.31047e+06 0 TRUE cds-QHD43416.1 AAC N AAA K
MN908947.3 24516 A G 3613820 1381048 37 3692 218 36 0.00102047 3.61794e+06 0.192741 FALSE cds-QHD43416.1 GAC D GGC G
$ awk -F, '$10>1' out/variants/ivar/variants_long_table.csv
SAMPLE CHROM POS REF ALT FILTER DP REF_DP ALT_DP AF GENE EFFECT HGVS_C HGVS_P HGVS_P_1LETTER CALLER
ww MN908947.3 21730 A G PASS 1 1317930 14947 14947.0 S synonymous_variant c.168A>G p.Leu56Leu p.L56L ivar
ww MN908947.3 22583 G GT PASS 1 1474471 79562 79562.0 S frameshift_variant&stop_gained c.1026dupT p.Asn343fs p.N343fs ivar
ww MN908947.3 22977 T C PASS 1 1910409 87378 87378.0 S missense_variant c.1415T>C p.Ile472Thr p.I472T ivar
ww MN908947.3 23052 T A PASS 1 1983145 12052 12052.0 S missense_variant c.1490T>A p.Phe497Tyr p.F497Y ivar
ww MN908947.3 23126 G A PASS 3 3716960 5148 1716.0 S missense_variant c.1564G>A p.Ala522Thr p.A522T ivar
ww MN908947.3 23256 T C PASS 2 2436217 4472 2236.0 S missense_variant c.1694T>C p.Phe565Ser p.F565S ivar
ww MN908947.3 24223 C CT ft 1 1600383 9068 9068.0 S frameshift_variant c.2664dupT p.Gly889fs p.G889fs ivar
ww MN908947.3 24419 A G PASS 2 2602143 3394 1697.0 S missense_variant c.2857A>G p.Asn953Asp p.N953D ivar
ww MN908947.3 24442 C A PASS 1 1274046 36373 36373.0 S missense_variant c.2880C>A p.Asn960Lys p.N960K ivar
ww MN908947.3 24516 A G ft 3 3613820 3692 1230.67 S missense_variant c.2954A>G p.Asp985Gly p.D985G ivar
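Every row above is consistent with the long table computing AF = ALT_DP / DP, so a truncated DP inflates AF far above 1. The sketch below reproduces three of those AF values. truncated_dp() models the reported behaviour (keeping only the digits before the decimal point) and is an illustration under that assumption, not the pipeline's actual parsing code.

```python
import re


def truncated_dp(raw):
    """Model of the bug: keep only the leading digits, e.g. '1.33291e+06' -> 1."""
    return int(re.match(r"\d+", raw).group())


def correct_dp(raw):
    """Parse ivar's TOTAL_DP, including scientific notation, as an integer."""
    return int(float(raw))


# (POS, TOTAL_DP as written by ivar, ALT_DP), taken from the tables above
rows = [
    (21730, "1.33291e+06", 14947),
    (23126, "3.72223e+06", 5148),
    (24516, "3.61794e+06", 3692),
]

for pos, raw_dp, alt_dp in rows:
    bad_af = alt_dp / truncated_dp(raw_dp)
    good_af = alt_dp / correct_dp(raw_dp)
    print(f"{pos}: DP={truncated_dp(raw_dp)} AF={bad_af:.2f} "
          f"vs DP={correct_dp(raw_dp)} AF={good_af:.8f}")
```

With the corrected depth, each AF agrees with ivar's own ALT_FREQ column (e.g. 0.0112138 at position 21730).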

System information

Nextflow version: 19.01.0.5050
Hardware: Cloud VM
Executor: local
Container engine: Docker
OS: Linux
Version of nf-core/viralrecon: 2.5

drpatelh commented 1 year ago

Thanks for reporting, @wutron! Fancy creating a PR with a fix for this issue and #369? Your timing is great because I am planning to release tomorrow.

drpatelh commented 1 year ago

Fixed in https://github.com/nf-core/viralrecon/pull/370

I have invited you to become a member of the nf-core GitHub org too. Thanks for providing the fix! Fancy reviewing and approving the PR? Once that PR is merged, you will be able to test the dev branch to confirm everything is OK:

nextflow pull nf-core/viralrecon -r dev
nextflow run nf-core/viralrecon <YOUR_PARAMETERS> -r dev