lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

request for Biostar398854 - Not called SNPs in lowercase letter in output FASTA #246

Open Zwiep2023 opened 3 months ago


Subject of the issue

Show uncalled SNPs as lowercase letters in the output FASTA

Your environment

Steps to reproduce

I wondered whether it would be possible to add an option so that, in the output FASTA, a SNP that has not been called for a given sample in the VCF (as seen, for example, in the VcfToTable output) is written as a lowercase letter, while capital letters are used only for SNPs that have been called, i.e. that have read coverage. The example below illustrates my request.

Simplified VcfToTable output for one SNP across 13 samples:

REF  A
ALT  T

Sample      Type      AD
Sample_1    NO_CALL   0,0
Sample_2    NO_CALL   0,0
Sample_3    NO_CALL   0,0
Sample_4    HOM_REF   1,0
Sample_5    HOM_REF   2,0
Sample_6    NO_CALL   0,0
Sample_7    HOM_REF   2,0
Sample_8    HOM_REF   3,0
Sample_9    HOM_REF   3,0
Sample_10   HOM_VAR   0,2
Sample_11   HOM_REF   2,0
Sample_12   NO_CALL   0,0
Sample_13   NO_CALL   0,0

Nucleotide written in the Biostar398854 output FASTA for the same SNP in these 13 samples:

Sample      Current output   Requested output
Sample_1    a                a
Sample_2    a                a
Sample_3    a                a
Sample_4    a                A
Sample_5    a                A
Sample_6    a                a
Sample_7    a                A
Sample_8    a                A
Sample_9    a                A
Sample_10   T                T
Sample_11   a                A
Sample_12   a                a
Sample_13   a                a
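To make the requested rule concrete, here is a minimal Java sketch, assuming the genotype type reported by VcfToTable (NO_CALL, HOM_REF, HOM_VAR) is enough to decide the case. It is not jvarkit's or Biostar398854's actual code: the class `LowercaseUncalledSketch`, the enum `GenotypeKind` and the methods `baseFor`, `exampleCalls` and `column` are hypothetical names for illustration. Heterozygous calls are left out because the example contains none. Applied to the 13 samples above, it reproduces the "Requested output" column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not jvarkit code: pick the FASTA character for one SNP
// in one sample using the rule requested in this issue.
public class LowercaseUncalledSketch {

    // Simplified genotype states, named after the VcfToTable "Type" column.
    // HET is deliberately omitted: the example contains no heterozygous call.
    enum GenotypeKind { NO_CALL, HOM_REF, HOM_VAR }

    // NO_CALL -> REF in lowercase (not called / no read coverage)
    // HOM_REF -> REF in uppercase (called)
    // HOM_VAR -> ALT in uppercase (called)
    static char baseFor(GenotypeKind kind, char ref, char alt) {
        switch (kind) {
            case NO_CALL:
                return Character.toLowerCase(ref);
            case HOM_REF:
                return Character.toUpperCase(ref);
            case HOM_VAR:
                return Character.toUpperCase(alt);
            default:
                throw new IllegalArgumentException("unsupported genotype: " + kind);
        }
    }

    // Genotypes copied from the simplified VcfToTable example (REF=A, ALT=T).
    static Map<String, GenotypeKind> exampleCalls() {
        GenotypeKind[] kinds = {
            GenotypeKind.NO_CALL, GenotypeKind.NO_CALL, GenotypeKind.NO_CALL,
            GenotypeKind.HOM_REF, GenotypeKind.HOM_REF, GenotypeKind.NO_CALL,
            GenotypeKind.HOM_REF, GenotypeKind.HOM_REF, GenotypeKind.HOM_REF,
            GenotypeKind.HOM_VAR, GenotypeKind.HOM_REF, GenotypeKind.NO_CALL,
            GenotypeKind.NO_CALL
        };
        // LinkedHashMap keeps the samples in their original order.
        Map<String, GenotypeKind> calls = new LinkedHashMap<>();
        for (int i = 0; i < kinds.length; i++) {
            calls.put("Sample_" + (i + 1), kinds[i]);
        }
        return calls;
    }

    // Concatenate the per-sample characters in sample order.
    static String column(Map<String, GenotypeKind> calls, char ref, char alt) {
        StringBuilder sb = new StringBuilder();
        for (GenotypeKind kind : calls.values()) {
            sb.append(baseFor(kind, ref, alt));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print one "sample<TAB>character" line per sample.
        for (Map.Entry<String, GenotypeKind> e : exampleCalls().entrySet()) {
            System.out.println(e.getKey() + "\t" + baseFor(e.getValue(), 'A', 'T'));
        }
    }
}
```

With Java 11 or later it can be run directly as `java LowercaseUncalledSketch.java`. If "has read coverage" rather than the call state should drive the case, `baseFor` could instead check whether the AD values sum to more than zero; in this example both criteria give the same result, since every NO_CALL sample has AD 0,0 and every called sample has at least one read.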