lskatz / lyve-SET

:dancer: :palm_tree: LYVE-SET, a method of using hqSNPs to create a phylogeny, especially for outbreak investigations
MIT License

Standardized VCF output #11

Closed: lskatz closed 8 years ago

lskatz commented 9 years ago

Need to decide which fields are going to be necessary in each VCF file, so that any VCF-producing script in the future can conform to standards.

lskatz commented 9 years ago

I've come up with a list, and Richa from NCBI has kindly helped us organize those fields among all of us. However, we still need to give that final 'ok'.

lskatz commented 9 years ago

TODO still:

  1. Use homozygous-formatted SNP calls and remove heterozygous formatting
  2. Remove reference base from each ALT. It's unnecessary to have there.
  3. Allow for variants-only output

lskatz commented 8 years ago

These fixes are happening in a totally new script, set_fixVcf.pl. It will be usable for other future SNP callers and can be used to retroactively fix other VCF files.

  1. Homozygous formatting: https://github.com/lskatz/lyve-SET/blob/0da27627c58e40b4d6b40abf878d56af42abe5a4/scripts/set_fixVcf.pl#L112
  2. Remove reference base from each ALT: just run varscan vanilla, and if there is a dot for a non-SNP in the ALT column, then merging seems to go well.
  3. mergeVcf.sh has a variants-only option
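
For readers who want a concrete picture of the three items above, here is a minimal Python sketch. It is not the logic of set_fixVcf.pl or mergeVcf.sh; the function names `fix_vcf_line` and `fix_vcf` are hypothetical. It assumes single-sample VCF lines, that a heterozygous call should collapse to the first called non-reference allele, and that "variants only" means dropping records whose ALT is a dot.

```python
import re

def fix_vcf_line(line):
    """Normalize one single-sample VCF line (illustrative assumptions only)."""
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    ref = fields[3]
    orig_alts = fields[4].split(",")

    # Item 2: drop any ALT allele that merely repeats the reference base.
    kept = [a for a in orig_alts if a not in (ref, ".")]
    fields[4] = ",".join(kept) if kept else "."

    # Item 1: rewrite the genotype in homozygous form (e.g. 0/1 -> 1/1).
    if len(fields) > 9 and "GT" in fields[8].split(":"):
        gt_pos = fields[8].split(":").index("GT")
        sample = fields[9].split(":")
        called = [int(i) for i in re.split(r"[/|]", sample[gt_pos]) if i.isdigit()]
        if called:  # leave missing genotypes such as ./. untouched
            new_idx = 0
            for i in called:
                # Assumption: pick the first called allele that survived item 2.
                if 0 < i <= len(orig_alts) and orig_alts[i - 1] in kept:
                    new_idx = kept.index(orig_alts[i - 1]) + 1
                    break
            sample[gt_pos] = f"{new_idx}/{new_idx}"
            fields[9] = ":".join(sample)
    return "\t".join(fields)

def fix_vcf(lines, variants_only=False):
    """Apply fix_vcf_line to every line; optionally keep only variant sites."""
    out = []
    for line in lines:
        fixed = fix_vcf_line(line)
        # Item 3: in variants-only mode, skip records whose ALT is a dot.
        is_record = not fixed.startswith("#")
        if variants_only and is_record and fixed.split("\t")[4] == ".":
            continue
        out.append(fixed)
    return out
```

Calling `fix_vcf(open("in.vcf"), variants_only=True)` on a hypothetical `in.vcf` would return the header lines plus only those records that still carry an ALT allele after cleaning.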

Any future standardized tags should go into the fixVcf script.