trishorts closed this issue 6 years ago
I have a unit-tested C# VCF reader in my proteogenomics work.
Maybe we could put that in mzLib?
VCF contains variants called against a genome; UniProt doesn't have a reference genome and annotates single amino acid variants (SAVs) by amino acid position. The information in the VCF file wouldn't be useful unless you have additional information about the reference genome and how the proteins were derived. I'm working on that in the Proteoform Database Engine world.
It looks like UniProt also maintains references to dbSNP, so we could cross-reference sequence variants pulled in from a VCF file to decide which UniProt variants to keep. I think that's the only way we could get around not having a reference genome at the moment.
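A rough sketch of that cross-referencing idea, in Python for brevity (the reader mentioned above is C#). It assumes the VCF's ID column carries dbSNP rsIDs and that each UniProt variant has already been parsed into a record with an optional dbSNP ID. `UniProtVariant`, `rsids_from_vcf`, and `keep_observed_variants` are hypothetical names for illustration, not part of mzLib.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class UniProtVariant(NamedTuple):
    # Hypothetical record; field names are illustrative, not mzLib's.
    position: int            # 1-based amino acid position in the protein
    original: str            # reference residue(s)
    variant: str             # substituted residue(s)
    dbsnp_id: Optional[str]  # e.g. "rs12345", if UniProt lists one


def rsids_from_vcf(lines: Iterable[str]) -> Set[str]:
    """Collect dbSNP rsIDs from the ID column (third field) of VCF data lines."""
    ids: Set[str] = set()
    for line in lines:
        # Skip blank lines, "##" meta lines, and the "#CHROM" header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        # The ID column may hold several semicolon-separated IDs, or "." if none.
        for identifier in fields[2].split(";"):
            if identifier.startswith("rs"):
                ids.add(identifier)
    return ids


def keep_observed_variants(
    variants: Iterable[UniProtVariant], observed_rsids: Set[str]
) -> List[UniProtVariant]:
    """Keep only UniProt variants whose dbSNP ID was seen in the VCF."""
    return [v for v in variants if v.dbsnp_id in observed_rsids]
```

Variants without a dbSNP reference are dropped here, since there is nothing to match them on; whether that is the right default is an open question.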
trishorts:
Could we use a paired system where someone would have to use an Ensembl protein database with a VCF in the matching Ensembl context? That would mean getting away from UniProt and using GPTMD for PTMs.
see #1124
directly add variants prior to search