smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Allow VCF file as input to MM #489

Closed (trishorts closed 6 years ago)

trishorts commented 7 years ago

Directly add variants from a VCF file prior to search.

acesnik commented 7 years ago

I have a unit-tested C# VCF reader in my proteogenomics work.

acesnik commented 7 years ago

Maybe we could put that in mzLib?

A VCF contains variants called against a reference genome; UniProt doesn't have a reference genome and annotates SAVs by amino acid position. The information in the VCF file wouldn't be useful unless you have additional information about the reference genome and how the proteins were derived from it. I'm working on that in the Proteoform Database Engine world.

acesnik commented 7 years ago

It looks like UniProt also maintains references to dbSNP, so we could cross-reference sequence variants pulled in from a VCF to decide which UniProt variants to keep. I think that's the only way we could get around not having a reference genome at the moment.
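
A rough sketch of that cross-referencing idea, written in Python only for brevity (MetaMorpheus itself is C#). Everything here is an assumption for illustration: `UniProtVariant`, `rsids_from_vcf`, and `keep_observed` are hypothetical names, not part of MetaMorpheus or mzLib, and the sketch relies only on the VCF convention that the third column (ID) holds semicolon-separated identifiers or `.` when none is known.

```python
from typing import Iterable, List, NamedTuple, Set


class UniProtVariant(NamedTuple):
    """Hypothetical record for one UniProt sequence-variant annotation."""
    accession: str   # UniProt accession of the protein
    position: int    # 1-based amino acid position
    original: str    # reference residue
    variation: str   # variant residue
    dbsnp_id: str    # dbSNP cross-reference, "" if UniProt lists none


def rsids_from_vcf(lines: Iterable[str]) -> Set[str]:
    """Collect dbSNP rsIDs from the ID column (3rd column) of VCF data lines."""
    rsids: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header row
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a well-formed data line
        # The ID column may hold several semicolon-separated IDs, or "." if none.
        for identifier in fields[2].split(";"):
            if identifier.startswith("rs"):
                rsids.add(identifier)
    return rsids


def keep_observed(variants: Iterable[UniProtVariant],
                  rsids: Set[str]) -> List[UniProtVariant]:
    """Keep only the UniProt variants whose dbSNP ID was seen in the VCF."""
    return [v for v in variants if v.dbsnp_id in rsids]
```

Usage would look something like `keep_observed(uniprot_variants, rsids_from_vcf(open(path)))`. Variants without an rsID on either side are silently dropped, which is the main limitation of matching by identifier instead of by genomic coordinate.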

acesnik commented 7 years ago

trishorts:

Could we use a paired system where someone would have to use an Ensembl protein database with an Ensembl-context VCF? Getting away from UniProt and using G-PTM-D for PTMs?

rmillikin commented 6 years ago

see #1124