smith-chem-wisc / ProteoformSuite

Construction, quantification, and visualization of proteoform families
https://smith-chem-wisc.github.io/ProteoformSuite/
GNU General Public License v3.0

Look for contaminant proteoforms with natural abundance masses (not NeuCode) #297

Open acesnik opened 7 years ago

acesnik commented 7 years ago

I'm curious whether some components belonging to (unlabeled) contaminants might somehow impact NeuCode quantification or identification. We would need to go look for examples where this is the case before implementing changes.

Secondarily, our theoretical database is incorrect for contaminants: they are unlabeled, but we're constructing the contaminant theoreticals as if they were NeuCode labeled. I suggest:

  1. We ignore contaminants for NeuCode labeled experiments, or
  2. We search for them at natural isotopic abundances
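A rough sketch of the two options above, in Python for illustration only (this is not ProteoformSuite code). The record fields, the `contaminant_mode` switch, and the per-lysine label shift value are all assumptions made for the example.

```python
# Assumed per-lysine mass shift (Da) for the label; an illustrative value,
# not taken from the ProteoformSuite source.
ASSUMED_LYSINE_LABEL_SHIFT = 8.0142


def build_theoretical_masses(proteins, contaminant_mode="natural",
                             label_shift=ASSUMED_LYSINE_LABEL_SHIFT):
    """Map accession -> theoretical mass for a NeuCode labeled experiment.

    proteins: list of dicts with hypothetical keys
        "accession", "unlabeled_mass", "lysine_count", "is_contaminant".
    contaminant_mode:
        "ignore"  -> option 1: leave contaminants out of the database
        "natural" -> option 2: keep contaminants at their unlabeled mass
    """
    if contaminant_mode not in ("ignore", "natural"):
        raise ValueError("contaminant_mode must be 'ignore' or 'natural'")

    masses = {}
    for protein in proteins:
        if protein["is_contaminant"]:
            if contaminant_mode == "ignore":
                continue  # option 1: skip contaminants entirely
            # option 2: contaminants are unlabeled, so no label shift is added
            masses[protein["accession"]] = protein["unlabeled_mass"]
        else:
            # labeled sample proteins carry one label shift per lysine
            masses[protein["accession"]] = (
                protein["unlabeled_mass"] + protein["lysine_count"] * label_shift
            )
    return masses
```

With `contaminant_mode="ignore"` the contaminant entries (and therefore their decoys) never enter the search space; with `"natural"` they stay, but at masses an unlabeled species could actually have.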

One thing to consider is that decoy "NeuCode" contaminants are getting hits when we use a decoy database.

leahvschaffer commented 7 years ago

I think it would be a good idea to always look for NeuCode peaks that could correspond to (unlabeled) contaminants.

acesnik commented 7 years ago

One thing we should check is whether there are examples of contaminants that end up in a NeuCode pair. Shortreed stipulates that they're filtered out by the requirement that each pair have a NeuCode ratio between 1.5 and 6.
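For reference, the ratio requirement amounts to a check like the one below. This is a minimal Python sketch, not the ProteoformSuite implementation; the function name, which intensity goes in the numerator, and the inclusive bounds are assumptions.

```python
def passes_neucode_ratio(numerator_intensity, denominator_intensity,
                         min_ratio=1.5, max_ratio=6.0):
    """Return True if the pair's intensity ratio lies within [min_ratio, max_ratio].

    Which channel is the numerator is an assumption here; the 1.5 and 6
    bounds come from the ratio requirement discussed above.
    """
    if denominator_intensity <= 0:
        return False  # no usable partner peak, so this cannot be a valid pair
    ratio = numerator_intensity / denominator_intensity
    return min_ratio <= ratio <= max_ratio
```

A contaminant would only slip through if an unrelated peak happened to sit at the right mass offset with an intensity ratio in this window, which is the case worth looking for in real data.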

leahvschaffer commented 7 years ago

Yes, that's what I meant by NeuCode peaks. It would be interesting to see whether that's happening or whether the ratio requirement is good enough.

acesnik commented 7 years ago

We should avoid populating our database with proteins we can't identify, so that hits to their decoys don't blow up our FDRs. Unlabeled contaminants might be in that category.

acesnik commented 7 years ago

@lschaffer2 I like your idea. Maybe instead of trying to filter them, we should initially just mark them as possibly having some influence from contaminants, so that we know where to check.
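Something like the following could do the marking. It is a hedged Python sketch, not existing ProteoformSuite code: the function name, the ppm tolerance, and the plain lists of masses are all hypothetical.

```python
def flag_possible_contaminants(component_masses, contaminant_masses,
                               tolerance_ppm=10.0):
    """Pair each observed mass with a flag instead of filtering it out.

    component_masses: observed component masses (Da).
    contaminant_masses: unlabeled (natural abundance) contaminant masses (Da).
    tolerance_ppm: assumed mass tolerance; 10 ppm is only a placeholder.
    Returns a list of (mass, is_possible_contaminant) tuples.
    """
    flagged = []
    for mass in component_masses:
        near_contaminant = any(
            abs(mass - contaminant) / contaminant * 1e6 <= tolerance_ppm
            for contaminant in contaminant_masses
        )
        # Keep every component; the flag only tells us where to check.
        flagged.append((mass, near_contaminant))
    return flagged
```

Nothing is dropped, so quantification and identification proceed unchanged, and the flagged components give a list of places to look for contaminant influence.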