smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

"too many" is less informative than previously outputting a single entry #561

Closed zrolfs closed 7 years ago

zrolfs commented 7 years ago

Have output be "XXXXXXXXX or too many"

stefanks commented 7 years ago

What's XXXXXXXXX?

zrolfs commented 7 years ago

A sequence, a number, anything that goes through resolve and can potentially come back as "too many".

stefanks commented 7 years ago

Could you give a specific example?

zrolfs commented 7 years ago

LIAQKVRGVDVVVGGHSNTFLYT or (too many)

stefanks commented 7 years ago

The whole reason "too many" is returned is because of ambiguity! There is no single base sequence that matches.

zrolfs commented 7 years ago

"too many" is returned because the ambiguity results in a length over 32000, not because there is "too much" ambiguity. We know nothing about the putative sequence if "too many" is returned.

stefanks commented 7 years ago

Sure. But I don't like outputting a specific match. I feel that it's misleading.

zrolfs commented 7 years ago

I'm outputting 31990 chars and then (too many)
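A minimal sketch of the behavior described here, not the MetaMorpheus implementation. It assumes a 32000-character cell limit (the figure quoted above), joins distinct candidates with " or ", and cuts at whole-entry boundaries rather than at a raw character count. The name `resolve_with_truncation` and its parameters are hypothetical.

```python
MAX_CELL_CHARS = 32000        # assumed cell limit, taken from the thread
SEPARATOR = " or "
MARKER = " or (too many)"     # disclaimer that the list is incomplete


def resolve_with_truncation(values, limit=MAX_CELL_CHARS):
    """Join distinct values; if the result exceeds `limit`, keep as many
    whole entries as fit and append the "(too many)" marker."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    if len(distinct) == 1:
        return distinct[0]

    joined = SEPARATOR.join(distinct)
    if len(joined) <= limit:
        return joined

    budget = limit - len(MARKER)  # room left for actual entries
    kept, used = [], 0
    for value in distinct:
        extra = len(value) + (len(SEPARATOR) if kept else 0)
        if used + extra > budget:
            break
        kept.append(value)
        used += extra

    if not kept:
        return "(too many)"  # nothing fits; same output as before
    return SEPARATOR.join(kept) + MARKER
```

With a small limit for illustration, `resolve_with_truncation(["AAA", "BBB", "CCC", "DDD", "EEE"], limit=30)` keeps the first two entries and ends with the marker. Which entries survive still depends on input order, which is the objection raised in the next comment.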

stefanks commented 7 years ago

Not a fan of this. It still arbitrarily gives preference to the sequences that made it in over those that didn't.

zrolfs commented 7 years ago

If there are 31990 characters worth of sequences, nobody is going to look at every single one of them. I want as many as possible, though, to allow the viewer to find the pattern. If multiple sequences scored the same, they have to have something in common that hits those peaks.

stefanks commented 7 years ago

Let's make a new column instead! That contains the "common pattern"
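The thread does not define "common pattern". One possible reading, sketched below purely as an assumption, is a position-wise consensus over equal-length candidate sequences, with a wildcard wherever they disagree. The function name `common_pattern` and the `.` wildcard are hypothetical.

```python
def common_pattern(sequences, wildcard="."):
    """Return a position-wise consensus of equal-length sequences.

    Positions where all sequences agree keep their residue; positions
    where they differ become `wildcard`. Returns None when the sequences
    have different lengths, which this sketch does not attempt to align.
    """
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        return None
    return "".join(
        column[0] if len(set(column)) == 1 else wildcard
        for column in zip(*sequences)
    )
```

For example, candidates that differ only by an I/L substitution at one position would collapse to a single short pattern instead of a long " or " list.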

zrolfs commented 7 years ago

If there are really over 31990 characters, it's probably a crummy spectrum and nobody will care about it. I care because of fusion peptide searches yielding low scoring initial identifications. I need something for Neo to grab.

zrolfs commented 7 years ago

I don't think the feature would be worth the time put into it.

stefanks commented 7 years ago

It might be useful to people! I think it is worth the time.

And I have two problems with displaying partial info: it is misleading, and it makes files bigger.

zrolfs commented 7 years ago

The "Matched Ion Masses" column should be enough info to obtain a common pattern. I need an example sequence! It is not misleading if we put "(too many)" at the end, as that is the disclaimer that not all possible sequences are being shown. I understand that including many entries will make files bigger, but we already do that with sequences that contain 31999 characters!

zrolfs commented 7 years ago

Going off of that thought, I notice that "Matched Ion Masses" is left blank in the case of ambiguity. Shouldn't those peaks be identical?

stefanks commented 7 years ago

I believe it's currently blank if the compact peptide is ambiguous. But I see your point, even in case of ambiguity the peaks matched should be identical... Hm. The only way to get identical scores is to match identical peaks. Good point. You are welcome to change this!
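A rough sketch of the change being invited here, under stated assumptions: each ambiguous candidate carries its own list of matched peak masses, and the column is filled in only when every candidate matched the same peaks. The function name `resolve_matched_ions` and the rounding precision are hypothetical, not taken from the MetaMorpheus code.

```python
def resolve_matched_ions(candidates, decimals=4):
    """Return the shared matched peak masses, or None if candidates differ.

    `candidates` is a list with one iterable of matched masses per
    candidate peptide. Masses are rounded so that tiny floating-point
    differences do not count as a mismatch.
    """
    if not candidates:
        return None
    as_sets = [frozenset(round(mass, decimals) for mass in ions) for ions in candidates]
    first = as_sets[0]
    if all(ion_set == first for ion_set in as_sets):
        return sorted(first)  # identical peaks: safe to report once
    return None               # genuinely different peaks: leave blank
```

If the reasoning above holds, the `None` branch should rarely trigger for candidates with identical scores, so the column would no longer be blank in the ambiguous case.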

zrolfs commented 7 years ago

Begrudgingly agree to disagree.