smith-chem-wisc / FlashLFQ

Ultra-fast label-free quantification algorithm for mass-spectrometry proteomics
GNU Lesser General Public License v3.0

Advice on parameter setting #105

Closed wsnoble closed 2 years ago

wsnoble commented 2 years ago

Not sure if this qualifies as a software issue per se, but I am hoping to get some advice about how to set parameters appropriately for FlashLFQ. I input a set of 9800 PSMs into the software, all of which were accepted at 1% FDR by Percolator. FlashLFQ returned only 5567 quants. This seemed low, and I noticed that the default ppm tolerance is 10. Our param-medic software estimates a precursor window size of 37 ppm for this data, so just to be safe I tried running FlashLFQ with --ppm 50 --iso 20. I was surprised that the same run then yielded slightly fewer quants (5543); I had assumed that loosening the threshold would give me more. Am I misunderstanding? Is this rate of successful quantification to be expected? If not, can you advise me on what I should do differently?
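For reference, a ppm tolerance corresponds to an absolute m/z window that scales with the precursor m/z. The sketch below only illustrates that arithmetic for the three tolerances mentioned above; the function name and the example m/z of 800 are assumptions made for illustration, and this is not FlashLFQ's internal code.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6  # ppm is parts per million of the m/z value
    return mz - delta, mz + delta


if __name__ == "__main__":
    # 10 = default mentioned above, 37 = param-medic estimate, 50 = value tried
    for ppm in (10, 37, 50):
        low, high = ppm_window(800.0, ppm)  # 800.0 is an arbitrary example m/z
        print(f"{ppm:>2} ppm at m/z 800: {low:.4f} to {high:.4f}")
```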

I can send the files if that's helpful, though the mzML is too big to attach here.

Thanks. Bill

trishorts commented 2 years ago

Just want to double check that there is no sequence redundancy in your list of PSMs. Multiple PSMs from a single peak will only yield a single value.

The other possibility is that the current version skips some PSMs because of sequence readability issues. I will have to check on that. If you make the mzML and the FASTA available, I will search and quantify on our end to see what is going on.

wsnoble commented 2 years ago

There is no sequence redundancy. I thought at first that your guess about sequence readability might indeed be the issue, since my input file is a converted Percolator file. But I double checked, and the input file contains no modifications.

Here are the files: https://drive.google.com/drive/folders/1FF_vrsVwScEb5UJnBsEYHXSBoSPos31D?usp=sharing

trishorts commented 2 years ago

Thanks. I'll download and look at them shortly.

trishorts commented 2 years ago

Bill,

I applied our whole workflow (calibration, PTM discovery, and search with quant). Our calibration recommended search tolerances of 6 ppm parent and 11 ppm daughter. With those settings (and PTM discovery) we observed the following results:

All target PSMs within 1% FDR: 18583
All target peptides within 1% FDR: 12810
All target protein groups within 1% FDR: 2492

With respect to quantification, there were 128810 unique peptides. FlashLFQ within MetaMorpheus provided intensities for 12091, with the remaining 782 being 0 intensity. That is a ~94% yield.
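The ~94% figure can be checked from the two counts reported above. In this small sketch the total is taken as the sum of the quantified and zero-intensity counts rather than the stated unique-peptide count, which is an assumption about how the yield was computed; the function name is illustrative.

```python
def quant_yield(quantified: int, zero_intensity: int) -> float:
    """Fraction of peptides that received a non-zero intensity."""
    total = quantified + zero_intensity  # assumed denominator
    return quantified / total


if __name__ == "__main__":
    # Counts taken from the comment above
    fraction = quant_yield(12091, 782)
    print(f"{fraction:.1%}")  # prints 93.9%
```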

I will need to run your results separately through FlashLFQ (without doing our search) to see why you got the results that you got. There is possibly a filtering w.r.t. q-value that we can resolve easily.

If you like, I'd be happy to provide you with all my search results. Maybe they are useful to you in some way. Stay tuned for my investigation of your data.

trishorts commented 2 years ago

To start, I see 9796 PSMs with a Percolator q-value below 0.01. That agrees with your 9800 number. However, there are many duplicates in column K (sequence). In fact, I count only 7543 unique sequences. Please see the attached file: percolatorSequenceWithCounts.txt
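A count like this can be reproduced with a few lines of Python. This is a minimal sketch, assuming a tab-delimited file with a header row; the column names `sequence` and `percolator q-value` are placeholders and should be replaced with whatever the actual file uses.

```python
import csv
import io
from collections import Counter


def sequence_counts(handle, sequence_col="sequence",
                    qvalue_col="percolator q-value", max_q=0.01):
    """Count PSMs per peptide sequence among rows with q-value below max_q.

    Column names are hypothetical defaults, not taken from the real file.
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row[qvalue_col]) < max_q:
            counts[row[sequence_col]] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up table standing in for the real PSM file
    demo = io.StringIO(
        "scan\tpercolator q-value\tsequence\n"
        "1\t0.001\tPEPTIDEK\n"
        "2\t0.002\tPEPTIDEK\n"
        "3\t0.005\tSAMPLER\n"
        "4\t0.200\tDECOYISH\n"
    )
    counts = sequence_counts(demo)
    print(f"{sum(counts.values())} PSMs, {len(counts)} unique sequences")
```

With a real file, the handle would come from `open(path, newline="")` instead of `io.StringIO`.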

wsnoble commented 2 years ago

Thanks! Yes, there are duplicates in column K, but I think they should each be associated with distinct scan numbers.

One obvious thing to check is whether the RTs are sensible. I wrote pyteomics code to try to extract those from the mzML and stick them into the FlashLFQ input file, but I certainly could have messed that step up somehow.
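For readers following along, here is a rough sketch of what such an extraction step could look like. It is not the code used in this thread. It assumes spectrum dictionaries shaped the way pyteomics' mzML reader typically yields them (an `id` string containing `scan=N`, and the retention time under `scanList` / `scan` / `scan start time`); the helper names are made up, and the time unit (minutes or seconds) depends on the file, so it should be checked against what the downstream tool expects.

```python
import re


def scan_number(spectrum_id: str) -> int:
    """Pull the scan number out of an id such as '... scan=123'."""
    match = re.search(r"scan=(\d+)", spectrum_id)
    if match is None:
        raise ValueError(f"no scan number in spectrum id: {spectrum_id!r}")
    return int(match.group(1))


def scan_to_rt(spectra) -> dict[int, float]:
    """Map scan number to retention time for an iterable of spectrum dicts."""
    rts = {}
    for spectrum in spectra:
        first_scan = spectrum["scanList"]["scan"][0]
        rts[scan_number(spectrum["id"])] = float(first_scan["scan start time"])
    return rts


def read_rts(mzml_path: str) -> dict[int, float]:
    """Read retention times from an mzML file (requires pyteomics)."""
    from pyteomics import mzml  # third-party; imported only when needed

    with mzml.read(mzml_path) as reader:
        return scan_to_rt(reader)
```

A quick sanity check after a step like this is to confirm that every scan number in the PSM list has an entry in the returned dictionary before writing the quantification input file.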

In case it's not obvious, flash.txt is the file I provided as input to FlashLFQ.

trishorts commented 2 years ago

That was not clear, but it does help. I'm also going to check whether my code to pull retention times is working. Stay tuned.

trishorts commented 2 years ago

flash.txt only has 6026 values. Is that correct?
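A row-count comparison like this is easy to automate as a guard on a conversion step. The sketch below is a generic illustration, assuming a text file with one header line and one record per non-blank line; the function names are hypothetical and the expected count would come from the upstream PSM list.

```python
def count_data_rows(lines, has_header=True) -> int:
    """Count non-blank lines, excluding one header line if present."""
    rows = [line for line in lines if line.strip()]
    if has_header and rows:
        return len(rows) - 1
    return len(rows)


def rows_match(lines, expected: int, has_header=True) -> bool:
    """Return True when the file has exactly the expected number of records."""
    return count_data_rows(lines, has_header) == expected


if __name__ == "__main__":
    # Made-up three-record file standing in for a converted input file
    demo = ["header\n", "row1\n", "row2\n", "row3\n", "\n"]
    print(count_data_rows(demo), rows_match(demo, expected=3))
```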

wsnoble commented 2 years ago

OMG, that's embarrassing! That means the problem is entirely on my end. I will figure out what went wrong in the conversion. Sorry!

trishorts commented 2 years ago

LOL. That is the story of my whole life...