FDR calculation does not work when I lower the mass range

xomicsdatascience / zoDIAq

Cosine Similarity Optimization for DIA qualitative and quantitative analysis

MIT License

FDR calculation does not work when I lower the mass range #42

Open LiaSerrano opened 2 years ago

LiaSerrano commented 2 years ago

Hello,

I noticed a pattern in when the FDR calculation works and when it does not. When I lower the bottom of the MS2 m/z range from 300 to 150, I get the error shown below and the resulting FDR outputs are blank. I replicated this with three different pairs of raw files where the only difference was the lower mass limit. Is there an obvious reason for this?

Thanks! Lia

    data = group_nodes_with_same_edge(data)
  File "C:\Users\lrserrano\Anaconda3\envs\csod\lib\site-packages\csodiaq\idpicker.py", line 23, in group_nodes_with_same_edge
    if first: l1, l2 = map(list,zip(*data))
ValueError: not enough values to unpack (expected 2, got 0)

jessegmeyerlab commented 2 years ago

Thanks Lia,

It looks like this error keeps coming up in different scenarios. Based on the traceback, it appears to be a problem in protein inference. Without having investigated, I suspect it comes up when there are no significant proteins to group. For example, when you lower the m/z range, maybe you get more decoy hits, and they now fall within the top 100 proteins, so none are significant below 1% FDR. Does that seem possible? How many protein hits did you have before expanding the fragment range?

@CCranney would you have time to help us investigate this error? I think there is also a second issue open with this same error.
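
For anyone reading along, the unpacking on the traceback line can be reproduced in isolation. The sketch below assumes `data` is a list of two-element pairs (for example peptide and protein names), which is a guess from the traceback and not confirmed against the package source. `split_edges` is a hypothetical helper showing one way to guard against an empty list; it is not the package's own code.

```python
def split_edges(data):
    """Split a list of 2-tuples into two parallel lists.

    Hypothetical stand-in for the unpacking shown in the traceback.
    Returns two empty lists instead of raising when `data` is empty.
    """
    if not data:
        return [], []
    left, right = map(list, zip(*data))
    return left, right


# Reproduce the reported error: zip(*[]) yields nothing, so there is
# nothing to unpack into the two names on the left-hand side.
try:
    l1, l2 = map(list, zip(*[]))
except ValueError as err:
    message = str(err)
    print(message)

# The guarded version handles both the normal and the empty case.
print(split_edges([("PEPTIDEK", "P12345"), ("SAMPLER", "Q67890")]))
print(split_edges([]))
```

If the empty-list explanation is right, the error is a symptom of nothing surviving the earlier filtering step, so a guard like this would only turn the crash into an empty result.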

LiaSerrano commented 2 years ago

Hi @jgmeyerucsd

I just checked on that. Yes, the results with the lower m/z range have ~1K more decoys.

LiaSerrano commented 2 years ago

How can this be explained when there is only 1 decoy hit in the unfiltered output?

jessegmeyerlab commented 2 years ago

Is that one decoy within the top 100 proteins if you sort proteins by the best scoring peptide?
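
To illustrate why the rank of a single decoy matters, here is a minimal sketch of a running target-decoy FDR walk over a best-first ranked list. It is a generic illustration, not the package's implementation. The `min_hits` rule (discard everything if the walk stops before 100 hits) is an assumption that mirrors the "top 100" reasoning above, and the function name and parameters are hypothetical.

```python
def count_hits_below_fdr(is_decoy_ranked, fdr_threshold=0.01, min_hits=100):
    """Count ranked hits kept before the running FDR exceeds the threshold.

    is_decoy_ranked: booleans ordered best score first, True for a decoy.
    min_hits: assumed rule; if fewer hits than this are kept, return 0.
    """
    decoys = 0
    accepted = 0
    for rank, is_decoy in enumerate(is_decoy_ranked, start=1):
        if is_decoy:
            decoys += 1
        # Running FDR estimate: decoys seen so far over hits seen so far.
        if decoys / rank > fdr_threshold:
            break
        accepted = rank
    return accepted if accepted >= min_hits else 0


# One decoy at rank 120: 1/120 stays under 1%, so the walk keeps going.
late_decoy = [False] * 119 + [True] + [False] * 30
# One decoy at rank 50: 1/50 is 2%, so the walk stops after 49 hits.
early_decoy = [False] * 49 + [True] + [False] * 100

print(count_hits_below_fdr(late_decoy))
print(count_hits_below_fdr(early_decoy))
print(count_hits_below_fdr(early_decoy, min_hits=0))
```

Under these assumptions, a single decoy is enough to empty the output only if it ranks within roughly the first 100 entries, which is why its rank is the thing to check.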

LiaSerrano commented 2 years ago

No, it is not.

LiaSerrano commented 2 years ago

Is there a way to see most likely protein ID from idPicker without the protein FDR filter applied, but rather just from the 1% peptide FDR list?

jessegmeyerlab commented 2 years ago

Since I don't think we require a FASTA input, I believe the way this works is that it looks back at your spectral library to get the protein assignment. Maybe the format of the protein names in your spectral library file differs from the names used in our example human.tsv TraML file, and that is confusing the protein grouping code? Worst case, you could do this manually with a script in R or Python by loading the spectral library and looking up the proteins for your peptide hits.
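
A rough sketch of that manual lookup in Python, using only the standard library. The column names `PeptideSequence` and `ProteinName` are assumptions about the spectral library's header and may need changing for your file; the function names and the tiny in-memory library are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def build_peptide_to_proteins(library_file,
                              peptide_col="PeptideSequence",
                              protein_col="ProteinName"):
    """Map each peptide in a tab-separated library to its protein names.

    Column names are assumptions; adjust them to match your library.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(library_file, delimiter="\t"):
        mapping[row[peptide_col]].add(row[protein_col])
    return mapping


def lookup_proteins(peptides, mapping):
    """Return the sorted protein names for each peptide ([] if unknown)."""
    return {pep: sorted(mapping.get(pep, ())) for pep in peptides}


# Made-up stand-in for a library file; with a real file, pass
# open(path, newline="") instead of the StringIO object.
library_text = (
    "PeptideSequence\tProteinName\tProductMz\n"
    "PEPTIDEK\tP12345\t175.1\n"
    "PEPTIDEK\tP12345\t304.2\n"
    "SAMPLER\tQ67890\t288.2\n"
    "SAMPLER\tP12345\t401.3\n"
)
mapping = build_peptide_to_proteins(io.StringIO(library_text))
result = lookup_proteins(["PEPTIDEK", "SAMPLER", "MISSINGK"], mapping)
print(result)
```

This only lists the candidate proteins for each peptide that passed the 1% peptide FDR; it does not do any protein grouping or parsimony step.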

@CCranney wrote the code for this and has since left the lab to start his MS degree. We are having trouble understanding his implementation because there are not many comments. If he does not have time to look at this, unfortunately it will likely be a few months before we have hired more people to help.