Open LiaSerrano opened 2 years ago
@AlexandreHutton was going to add direct support for Prosit libraries. Let's see if that works, because it would avoid any problems with library conversion. Lex, can you please update us on where we are with that?
I found this exact error while working with the FragPipe library. I thought it might be a problem with the library itself, but it sounds like it's an issue with the code. I think the problem might be with the FDR calculation somewhere. Adding in decoys gets us past that error but then produces an empty proteinFDR file. I'm investigating.
Thanks Lex for the update. I have some ideas.
If there were a way to convert a library that we know works to the other formats, we could confirm or rule out that the issue relates to edge cases in the library format.
Since you think the problem is with the FDR calculation, and adding decoys gets past the error but produces empty output, I wonder how the FDR calculation deals with the case where there are no decoy hits. This could happen if the library contains no decoys or, by luck, in some rare circumstances.
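To make the no-decoy question concrete, here is a minimal sketch of a running target-decoy FDR cutoff. It is not CsoDIAq's code: the function name `count_passing_fdr`, the boolean input, and the 1% default threshold are all assumptions for illustration.

```python
def count_passing_fdr(is_decoy, threshold=0.01):
    """Return how many top-ranked hits can be kept before the running
    FDR (decoys seen / hits seen) first exceeds the threshold.

    is_decoy: booleans ordered from best score to worst score.
    Hypothetical helper, not part of CsoDIAq.
    """
    decoys = 0
    for rank, decoy in enumerate(is_decoy, start=1):
        if decoy:
            decoys += 1
        if decoys / rank > threshold:
            # Keep everything ranked above the hit that broke the threshold.
            return rank - 1
    # The threshold was never exceeded, e.g. because there were no decoys.
    return len(is_decoy)
```

In this sketch a hit list with no decoys never exceeds the threshold, so every hit is kept. An implementation that only sets its cutoff at the moment the threshold is crossed could behave differently in that case, which is the edge case worth checking.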
@LiaSerrano, does your library have decoys?
@AlexandreHutton does the FragPipe library have decoys?
The library I was using has reverse-sequence decoys predicted by Prosit. This error actually doesn't happen when I take out the decoy entries. Let me know if you would like me to send an example! Thank you
@AlexandreHutton does the FragPipe library have decoys?
It does not. I converted some entries from another (functional) library and added them in, which resulted in the empty output mentioned previously.
I wonder if the decoy is the same as the label CsoDIAq looks for.
It might help us understand if you can share the exact library you're using. You could email it to Lex and me if you want to keep the library private.
Please do!
I'll shoot over an email, thanks!
Thanks Lex,
It might be how it handles where the decoys are hit. If it hits a decoy within the first 100 proteins (sorted by MaCC) then I believe it should return an empty list.
I don't remember how @CCranney made it handle the case where it never hits a decoy, but that could be another place to look. It might help you debug if you can look at the intermediate matches list (that would be in memory) for proteins and see where the decoys fall in the order.
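One way to see where the decoys fall in the order is to list their ranks in the sorted protein list. This is only a sketch: the helper `decoy_ranks` is hypothetical, and the `"DECOY"` substring is an assumed label that should be replaced with whatever label the library and CsoDIAq actually use.

```python
def decoy_ranks(protein_names, decoy_label="DECOY"):
    """Return the 1-based ranks of entries whose name contains decoy_label.

    protein_names: names ordered from best score to worst score.
    decoy_label: assumed marker for decoys; adjust to the real one.
    """
    return [
        rank
        for rank, name in enumerate(protein_names, start=1)
        if decoy_label in name
    ]


# Made-up names, for illustration only.
ranked = ["sp|A|X_HUMAN", "DECOY_sp|B|Y_HUMAN", "sp|C|Z_HUMAN"]
ranks = decoy_ranks(ranked)
first_decoy = ranks[0] if ranks else None
```

With a 1% threshold, a first decoy at any rank below 100 already gives a running FDR above 1%. An empty result means either there are no decoys among the matches or the label being searched for does not match the one in the library.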
Thanks for looking at this Lex
Hi all,
I dug into the code, looking specifically for the error @LiaSerrano included in her PDF in the first comment. Backtracing the error, I think no peptides were identified (the _peptideFDR.csv output is completely blank). That, or the library used lacks or has different peptide and/or protein labels, and as such the "peptide" and/or "protein" columns of the peptideFDR output file are blank. This is just me extrapolating what the error could be, but could I have access to the data/GUI settings that led to this error?
Breakdown of my thought process: The error is found here:
  File "C:\Users\lrserrano\Anaconda3\envs\csod\lib\site-packages\csodiaq\idpicker.py", line 23, in group_nodes_with_same_edge
    if first: l1, l2 = map(list,zip(*data))
It looks like it tried to break data into two lists when data was actually blank. This data variable should have been a list of length-2 tuples, pairing peptides to proteins. So going back to where data came from, it looks like it was created and passed down through the following functions:
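The failure mode described above can be reproduced in isolation. The sketch below is not the idpicker.py code; `unpack_unguarded` and `split_pairs` are hypothetical names used only to show what happens when the same unpacking pattern receives an empty list, and one possible guard.

```python
def unpack_unguarded(data):
    """Same unpacking pattern as the failing line, with no guard."""
    l1, l2 = map(list, zip(*data))
    return l1, l2


def split_pairs(data):
    """Split a list of (peptide, protein) tuples into two lists,
    returning two empty lists when there is nothing to split."""
    if not data:
        return [], []
    left, right = map(list, zip(*data))
    return left, right


try:
    unpack_unguarded([])
except ValueError as err:
    # zip() with no arguments yields nothing, so there are no values
    # to assign to l1 and l2.
    message = str(err)
```

With an empty list, `zip(*data)` produces no items, so the two-name unpacking raises a `ValueError`. A guard like the one in `split_pairs` would only hide the symptom; the underlying question is still why the list of peptide-protein pairs is empty.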
It starts in csodiaq_identification_functions.py as the peptideDf variable, a dataframe that was used to create the _peptideFDR.csv file in the output. That dataframe then goes to format_peptide_protein_connections(peptideDf) on line 104.
For example, if the peptide EHALLAYTLGVK was attached to the protein group 3/sp|Q5VTE0|EF1A3_HUMAN/sp|Q05639|EF1A2_HUMAN/sp|P68104|EF1A1_HUMAN, you would expect the following list of tuples to be created. All peptide-protein connections would be put into the same list.
[
('EHALLAYTLGVK', 'sp|Q5VTE0|EF1A3_HUMAN'),
('EHALLAYTLGVK', 'sp|Q05639|EF1A2_HUMAN'),
('EHALLAYTLGVK', 'sp|P68104|EF1A1_HUMAN')
]
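As a rough illustration of how a list like this could be built from one row, the sketch below splits the protein group on "/" and drops the leading count. It is not the actual format_peptide_protein_connections implementation; the helper name and the assumption that the first "/"-separated field is always the protein count are mine.

```python
def peptide_protein_pairs(peptide, protein_group):
    """Pair one peptide with every protein in a group string such as
    '3/sp|Q5VTE0|EF1A3_HUMAN/sp|Q05639|EF1A2_HUMAN/sp|P68104|EF1A1_HUMAN'.

    Assumes the first '/'-separated field is the number of proteins.
    """
    _count, *proteins = protein_group.split("/")
    return [(peptide, protein) for protein in proteins]


pairs = peptide_protein_pairs(
    "EHALLAYTLGVK",
    "3/sp|Q5VTE0|EF1A3_HUMAN/sp|Q05639|EF1A2_HUMAN/sp|P68104|EF1A1_HUMAN",
)
```

If the protein column is blank, a helper like this produces no pairs at all, which would leave the downstream list empty.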
So either the _peptideFDR.csv file is completely empty, or the peptide and/or protein columns of the _peptideFDR.csv file are blank. I'm leaning towards the former, but won't know without looking at the data in question.
I tried to format a Prosit library like the TraML library. I am getting an error similar to the one I got with an MGF MassIVE-KB library. I think it's not able to associate peptides with proteins?
I get a “file_corrected” output but the peptide/spectral FDR outputs are empty and there is no proteinFDR output. These outputs appear when I take out the decoys, however.
Let me know if it would be helpful for me to send over the library I was using.
csodiaq_error_August.pdf