Closed jymbcrc closed 1 year ago
@CCranney just wanted to check if you saw this
I will investigate this weekend and get back to you by EOD Saturday.
Thanks Caleb!!! I started trying to investigate, and it seems like a problem at spectra 33 and 37. Will update if I figure out more
I think I figured it out!!!
Within class IdentificationSpectraMatcher, when self.ppmMatches has length zero (no matches) it throws this error.
If we add a check for len(self.ppmMatches) before running line 149 with the np.delete, then it should be okay - though I'm not sure if sometimes skipping that line will break something else.
I added this above and it allows it to finish the spectra comparisons:
if len(self.ppmMatches) > 0:
    self.libraryTags, self.queryTags, self.libraryIntensities, self.queryIntensities, self.ppmMatches = [np.delete(x, remove_indices) for x in [self.libraryTags, self.queryTags, self.libraryIntensities, self.queryIntensities, self.ppmMatches]]
    if len(self.decoys) > 0: self.decoys = np.delete(self.decoys, remove_indices)
Then it throws a different error when it tries to write; I think it doesn't know how to handle writing a report if there are zero matches.
This data is quite sparse compared to previous data we used in testing so that makes sense why we never saw this error before.
Begin Writing to File:
0:01:40.518557
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14508/3703801407.py in <module>
----> 1 cif.perform_spectra_pooling_and_analysis( args['files'][i],
2 outFile,
3 lib,
4 args['fragmentMassTolerance'],
5 maxQuerySpectraToPool,
~\Documents\GitHub\CsoDIAq-master\csodiaq_identification_functions.py in perform_spectra_pooling_and_analysis(querySpectraFile, outFile, lib, tolerance, maxQuerySpectraToPool, corrected, histFile)
94
95 smf.print_milestone('\nBegin Writing to File: ')
---> 96 allSpectraMatches.write_output(outFile, querySpectraFile, maccCutoff, queScanValuesDict, libIdToKeyDict, lib)
97
98
~\Documents\GitHub\CsoDIAq-master\IdentificationSpectraMatcher.py in write_output(self, outFile, expSpectraFile, scoreCutoff, queValDict, idToKeyDict, lib)
220 expSpectraFile, #fileName
221 scan, #scan
--> 222 queValDict[scan]['precursorMz'], #MzEXP
223 libKey[1], #peptide
224 lib[libKey]['ProteinName'], #protein
KeyError: 'precursorMz'
Hi all,
I've been stuck re-installing my conda setup and am getting some new errors (with PyQt5 and pip rather than CsoDIAq). I'll be making another issue to address that one - to help fix it, can one or both of you share the versions of pip and PyQt5 you are using?
In trying to debug what I can without actually being able to run code (thank you for your investigation Jesse!!), there are a few oddities I'd like to explore:
1. If self.ppmMatches is empty, I would expect self.libraryTags, self.queryTags etc. to also be empty. The use of all those variables was meant to function as a kind of 2D array, and was used as a stop-gap measure until the numba package allows for actual 2D arrays. As such, they should all be the same length. If self.ppmMatches is empty while the others are full, the problem likely lies in the find_matching_peaks function in the spectra_matcher_functions.py file. Can you verify, in the if statement in your code, that something like self.libraryTags also has a length of 0?
2. If the lack of matches is legitimate (self.ppmMatches would be the parameter to throw the error), then we would need to effectively skip the window that is throwing the error and let the program chug along past it. I think your code effectively does that - we may want to investigate further, but it should work for now.
3. I would have expected curLibTag = self.libraryTags[0] to throw an error because this list would be empty. Notice that problem #2 above is from a process that is done in batches, one per window. It's likely that one batch had no matches, but others did, so in this instance it is writing to a file. If I'm reading this right, it looks like there is a scan number found in the final results that was not originally found in the file - in other words, it's either a bizarre artifact from the earlier error, or it's a completely new one.
I've written some temporary fixes to a new branch emptyWindowSkip - I added three lines addressing the points above.
These are untested, as I need to resolve the other issue before I can run CsoDIAq, but at the very least it pinpoints places in the code where fixes will likely need to be made. Especially in reference to the last change, these are temporary patches that will require more investigation once I figure out this PyQt5/pip dilemma.
Some notes for investigating problem 3: The queValDict is initialized, I believe, on line 285 of csodiaq_identification_functions.py (as queScanValuesDict). It is a defaultdict, which means that if you try to access a key that does not exist, it will populate a new key:value pair rather than throw an error. When the mystery scan is accessed in the IdentificationSpectraMatcher.write_output() function, it creates a new, empty dictionary as the value, which has none of the expected metadata (precursor mz, etc.) that queValDict was made to reference. I'd want to determine what that mysterious scan is - probably by adding print(scan) right before my third code addition and seeing the output, trying to determine 1) where it came from and 2) why it wasn't a key in queValDict.
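That defaultdict pitfall is easy to sketch in isolation (variable names here are illustrative stand-ins, not CsoDIAq's actual structures):

```python
from collections import defaultdict

# queValDict maps scan -> metadata dict; defaultdict(dict) mirrors the
# behavior described above.
queValDict = defaultdict(dict)
queValDict[33]['precursorMz'] = 421.7   # metadata saved for scan 33

# Accessing a scan that was never populated does NOT raise a KeyError;
# it silently inserts a brand-new empty dict instead...
entry = queValDict[99]
print(entry)            # {}

# ...so the error only surfaces later, when the metadata is read:
try:
    entry['precursorMz']
except KeyError as e:
    print('KeyError:', e)
```

This matches the traceback above: the lookup `queValDict[scan]` succeeds silently for the mystery scan, and the KeyError fires one step later on `'precursorMz'`.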
Also, sorry for closing the issue temporarily - not sure which key I hit that did that, but it was a complete accident.
Thanks for looking at this Caleb.
To clarify, self.ppmMatches didn't throw an error; the error comes from the np.delete with index -1 on this line:
self.libraryTags, self.queryTags, self.libraryIntensities, self.queryIntensities, self.ppmMatches = [np.delete(x, remove_indices) for x in [self.libraryTags, self.queryTags, self.libraryIntensities, self.queryIntensities, self.ppmMatches]]
but I suspected the error came from a lack of matches, since I was printing the query spectra and #33 looked sparse. Then I printed the length of self.ppmMatches and saw that it was empty for spectra 33 and 37 in this specific file, so likely all relevant match parameters in self are empty. I think the fix needs to account for the rare possibility that zero fragments match. We should exclude them from FDR calculation(s), as I think you suggest in your fix.
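For reference, the failure reproduces in isolation with plain numpy (a minimal sketch; `ppm_matches` is just a stand-in for an empty self.ppmMatches):

```python
import numpy as np

# When a query window has no matches, the match arrays are empty, so
# any index passed to np.delete is out of bounds.
ppm_matches = np.array([])
try:
    np.delete(ppm_matches, -1)
except IndexError as e:
    print(e)    # index -1 is out of bounds for axis 0 with size 0

# The length guard from the workaround above skips the deletion safely:
if len(ppm_matches) > 0:
    ppm_matches = np.delete(ppm_matches, -1)
print(len(ppm_matches))     # still 0, no error
```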
Let me know if I can help
I can confirm that self.queryIntensities is also empty for scan 33
Perfect, thank you Jesse! That eliminates the likelihood of a bizarre edge case - we're purely looking at what happens when a scan has no hits.
Ok, I think I made the appropriate fixes in the emptyWindowSkip branch. Below are the issues addressed by the fixes:
- As you saw and added code for, there were several processes that should be skipped if no hits were found in a query scan.
- The later bug was a datatype issue. Adding empty lists to the mix coerced the datatype of self.queryTags to floats instead of ints. So it would search for metadata associated with scan 9.0 and not find anything, because all metadata for that scan was saved under 9. That was the "mystery scan" I referenced earlier.
- This never came up, but if none of the scans had any hits, write_output would need to account for that and print a blank document.
I ran the data files given above and reached the end without a problem. Can you switch branches and try it on your end? You may need to uninstall csodiaq (pip uninstall csodiaq) and re-install it. If it works on your end, I'll merge it with the main branch.
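The coercion can be sketched with plain numpy (an illustration only; that the metadata dict is keyed by stringified scan numbers is my assumption about the internals):

```python
import numpy as np

# Concatenating an empty array (float64 by default) with integer scan
# tags promotes everything to float, so scan 9 becomes 9.0.
tags = np.concatenate([np.array([]), np.array([9, 12])])
print(tags.dtype)                     # float64
print(tags[0])                        # 9.0

# If scan numbers are keyed as strings, '9.0' no longer matches '9':
metadata = {'9': {'precursorMz': 421.7}}
print(str(tags[0]) in metadata)       # False - the "mystery scan"
print(str(int(tags[0])) in metadata)  # True once cast back to int
```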
Awesome thanks! I will try this in 30 minutes
@CCranney This appears to work, thank you!
@jymbcrc can you please try this update? With your 45k data I get exactly 1,000 protein groups
Thanks Caleb for putting together the wiki - I was wondering about the two protein columns and then found your descriptions here: https://github.com/CCranney/CsoDIAq/wiki/File-Content-Descriptions#csodiaq-file_input-query-file-name-tag_proteinfdrcsv
Before you merge this can you please rename 'leadingProtein' to 'proteinGroup' and 'protein' to 'allProteinMatches'?
I misunderstood the column definitions and wrote 1,000 proteins above, but the groups column is 'leadingProtein', so actually there are only 531 unique protein groups.
CsoDIAq-file1_HELA45K_corrected_proteinFDR.csv
The 2.0 rewrite no longer has this problem. I ran the data through and it finished without errors. Feel free to reopen it if I missed something.
Hello Cranney, sorry to disturb you again. Recently I hit a new issue when analyzing DIA data: it reports IndexError: index -1 is out of bounds for axis 0 with size 0. Below are the whole console output and the raw and mzXML data I used. Thank you very much, and sorry to bother you again.
csoDIAq console.txt data2.zip