wfondrie / mokapot

Fast and flexible semi-supervised learning for peptide detection in Python
https://mokapot.readthedocs.io
Apache License 2.0

pepxml parser Error: float() argument must be a string or a number, not 'NoneType' #43

Closed: stsour closed this issue 2 years ago

stsour commented 2 years ago

Hi! Thanks for all your great work with mokapot!

I am trying to read in Tide search results (target + decoy concatenated) and am running into an error when parsing the file.

psms = mokapot.read_pepxml('crux-output/tide-search.pep.xml', decoy_prefix='decoy_')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tsour.s/.conda/envs/mokapot/lib/python3.9/site-packages/mokapot/parsers/pepxml.py", line 65, in read_pepxml
    psms = pd.concat([_parse_pepxml(f, decoy_prefix) for f in pepxml_files])
  File "/home/tsour.s/.conda/envs/mokapot/lib/python3.9/site-packages/mokapot/parsers/pepxml.py", line 65, in <listcomp>
    psms = pd.concat([_parse_pepxml(f, decoy_prefix) for f in pepxml_files])
  File "/home/tsour.s/.conda/envs/mokapot/lib/python3.9/site-packages/mokapot/parsers/pepxml.py", line 174, in _parse_pepxml
    df = pd.DataFrame.from_records(itertools.chain.from_iterable(psms))
  File "/home/tsour.s/.conda/envs/mokapot/lib/python3.9/site-packages/pandas/core/frame.py", line 2034, in from_records
    first_row = next(data)
  File "/home/tsour.s/.conda/envs/mokapot/lib/python3.9/site-packages/mokapot/parsers/pepxml.py", line 236, in _parse_spectrum
    spec_info["ret_time"] = float(spectrum.get("retention_time_sec"))
TypeError: float() argument must be a string or a number, not 'NoneType'

Here is a link to the tide search results: https://drive.google.com/file/d/1ApGUWC6KvdjRd-PLZuhZHxRj6DuzN1JA/view?usp=sharing

It seems like there is retention time info missing from the tide search, but I am not sure why this would be. Any help would be greatly appreciated!

Thanks!

wfondrie commented 2 years ago

Hi @stsour - thanks for opening this issue.

> It seems like there is retention time info missing from the tide search, but I am not sure why this would be.

This is indeed the case. Unfortunately, Tide's pepXML output has not traditionally conformed to the pepXML standard, so it's hard to guarantee that any change we make here will work in the long term.
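To confirm that a given pepXML file is affected, something like the following could be used. It is only a rough sketch using the standard library, not part of mokapot: the function name `count_missing_retention_times` is made up for illustration, and it assumes the PSMs are stored in `spectrum_query` elements that carry the `retention_time_sec` attribute named in the traceback.

```python
import xml.etree.ElementTree as ET


def count_missing_retention_times(pepxml_source):
    """Return (total, missing) counts over spectrum_query elements.

    `pepxml_source` may be a file path or an open binary file object.
    This is a hypothetical diagnostic helper, not a mokapot function.
    """
    total = 0
    missing = 0
    # iterparse streams the document, so large search results are not
    # loaded into memory all at once.
    for _event, elem in ET.iterparse(pepxml_source, events=("end",)):
        # Drop any XML namespace: "{uri}spectrum_query" -> "spectrum_query".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "spectrum_query":
            total += 1
            if elem.get("retention_time_sec") is None:
                missing += 1
            elem.clear()  # release the children of elements already counted
    return total, missing
```

Calling `count_missing_retention_times('crux-output/tide-search.pep.xml')` would return a pair of counts; if the second number equals the first, every spectrum lacks a retention time.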

In this case, retention time is one of the missing fields. The good news is that mokapot only uses retention time to create a unique identifier for the mass spectra acquired in the experiment, and it is often redundant with other information. I'll try disabling this when the retention time field is missing and see if it works.
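The idea of tolerating a missing retention time can be illustrated roughly as follows. This is not the actual mokapot change: `parse_retention_time` and `spectrum_key` are hypothetical helpers, and using `start_scan` as the other identifying field is an assumption made for the example.

```python
def parse_retention_time(attributes):
    """Return the retention time in seconds, or None if the field is absent.

    `attributes` is any mapping of attribute names to strings, such as
    the attributes of a spectrum_query element.
    """
    value = attributes.get("retention_time_sec")
    # float(None) is what raised the TypeError in the traceback, so the
    # missing case is handled before converting.
    return None if value is None else float(value)


def spectrum_key(attributes):
    """Build a tuple identifying a spectrum (hypothetical helper).

    Retention time is included only when it is present; otherwise the
    remaining fields are assumed to be enough to tell spectra apart.
    """
    key = [attributes.get("start_scan")]
    ret_time = parse_retention_time(attributes)
    if ret_time is not None:
        key.append(ret_time)
    return tuple(key)
```

With this approach, files that do report retention time behave as before, while files without it fall back to a shorter identifier instead of failing.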

In the meantime, my suggestion is to use the --pin-output T parameter with tide-search, which will generate a .pin file that can be used by mokapot.
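Once the .pin file exists, reading it might look like the sketch below. It assumes the file ends up at `crux-output/tide-search.pin`, which is a guess at the output location rather than something confirmed in this thread, and the wrapper function `run_mokapot_on_pin` is made up for illustration. It relies on `mokapot.read_pin` and `mokapot.brew` with their default settings.

```python
def run_mokapot_on_pin(pin_path):
    """Read a Percolator-style .pin file and rescore it with mokapot.

    Hypothetical convenience wrapper; mokapot is imported inside the
    function so that defining it does not require mokapot to be installed.
    """
    import mokapot

    # Parse the PSMs and their features from the .pin file.
    psms = mokapot.read_pin(pin_path)
    # Fit the default model with cross-validation and assign confidence
    # estimates; brew returns the results and the fitted models.
    results, _models = mokapot.brew(psms)
    return results


# Example usage (the path is an assumption):
# results = run_mokapot_on_pin("crux-output/tide-search.pin")
# results.to_txt()
```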

stsour commented 2 years ago

Hi @wfondrie, thank you so much for the quick response! I have actually tried using mokapot with Comet search results as an alternative, and this worked well. Good to know about the .pin output for future searches.