Closed hguturu closed 2 years ago
Hi!
Two points:
1. The files your query turns up are not featureXML, but rather consensusXML. Those appear to have a different schema. I'm not sure the featurexml parser will parse them adequately. I can take this as a feature request to add a consensusXML parser, perhaps? (I'm not sure what the difference really is in terms of content, as I haven't used these in practice.)
2. So, what file are you having a problem with? Is it a consensusXML or a featureXML file, and what is the schema URL?
Pardon the sloppy query. I think this is more appropriate - https://github.com/OpenMS/OpenMS/search?q=extension%3AfeatureXML+spectrum_reference. This shows two formats for the spectrum_reference attribute, even for featureXML. The one I have is of the form spectrum_reference="controllerType=0 controllerNumber=1 scan=5057".
My file is featureXML, not consensusXML. The schema declares spectrum_reference as xs:unsignedInt.
Here is my full error, originating at https://github.com/levitsky/pyteomics/blob/master/pyteomics/xml.py#L456:
Traceback (most recent call last):
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 456, in _get_info
    info[k] = a(v)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 158, in convert_from
    return cls.str_to_num(s, t)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 153, in str_to_num
    return numtype(s) if s else None
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/auxiliary/structures.py", line 269, in __new__
    inst = int.__new__(cls, value)
ValueError: invalid literal for int() with base 10: 'controllerType=0 controllerNumber=1 scan=12235'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hguturu/export_openms_apexrt.py", line 58, in <module>
    main()
  File "/Users/hguturu/export_openms_apexrt.py", line 54, in main
    export_openms_apexrt(args.inputs, args.output)
  File "/Users/hguturu/export_openms_apexrt.py", line 35, in export_openms_apexrt
    for feature in pyteomics.openms.featurexml.read(open(input_fn, "rb")):
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/auxiliary/file_helpers.py", line 176, in __next__
    return next(self._reader)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 1261, in __next__
    return next(self._iterator)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 586, in _iterfind_impl
    info = self._get_info_smart(child, **kwargs)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/openms/featurexml.py", line 55, in _get_info_smart
    info = self._get_info(element, **kw)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 428, in _get_info
    self._get_info_smart(child, ename=cname, **kwargs))
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/openms/featurexml.py", line 55, in _get_info_smart
    info = self._get_info(element, **kw)
  File "/Users/hguturu/miniconda3/lib/python3.9/site-packages/pyteomics/xml.py", line 461, in _get_info
    raise PyteomicsError(message)
pyteomics.auxiliary.structures.PyteomicsError: Pyteomics error, message: 'Error when converting types: ("invalid literal for int() with base 10: \'controllerType=0 controllerNumber=1 scan=12235\'",)'
Thanks for the clarification. I can reproduce the error with the example files from your query. Looks like a workaround is needed for the incorrect type in the schema.
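To make the failure concrete, here is a minimal sketch of a tolerant converter for this attribute. `parse_spectrum_reference` is a hypothetical helper written for illustration and is not part of pyteomics; it only assumes the two value shapes seen in this thread (a plain integer, or space-separated key=value pairs).

```python
def parse_spectrum_reference(value):
    """Convert a spectrum_reference attribute without assuming it is an int.

    Hypothetical helper, not a pyteomics function.
    """
    try:
        # Plain numeric references, as the xs:unsignedInt schema type expects.
        return int(value)
    except ValueError:
        pass

    fields = {}
    for token in value.split():
        key, sep, raw = token.partition("=")
        if not sep:
            # Not a key=value token: keep the raw string untouched.
            return value
        fields[key] = int(raw) if raw.isdigit() else raw
    # An empty or whitespace-only value yields no fields; return it as-is.
    return fields or value


ref = parse_spectrum_reference("controllerType=0 controllerNumber=1 scan=5057")
print(ref["scan"])
```

Calling plain `int()` on the same string raises the `ValueError` shown in the traceback, which is why the attribute cannot stay in the integer group of the schema defaults.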
Excellent. I also opened an issue upstream with OpenMS to see if they can update the schemas - https://github.com/OpenMS/OpenMS/issues/5478.
By the way, as a temporary workaround, you should be able to get the parser to work by commenting out the lines in _schema_defaults and instantiating the parser with read_schema=False.
You should be able to fix it at run time too: remove the offending keys from _featurexml_schema_defaults, and pass read_schema=False when creating the FeatureXML object or calling featurexml.read.
from pyteomics.openms import featurexml

# Stop treating spectrum_reference as an integer attribute.
featurexml.FeatureXML._default_schema['ints'].remove(
    ('PeptideIdentification', 'spectrum_reference'))
featurexml.FeatureXML._default_schema['ints'].remove(
    ('UnassignedPeptideIdentification', 'spectrum_reference'))
# path is the location of your featureXML file.
featurexml.read(path, read_schema=False)
The run time fix is great since that way I don't have to edit the source. I did find that I also needed to add the following since the default schema had the type wrong.
pyteomics.openms.featurexml.FeatureXML._default_schema["ints"].remove(
("quality", "quality")
)
pyteomics.openms.featurexml.FeatureXML._default_schema["floats"].add(
("quality", "quality")
)
Looks like quality is a double in https://github.com/OpenMS/OpenMS/blob/d9692da0d410c06b6cdc960f608a5c962360d09c/share/OpenMS/SCHEMAS/FeatureXML_1_6.xsd. My XML reading isn't great, but the min/maxOccurs makes me think it might even have to be a floatlists entry, but float worked for my test case.
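The run-time patches in this thread all follow the same pattern, so they can be wrapped in one small function. This is only a sketch: `retype_schema_key` is a hypothetical helper, and it assumes the schema is a dict that maps group names such as "ints" and "floats" to sets of (element, attribute) tuples, which is what the snippets above suggest. The `stand_in` dict is made-up data standing in for `FeatureXML._default_schema`.

```python
def retype_schema_key(schema, key, old=None, new=None):
    """Move an (element, attribute) key between type groups of a schema dict.

    Hypothetical helper. Unlike set.remove, discard does not raise if the
    key is already absent, so the patch is safe to apply more than once.
    """
    if old is not None:
        schema.get(old, set()).discard(key)
    if new is not None:
        schema.setdefault(new, set()).add(key)
    return schema


# Stand-in for the real defaults, for illustration only.
stand_in = {
    "ints": {
        ("PeptideIdentification", "spectrum_reference"),
        ("UnassignedPeptideIdentification", "spectrum_reference"),
        ("quality", "quality"),
    },
    "floats": set(),
}

# Drop the int conversion for spectrum_reference; it stays a string.
retype_schema_key(stand_in, ("PeptideIdentification", "spectrum_reference"), old="ints")
retype_schema_key(stand_in, ("UnassignedPeptideIdentification", "spectrum_reference"), old="ints")
# Treat quality as a float rather than an int.
retype_schema_key(stand_in, ("quality", "quality"), old="ints", new="floats")
```

With pyteomics installed, the same calls would presumably take `featurexml.FeatureXML._default_schema` in place of `stand_in`, followed by reading the file with read_schema=False as described earlier in the thread.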
https://github.com/levitsky/pyteomics/blob/34c87ac7198b7cff45cb46a4001345e87c6bb5a4/pyteomics/_schema_defaults.py#L272-L274
Based on the test cases at https://github.com/OpenMS/OpenMS/search?q=spectrum_reference, it looks like all instances of ('PeptideIdentification', 'spectrum_reference') and ('UnassignedPeptideIdentification', 'spectrum_reference') are strings, not ints. Oddly, when I comment these two lines out, the issue is not resolved. Perhaps due to a cache, or it's being caught elsewhere?