levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Add USI PROXI backend for ProteomeExchange #36

Closed mobiusklein closed 3 years ago

mobiusklein commented 3 years ago

This PR adds a new PROXI backend for resolving USIs: ProteomeXchange (http://proteomecentral.proteomexchange.org/). ProteomeXchange runs an aggregation service on the backend which pulls from all the other servers. Unlike the other servers, though, it doesn't implement any of the other PROXI services (not that I've implemented anything to talk to them yet).

While I was at it, I fixed the parsing code to cast numerical attributes to Python numbers rather than leaving them as strings.
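
A minimal sketch of this kind of numeric casting is shown here. It is not the code from the PR: the helper name `coerce_number` and the shape of the `attributes` list are assumptions made for illustration.

```python
def coerce_number(value):
    """Return an int or float if `value` is a numeric string, else return it unchanged."""
    if not isinstance(value, str):
        return value
    # Try the narrower type first so "2" becomes 2 rather than 2.0.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


# Hypothetical attribute records, loosely shaped like entries in a PROXI response.
attributes = [
    {"name": "charge state", "value": "2"},
    {"name": "selected ion m/z", "value": "767.9700"},
    {"name": "unmodified peptide sequence", "value": "VLHPLEGAVVIIFK"},
]

parsed = {attr["name"]: coerce_number(attr["value"]) for attr in attributes}
```

Non-numeric strings, such as the peptide sequence, pass through untouched, so the conversion is safe to apply to every attribute value.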

Since I don't know that I ever really explained what this is: the USI paper is available as a preprint here: https://www.biorxiv.org/content/10.1101/2020.12.07.415539v1.

levitsky commented 3 years ago

Thank you, but I'm not sure I understand how the new backend should work. At http://proteomecentral.proteomexchange.org/usi/, I see how I can get results from all repositories with a single query, but running usi.proxi() with the USI from our current test ("mzspec:MSV000085202:210320_SARS_CoV_2_T:scan:131256") raises a 404 exception with 'proteome_exchange' as backend. Another USI works fine though ("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"). Is this expected?

mobiusklein commented 3 years ago

I may have misunderstood the policy then: either it doesn't support MSV-only submissions, or it will only serve as an aggregator on the backend "later".

Well, to deal with that I've added a real aggregator implementation. PROXI isn't fleshed out enough to have a method for coalescing responses, if one could even be devised.
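
As a rough illustration of the client-side aggregation idea, the sketch below queries several backends in turn, keeps the responses that succeed, and records the failures. Every name in it (`SimpleAggregator`, the stub backends, the response dictionaries) is hypothetical and stands in for real network calls; it is not the class added in this PR.

```python
class SimpleAggregator:
    """Query every backend and keep whatever responses come back."""

    def __init__(self, backends):
        # Mapping of backend name -> callable taking a USI string.
        self.backends = dict(backends)

    def get(self, usi):
        responses = {}
        errors = {}
        for name, fetch in self.backends.items():
            try:
                responses[name] = fetch(usi)
            except Exception as exc:  # e.g. an HTTP 404 from one server
                errors[name] = exc
        if not responses:
            raise LookupError("No backend could resolve %r: %r" % (usi, errors))
        return responses


# Stub backends standing in for real PROXI servers.
def backend_ok(usi):
    return {"usi": usi, "m/z array": [100.0, 200.0]}


def backend_missing(usi):
    raise LookupError("404: spectrum not found")


aggregator = SimpleAggregator({"ok": backend_ok, "missing": backend_missing})
results = aggregator.get("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555")
```

Because there is no agreed way to coalesce the responses, this sketch simply returns all of them, keyed by backend name.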

mobiusklein commented 3 years ago

We can set this back to a draft if you want to see if anything new develops first.

levitsky commented 3 years ago

> Well, to deal with that I've added a real aggregator implementation.

That's cool. Is it intentional that the new class doesn't play well with proxi() and is not a subclass of _PROXIBackend? You can't pass it as a class or as an instance. I realize I can just call the instance, but I can do the same with other backends. Also, when passing an instance, I get this:

In [2]: ag = usi.PROXIAggregator()

In [3]: usi.proxi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2", ag)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-b8d9d4e18687> in <module>
----> 1 usi.proxi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2", ag)

~/py/pyteomics/pyteomics/usi.py in proxi(usi, backend, **kwargs)
    361     if isinstance(backend, str):
    362         backend = _proxies[backend](**kwargs)
--> 363     elif issubclass(backend, _PROXIBackend):
    364         backend = backend(**kwargs)
    365     elif callable(backend):
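
The traceback stops at the `issubclass` call, which is consistent with how Python behaves: `issubclass()` raises `TypeError` when its first argument is an instance rather than a class. The sketch below reproduces that with stand-in classes and shows one possible guard, checking `isinstance(backend, type)` first. The names `_Backend`, `_Aggregator`, `_registry` and `resolve` are hypothetical, and this is not necessarily the fix that was applied in the PR.

```python
class _Backend:
    """Stand-in for a backend base class."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __call__(self, usi):
        return {"usi": usi, "backend": type(self).__name__}


class _Aggregator:
    """Callable that deliberately does not inherit from _Backend."""

    def __call__(self, usi):
        return {"usi": usi, "backend": "aggregator"}


_registry = {"demo": _Backend}


def resolve(usi, backend="demo", **kwargs):
    if isinstance(backend, str):
        backend = _registry[backend](**kwargs)
    # Only call issubclass() on actual classes; instances fall through.
    elif isinstance(backend, type) and issubclass(backend, _Backend):
        backend = backend(**kwargs)
    elif callable(backend):
        pass  # already a usable instance or function
    else:
        raise TypeError("Unrecognized backend type: %r" % (backend,))
    return backend(usi)


# The unguarded check fails on an instance:
try:
    issubclass(_Aggregator(), _Backend)
    unguarded_failed = False
except TypeError:
    unguarded_failed = True

by_name = resolve("mzspec:X:Y:scan:1", "demo")
by_class = resolve("mzspec:X:Y:scan:1", _Backend)
by_instance = resolve("mzspec:X:Y:scan:1", _Aggregator())
```

With the guard in place, a backend can be passed by name, as a class, or as a ready-made callable instance.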

mobiusklein commented 3 years ago

I hadn't originally intended for the aggregator to be part of the high level API because it returns lists of responses instead of a single object, but I wrote a simple merging method and then integrated it into the high level API.
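
One simple way to collapse a list of responses into a single object is sketched below, purely as an illustration. It assumes each response is a dictionary with `"m/z array"`, `"intensity array"` and `"attributes"` keys. The function name `merge_responses` and the merging policy (keep the peak arrays from the response with the most peaks, union the attributes by name) are assumptions, not the method written for this PR.

```python
def merge_responses(responses):
    """Merge several spectrum responses into one dictionary."""
    if not responses:
        raise ValueError("no responses to merge")
    # Use the response with the most peaks as the source of the arrays.
    base = max(responses, key=lambda r: len(r["m/z array"]))
    merged = {
        "m/z array": list(base["m/z array"]),
        "intensity array": list(base["intensity array"]),
        "attributes": [],
    }
    seen = set()
    for response in responses:
        for attr in response.get("attributes", []):
            # Keep the first occurrence of each attribute name.
            if attr["name"] not in seen:
                seen.add(attr["name"])
                merged["attributes"].append(attr)
    return merged


# Hypothetical responses from two backends.
responses = [
    {
        "m/z array": [100.0, 200.0],
        "intensity array": [10.0, 20.0],
        "attributes": [{"name": "charge state", "value": 2}],
    },
    {
        "m/z array": [100.0, 200.0, 300.0],
        "intensity array": [10.0, 20.0, 5.0],
        "attributes": [
            {"name": "charge state", "value": 2},
            {"name": "scan number", "value": 17555},
        ],
    },
]

merged = merge_responses(responses)
```

Returning one object in the same shape as a single-backend response is what lets an aggregator be passed through the same high-level entry point as the other backends.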

levitsky commented 3 years ago

Thank you, looks great to me! Do you think it's good to merge?

mobiusklein commented 3 years ago

Thank you for adding the unit tests. Yes, nothing more I wanted to add for this feature.