saezlab / pypath

Python module for prior knowledge integration. Builds databases of signaling pathways, enzyme-substrate interactions, complexes, annotations and intercellular communication roles.
http://omnipathdb.org/
GNU General Public License v3.0

The `complexes` query reports stripped references incorrectly #214

Open pokedthefrog opened 1 year ago

pokedthefrog commented 1 year ago

When calling `omnipath.requests.Complexes.complex_genes()`, the `references` column always reports stripped references which, in turn, results in the `references_stripped` column containing rubbish data. This seems to be a problem with the web service (see here) and also exists in pypath.

I am not sure if always reporting back stripped references for complexes is an intentional choice. Regardless, it is the following snippet that seems to break the `references_stripped` column:

https://github.com/saezlab/omnipath/blob/f48c3bb37c91ebf3cfd8261236c7770bcdab6cf2/omnipath/_core/requests/_utils.py#L151-L157

Changing the regex to something that ensures that the label does indeed exist seems to provide a temporary fix: `(?=.*[a-zA-Z])[-\w]*:?(\d+)`. I understand, though, that this is a minor issue and making this change may unintentionally break something else.
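To illustrate the behaviour, here is a minimal sketch. It assumes that the current pattern is the proposed one without the lookahead, and that each match is replaced by its captured digits; neither is copied from the linked snippet. `strip_label`, the `CORUM:` label and the `;` separator are hypothetical, used only for illustration.

```python
import re

# Assumption: the current pattern is the proposed one minus the lookahead.
ORIGINAL = re.compile(r"[-\w]*:?(\d+)")
# Pattern proposed above: only match if a letter occurs further along.
PROPOSED = re.compile(r"(?=.*[a-zA-Z])[-\w]*:?(\d+)")


def strip_label(pattern, reference):
    """Hypothetical helper: replace each match with its captured digits."""
    return pattern.sub(r"\1", reference)


labelled = "CORUM:12345678"      # reference with a resource label
bare = "12345678"                # already stripped reference
mixed = "12345678;CORUM:99"      # bare and labelled references together

# With a label present, both patterns keep the full identifier.
original_labelled = strip_label(ORIGINAL, labelled)
proposed_labelled = strip_label(PROPOSED, labelled)

# Without a label, the greedy prefix swallows all digits but the last one.
original_bare = strip_label(ORIGINAL, bare)
# The lookahead finds no letter, so nothing matches and the value is kept.
proposed_bare = strip_label(PROPOSED, bare)

# A letter anywhere later in the string satisfies the lookahead.
proposed_mixed = strip_label(PROPOSED, mixed)

print(original_labelled, proposed_labelled)
print(original_bare, proposed_bare)
print(proposed_mixed)
```

Under these assumptions the bare reference collapses to its last digit with the current pattern and stays intact with the proposed one. The lookahead only checks that a letter appears somewhere later in the string, so a bare reference followed by a labelled one is still truncated, which is why I would only call this a temporary fix.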

Thanks a lot for the amazing package and your help! :blush:

deeenes commented 1 year ago

Hi,

Thanks for reporting! The solution would be to make pypath include the resource names alongside the references, as in other query types. It will take some days, or a couple of weeks, until this propagates to the web service though. I hope that's fine with you.

I might transfer this issue to pypath.

pokedthefrog commented 1 year ago

Of course it is, thanks a lot for the prompt response!