saezlab / pypath

Python module for prior knowledge integration. Builds databases of signaling pathways, enzyme-substrate interactions, complexes, annotations and intercellular communication roles.
http://omnipathdb.org/
GNU General Public License v3.0

CellPhoneDB v5 (update and is_ppi flag) #269

Open dbdimitrov opened 7 months ago

dbdimitrov commented 7 months ago

Hey Denes,

Recently, CellPhoneDB got bumped to v5, and the data is stored here: https://github.com/ventolab/cellphonedb-data/tree/master

Seems to have changed format from: https://github.com/saezlab/pypath/blob/bf81f34120b82157fa3ebc15d39b0489b97fbe5e/pypath/resources/urls.py#L1103

Let me know if I can help with this. Daniel

dbdimitrov commented 7 months ago

@deeenes also, please use the is_ppi flag. I found a lot of erroneous interactions between enzymes and receptors (no metabolite).

dbdimitrov commented 7 months ago

Maybe how I process it here would help:

https://github.com/saezlab/liana-py/issues/60

Nic-Nic commented 5 months ago

Hi Daniel,

As far as I could find, pypath is already using the CellPhoneDB git repository as the source for the data (see here and then here), so I think it is already using the v5 version of the data.

What I found out now when checking this, is that although the retrieval of interactions works fine:

> from pypath.inputs import cellphonedb
> list(cellphonedb.cellphonedb_interactions())[-1]
CellphonedbInteraction(id_a='P16070', id_b='O43914', sources='CellPhoneDB', references='', interaction_type='unknown-unknown', type_a='unknown', type_b='unknown')

When you try to retrieve the ligand-receptor interactions it returns a tuple of empty sets:

> cellphonedb.cellphonedb_ligands_receptors()
(set(), set())

This seems to be an issue in how the complex annotations were imported: as a result, the ligand/receptor attributes were all labeled as False. I think I fixed it in #279.

Regarding the use of the is_ppi flag, it seems a bit more complex to implement (and I wouldn't want to break anything), so maybe we can discuss in person and I could try to take a look into it, or we can wait for @deeenes to come back :sweat_smile:

Since this should resolve your initial question, I'll close the issue and we can discuss the is_ppi thing later :)

Best

dbdimitrov commented 5 months ago

@Nic-Nic thanks Nico. Though I would say the is_ppi flag is crucial, since there are now a lot of enzyme-enzyme interactions imported as ligand-receptor ones 😅

dbdimitrov commented 5 months ago

I renamed the issue and reopened since the two comments are tied. The flag was introduced along with the update of the database. 🙂

dbdimitrov commented 5 months ago

PS. Also, there is no need to implement the flag; it's simply about setting it to False when the resource is obtained. We don't want to include those, and I can think of only limited use for them even if we did.

Nic-Nic commented 5 months ago

Added the flag to the import method of the interactions database from CellPhoneDB (see #281). The decision on whether to filter out the False ones or not, is more for @deeenes to take :sweat_smile: Since the flag is now there (once the PR is merged), you can easily then apply the filter in your code if you deem it necessary :)
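The client-side filter mentioned above could look roughly like the following. This is only a minimal sketch: the `Interaction` record, the position of its `is_ppi` field and all identifiers are hypothetical placeholders, not pypath's actual record or real CellPhoneDB rows.

```python
from collections import namedtuple

# Hypothetical record type; the real pypath record may name or order
# its fields differently.
Interaction = namedtuple('Interaction', ['id_a', 'id_b', 'sources', 'is_ppi'])

# Placeholder rows, not real CellPhoneDB data.
records = [
    Interaction('PROT_A', 'PROT_B', 'CellPhoneDB', True),
    Interaction('ENZYME_X', 'RECEPTOR_Y', 'CellPhoneDB', False),
]


def keep_ppi(interactions):
    """Return only the records flagged as protein-protein interactions."""
    return [rec for rec in interactions if rec.is_ppi]


ppi_only = keep_ppi(records)
```

With the flag exposed on each record, dropping the non-PPI rows stays a one-line decision on the consumer side.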

dbdimitrov commented 5 months ago

Hey Nico, thanks a lot.

I think it should definitely be False by default, or at least the clients should have it as False if possible, though that might be more work.

In short, they assume that the last enzyme producing a metabolite in one cell type, and a receptor/enzyme in another, translate to the metabolite-receptor interaction. I think that's too specific to be pulled by default as ligand-receptor interactions by the clients :)
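To make that assumption concrete, here is a toy illustration of how such proxy pairs could arise. It is a sketch of the idea as described in this comment, not CellPhoneDB's code; the mappings, the function name and every identifier are hypothetical placeholders.

```python
# Hypothetical placeholder data: metabolite -> last enzyme producing it,
# and metabolite -> receptors that bind it.
last_enzyme_of = {'METABOLITE_M': 'ENZYME_E'}
receptors_of = {'METABOLITE_M': ['RECEPTOR_R']}


def derive_proxy_pairs(last_enzyme_of, receptors_of):
    """Pair each producing enzyme with the receptors of its metabolite.

    The resulting enzyme-receptor pairs stand in for metabolite-receptor
    interactions, so they are marked as non-PPI.
    """
    pairs = []
    for metabolite, enzyme in last_enzyme_of.items():
        for receptor in receptors_of.get(metabolite, []):
            pairs.append({
                'source': enzyme,
                'target': receptor,
                'via': metabolite,
                'is_ppi': False,
            })
    return pairs


proxy_pairs = derive_proxy_pairs(last_enzyme_of, receptors_of)
```

The enzyme and the receptor never bind each other directly here, which is why such rows look erroneous when read as ligand-receptor interactions.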

dbdimitrov commented 5 months ago

Hey @deeenes @Nic-Nic,

It seems to me that the solution we discussed yesterday for liana, i.e. access the databases via the client, will not work if we don't filter the non-ppis here.

These non-ppis are incorporated into MetalinksDB either way, so for our use cases we don't need them.

So, I'm re-opening the issue. Let me know if you want me to add the line that filters the dataframe.

Daniel

deeenes commented 5 months ago

Hey @dbdimitrov, you're right, having the attribute itself doesn't result in the removal of those interactions. We need two little things:

1) This is one of the few tasks that belong to the scope of integration (between OmniPath & LIANA), so there should be one line either in LIANA or in the omnipath Python client that makes sure the is_ppi=False interactions are removed;
2) In the OmniPath network dataset definitions, the non-PPI (is_ppi=False) interactions should go into a separate dataset, definitely not into the ligand-receptor one (this makes the previous point redundant, but better to be safe, it doesn't cost anything).

We'll soon take care of these
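The dataset separation could be sketched as below, routing records with is_ppi set to False away from the ligand-receptor dataset. The dataset names `ligand_receptor` and `non_ppi`, the dict-based records and the function name are assumptions for illustration, not OmniPath's actual dataset definitions.

```python
def split_by_is_ppi(interactions):
    """Partition interaction records into two datasets by their is_ppi flag.

    Each record is expected to be a dict with a boolean 'is_ppi' key.
    The dataset names are hypothetical.
    """
    datasets = {'ligand_receptor': [], 'non_ppi': []}
    for rec in interactions:
        key = 'ligand_receptor' if rec['is_ppi'] else 'non_ppi'
        datasets[key].append(rec)
    return datasets


# Placeholder rows, not real CellPhoneDB data.
example = [
    {'id_a': 'PROT_A', 'id_b': 'PROT_B', 'is_ppi': True},
    {'id_a': 'ENZYME_X', 'id_b': 'RECEPTOR_Y', 'is_ppi': False},
]
datasets = split_by_is_ppi(example)
```

Keeping the non-PPI records in their own dataset rather than discarding them leaves them available to anyone who does want them, while clients that query only the ligand-receptor dataset never see them.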

dbdimitrov commented 4 months ago

Ping @deeenes, it will become time sensitive very soon :smile:

dbdimitrov commented 4 months ago

@deeenes :eyes: