saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0

Cellphone DB (and other database versions) #60

Open SimonDMurray opened 1 year ago

SimonDMurray commented 1 year ago

Hi,

Sorry if this information is available somewhere, but I cannot find it. I would like to know which version of CellPhoneDB (and of the other databases) is included in the liana-py package.

I am aware you import your databases from OmniPathR (unless I am mistaken) and have checked their version of cellphoneDB: https://github.com/saezlab/OmnipathR/issues/85

I wanted to double-check that your version matches theirs and, if it does not, to ask whether it could be updated.

Thanks, Simon

dbdimitrov commented 1 year ago

Hi @SimonDMurray,

Currently, the CellPhoneDB accessible via LIANA is CellPhoneDBv2: although it is imported via OmniPath, the database in LIANA+ is versioned independently. If you wish to use CellPhoneDBv4, you can obtain it directly via OmniPath's Python (https://github.com/saezlab/omnipath) or R API and then feed it to LIANA as a dataframe.

I plan to update it in LIANA as well, but this requires some infrastructural changes that we're currently working on (https://github.com/saezlab/liana-py/issues/9).

dbdimitrov commented 10 months ago

Hi @SimonDMurray,

A bit delayed, but I was checking the new CPDB resource, and one can get it in the following way:

import io

import pandas as pd
import requests

# read csv from link
# https://github.com/ventolab/cellphonedb-data/blob/master/data/interaction_input.csv
resource = requests.get('https://raw.githubusercontent.com/ventolab/cellphonedb-data/master/data/interaction_input.csv').content
resource = io.StringIO(resource.decode('utf-8'))
resource = pd.read_csv(resource, sep=',')
# keep only PPIs (copy so that the assignments below do not act on a slice)
resource = resource[resource['is_ppi']][['interactors']].copy()
# replace + with _
resource['interactors'] = resource['interactors'].apply(lambda x: x.replace('+', '_'))
# if interactors contains two '-', replace the first one with '&'
resource['interactors'] = resource['interactors'].apply(lambda x: x.replace('-', '&', 1) if x.count('-') == 2 else x)
# split by - and expand
resource = resource['interactors'].str.split('-', expand=True)
# replace & with - in the first column
resource[0] = resource[0].apply(lambda x: x.replace('&', '-'))
resource.columns = ['ligand', 'receptor']

Then it's as simple as passing this dataframe to the resource parameter of any liana function you would like to use.
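To make the string handling in the snippet above easier to follow, here is a minimal pure-Python sketch of the same per-row logic. The function name `split_interactors` and the example strings are made up for illustration; they are not part of LIANA or CellPhoneDB. Raising an error for entries with an unexpected number of hyphens is also an assumption of this sketch, not something the pandas snippet does.

```python
def split_interactors(interactors: str) -> tuple[str, str]:
    """Split one 'interactors' string into a (ligand, receptor) pair.

    Mirrors the pandas steps above:
    - '+' (complex subunits) becomes '_'
    - with one '-', the string splits into ligand and receptor
    - with two '-', the first '-' is assumed to belong to the ligand name
    """
    name = interactors.replace('+', '_')
    n_hyphens = name.count('-')
    if n_hyphens == 1:
        ligand, receptor = name.split('-')
    elif n_hyphens == 2:
        first, second, receptor = name.split('-')
        ligand = f'{first}-{second}'
    else:
        # Assumption of this sketch: anything else is ambiguous, so fail loudly.
        raise ValueError(f'cannot split {interactors!r} into ligand and receptor')
    return ligand, receptor


# Hypothetical example strings, not taken from the CellPhoneDB file.
for example in ['A-B', 'A+B-C', 'HLA-X-Y']:
    print(example, '->', split_interactors(example))
```

The two-hyphen rule is a heuristic: it treats the first hyphen as part of the ligand name, so an entry whose receptor name contains the hyphen would be split in the wrong place. It may be worth spot-checking such rows before passing the table to LIANA.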