saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0
134 stars 15 forks source link

Frequency Chord diagram and Heatmap in Python #85

Open maximelepetit opened 5 months ago

maximelepetit commented 5 months ago

Hi ,

Thanks a lot for this useful tool !

I'm running Liana and i'm curious that you cannot provide methods to generate Chord diagram or heatmap as in the liana R wrapper.

Have you planned to add these methods ?

Best regards,

Maxime

dbdimitrov commented 5 months ago

Hi @maximelepetit,

Thanks for opening an issue! I've been considering implementing one of the two to liana-py.

I will do my best to have at least one implemented soon.

In the meantime, ChatGPT does an excellent job when generating network plots with networkx. And a plot with https://github.com/DingWB/PyComplexHeatmap should be relatively quick to make :)

Best wishes, Daniel

maximelepetit commented 5 months ago

Hi Daniel,

For Chords diagrams i found an easy and quickly way with pyCirclize : One example :

import pycirclize
from pycirclize import Circos
import pandas as pd
from pycirclize.parser import Matrix

## Extract liana_res
df1 = adata.uns['liana_res'][['source','target']].groupby(['source','target']).size().reset_index(name='counts')
df = df1.pivot(index='source',columns='target',values='counts')

matrix_data=df.values
row_names = list(df.index)
col_names = row_names

matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)

# Initialize from matrix (Can also directly load tsv matrix file)
circos = Circos.initialize_from_matrix(
    matrix_df,
    space=3,
    cmap="tab10",
    r_lim=(93, 100),
    ticks_interval=500,
    label_kws=dict(r=94, size=12, color="white"),
    link_kws=dict(direction=1, ec="black", lw=0.5),
)

#print(matrix_df)
fig = circos.plotfig()

#print(matrix_df)
fig = circos.plotfig()

Give : chords1 Best ! Maxime

maximelepetit commented 5 months ago

You can highlight specific groups :

import pycirclize
from pycirclize import Circos
import pandas as pd
from pycirclize.parser import Matrix

# Extract liana_res
df1 = adata.uns['liana_res'][['source','target']].groupby(['source','target']).size().reset_index(name='counts')
df = df1.pivot(index='source',columns='target',values='counts')

matrix_data=df.values
row_names = list(df.index)
col_names = row_names
matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)

# Define link_kws handler function to customize each link property
def link_kws_handler(from_label: str, to_label: str):
    ## Highlith Astro, CR, Microglia source
    if from_label in ('Astrocytes', 'CR', 'Microglia'):
        # Set alpha, zorder values higher than other links for highlighting
        return dict(alpha=0.5, zorder=1.0)
    else:
     return dict(alpha=0.1, zorder=1.0)

# Initialize from matrix (Can also directly load tsv matrix file)
circos = Circos.initialize_from_matrix(
    matrix_df,
    space=3,
    cmap="tab10",
    r_lim=(93, 100),
    ticks_interval=500,
    label_kws=dict(r=94, size=12, color="white"),
    link_kws=dict(direction=1, ec="black", lw=0.5),
    link_kws_handler=link_kws_handler,
)

#print(matrix_df)
fig = circos.plotfig()

Give : chord2

;)

maximelepetit commented 5 months ago

Alternatively you can generate Sankey plot through pySankey :

from pySankey.sankey import sankey
import pandas as pd
df1 = adata.uns['liana_res'][['source','target']].groupby(['source','target']).size().reset_index(name='counts')
sankey(
    left=df1["source"], right=df1["target"], 
    leftWeight= df1["counts"], rightWeight=df1["counts"], fontsize=10
)

GIves : sankey

dbdimitrov commented 5 months ago

Hi @maximelepetit,

Thanks a lot for all your suggestions! I will pin the issue and will keep it open until I have some of those in place.

I plan to do a rehaul of liana's plotting functions, and I will make sure these are included also.

Daniel

brainfo commented 3 months ago

Hi, I feel like it's also good to have a hypothesis driven layout, showing the specific sender-ligand -- receptor-recepient pairs, with for example senders on one sphere and recepients on another and the color by any score, like the magnitude, or the significance, rank, lrmean, or change between conditions. I used to use ChordDiagram package in r. Maybe it's good to have this in the utils in liana+ py?

while so far for convenience with same information we can simply show bar/dot plots

and to compare the same pair in different conditions, which metric is recommended to be tested? combined rank or individual continous magnitude metric?

dbdimitrov commented 2 months ago

Hi @brainfo, thanks for the suggestion. I'm still working on refactoring the algorithmic side of liana, but once I'm done with that. I will refactor the functions and extend them with the suggestions here.

re comparing across conditions: there are a few tutorials, e.g. using differential expression analysis with liana and PyDeSeq2, or the tutorials with MOFA or Tensor-cell2cell.

I would suggest checking those 🙂 though you could compare most scores as long as you normalize such that your conditions are comparable prior to running liana.

Hope this helps.

brainfo commented 1 month ago

Hi @brainfo, thanks for the suggestion. I'm still working on refactoring the algorithmic side of liana, but once I'm done with that. I will refactor the functions and extend them with the suggestions here.

re comparing across conditions: there are a few tutorials, e.g. using differential expression analysis with liana and PyDeSeq2, or the tutorials with MOFA or Tensor-cell2cell.

I would suggest checking those 🙂 though you could compare most scores as long as you normalize such that your conditions are comparable prior to running liana.

Hope this helps.

Hi, I am aware of the scope of LIANA+ project in ligand receptor interaction imputation but not downstream usage of the lr results. I am basically doing network analysis on the Liana resulted dataframe (with customized secondary manipulation of differential expression analysis stats). Mostly using networkx and also customized visualizations:

  1. source and target as nodes, and one edge with aggregated attributes between nodes
  2. source and target as nodes (selected), multiple edges allowed with all or selected interactions with their attributes
  3. source-ligand complex and target-receptor complex as nodes, and with their nodes and edge attributes
  4. api/call back to networkx utilities for at least general node and edge stats for all these network outputs. And network visualization (node and attributes mapping).

I have most of the scripts for these and not sure if it's something relevant to this open issue.

dbdimitrov commented 1 month ago

Hi @brainfo,

TBH, I realize that I should provide a bit more visualisations, so suggestions are welcome. Please feel free to open an issue with your suggestions, and I will do my best to find time to implement them :)

brainfo commented 3 days ago

Hi @brainfo,

TBH, I realize that I should provide a bit more visualisations, so suggestions are welcome. Please feel free to open an issue with your suggestions, and I will do my best to find time to implement them :)

Hi, what I did is like,

  1. at the source-ligand level, calculate the out-degree: say tb is the table with the fields from liana:

    def save_cell_molecule_tb(tb, outfile):
    lr_ = pd.DataFrame(columns=['#node1', 'node2', 'combined_score'])
    ## join two columns of tb
    lr_['#node1'] = tb['ligand_complex'] + '-' + tb['source']
    lr_['node2'] = tb['receptor_complex'] + '-' + tb['target']
    lr_['combined_score'] = tb['interaction_score']
    lr_.to_csv(outfile, sep='\t', index=False)
    return lr_
  2. out-degree analysis using networkx and visualization with highlighted cell types

    def out_degree(cell_molecule_tb_file, high_celltype, low_celltype, plot_file, outdegree_file):
    lr_net = nx.read_edgelist(cell_molecule_tb_file,data=(('combined_score',float),), delimiter='\t', create_using=nx.DiGraph)
    degree_sequence = sorted((d for n, d in lr_net.out_degree(weight='combined_score')), reverse=True)
    ordered_dict = dict(sorted(lr_net.out_degree(weight='combined_score'), key=lambda item: item[1], reverse=True))
    node_keys = np.array(list(ordered_dict.keys()))
    
    color_sequence = np.where(np.core.defchararray.find(node_keys,high_celltype)!=-1, '#FBDB7D', np.where(np.core.defchararray.find(node_keys,low_celltype)!=-1, '#df65b0ff', 'darkgrey'))
    dmax = max(degree_sequence)
    
    fig = plt.figure("Degree of a random graph", figsize=(4, 2))
    # Create a gridspec for adding subplots of different sizes
    axgrid = fig.add_gridspec(2, 4)
    
    ax1 = fig.add_subplot(axgrid[:, :2])
    ax1.scatter(range(len(degree_sequence)), degree_sequence, c=color_sequence)
    ax1.set_title("Out-Degree Rank Plot")
    ax1.set_ylabel("Out-Degree")
    ax1.set_xlabel("Rank")
    ax1.axhline(y=0, linestyle='--', c='black')
    ax2 = fig.add_subplot(axgrid[:, 2:])
    ax2.bar(*np.unique(degree_sequence, return_counts=True))
    ax2.set_title("Out-Degree histogram")
    ax2.set_xlabel("Out-Degree")
    ax2.set_ylabel("# of Nodes")
    
    fig.tight_layout()
    plt.savefig(plot_file)
    
    out_degrees = {node:val for (node, val) in lr_net.out_degree(weight='combined_score')}
    
    out_degrees_df = pd.DataFrame.from_dict(out_degrees, orient='index', columns=['degree'])
    out_degrees_df.to_csv(outdegree_file, sep='\t')
    return out_degrees_df
  3. Visualize with source and target as nodes, using netgraph

    
    def preprare_data(ccc):
    ccc_cell = pd.DataFrame(columns=ccc.columns)
    ccc_cell['#node1'] = ccc['#node1'].str.split('-').str[-1]
    ccc_cell['node2'] = ccc['node2'].str.split('-').str[-1]
    ccc_cell['combined_score'] = ccc['combined_score']
    ccc_cell['weight'] = ccc_cell.groupby(['#node1', 'node2'])['combined_score'].transform('sum')
    
    unique_ccc_cell = ccc_cell.drop_duplicates(subset=['#node1', 'node2'])
    unique_ccc_cell.reset_index(inplace=True)
    unique_ccc_cell['weight'] = unique_ccc_cell['weight'].abs()
    return unique_ccc_cell

unique_cccc = preprare_data(cccc) G = nx.from_pandas_edgelist(unique_cccc,source='#node1' , target='node2', edge_attr="weight", create_using=nx.DiGraph())

edges,weights = zip(*nx.get_edge_attributes(G,'weight').items()) pos = nx.spring_layout(G, seed=404)

pip install netgraph

hex_dict = %your own hex dict for nodes

Graph(G, node_color=hex_dict, node_size= node_size, edge_cmap='Blues', edge_width={k: 2*v for k, v in nx.get_edge_attributes(G,'weight').items()}, arrows=True, node_labels=True, node_label_fontdict={'size':18})