pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0

colors in the plot function. #114

Closed MingBit closed 3 years ago

MingBit commented 3 years ago

Hello!

Nice work!

I am trying out STREAM on my data and noticed that the colors are either missing (with 'plotly=True') or mismatched (with 'plotly=False'). Figures are attached; the blue and green populations are mismatched.

[three images attached]

It seems that the labels (leiden_name) are reordered while the colors keep their original order. Any idea how to fix it? :-) Thanks

huidongchen commented 3 years ago

Can you check the versions of matplotlib and seaborn?

I have noticed that recent upgrades of these two packages seem to have broken some functions. A stopgap would be to downgrade matplotlib to 3.2.x (e.g. 3.2.2) and seaborn to 0.10.1.
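A quick way to report the installed versions is sketched below. It uses only the Python standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` is made up for this example and is not part of STREAM.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("matplotlib", "seaborn")):
    """Return {package name: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same environment (or notebook kernel) that runs STREAM matters, since a different environment may report different versions.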

MingBit commented 3 years ago

Thanks for the fast response. I reinstalled matplotlib and seaborn. Unfortunately, the colors from plot_dimension_reduction are now completely newly generated, and the colors from plot_flat_tree are still mismatched. :-(

[two images attached]

huidongchen commented 3 years ago

Sorry about the issue. Unfortunately given the current info, it's hard for me to debug. But I am happy to take a closer look if you could share with me a short script/notebook to reproduce the problem.

MingBit commented 3 years ago

Sure. I started with the raw scRNA-seq data which has been pre-processed and clustered by scanpy. As shown above, the leiden colors were automatically generated in the scanpy plotting function. Then I applied stream for cell trajectory analysis.

1) Remove the unknown cluster:

new_adata = adata[~adata.obs['leiden'].isin(['6']),:]

2) STREAM pipeline (I didn't re-normalize the data, as it was already normalized by scanpy. Also, the scanpy HVGs were selected for dimension reduction):

import stream as st
st.set_workdir(new_adata,'./stream_result')
st.cal_qc(new_adata,assay='rna')
st.plot_qc(new_adata,jitter=0.3,)

st.filter_cells(new_adata,min_n_features= 100)
st.filter_features(new_adata,min_n_cells = 5)

st.select_top_principal_components(new_adata,n_pc=10,first_pc=True)
st.dimension_reduction(new_adata,method='umap',feature='top_pcs',n_neighbors=500,n_jobs=4, n_components=10)

st.plot_dimension_reduction(new_adata, color=['leiden_name', 'Infection'],plotly=True,
                            n_components=2,show_graph=False,show_text=False)

st.seed_elastic_principal_graph(new_adata,n_clusters=10)
st.elastic_principal_graph(new_adata, epg_alpha=0.03,epg_mu=0.05,epg_lambda=0.01)
st.optimize_branching(new_adata)

st.plot_flat_tree(new_adata,color=['leiden_name'],plotly=True,
                  dist_scale=0.8, show_graph=True,show_text=True)

MingBit commented 3 years ago

I should have shown you the package versions. :-) [image attached]

huidongchen commented 3 years ago

Thanks for sharing the details. I was wondering if there is a dummy dataset that I can use to reproduce what you are seeing, since with the tutorial datasets I probably won't see any issues with the colors.

MingBit commented 3 years ago

Sure! As the data has already been published (10.1038/s41590-019-0403-4), I would like to share the version processed by scanpy with you, so you don't need to run everything from the beginning.

Could you give me your email so I can send you the share link? :-)

huidongchen commented 3 years ago

If it's already published, then it might be better to post the link here, so that others who have the same issue can follow this thread. :)

MingBit commented 3 years ago

Although I don't think the issue comes from the data, the link is here: https://doi.org/10.1038/s41590-019-0403-4.

huidongchen commented 3 years ago

I was hoping you could share a simple h5ad file, or something similar, that stores the input new_adata you used in STREAM and is ready to import. I guess that will save both of us time.
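For reference, writing such a file could look like the sketch below. `write_h5ad` is the standard AnnData method for this; the helper name `save_for_sharing` and the default file name are only illustrative.

```python
def save_for_sharing(adata, path="new_adata.h5ad"):
    """Write an AnnData object to a single .h5ad file and return the path."""
    # write_h5ad stores X, obs, var, uns, etc. in one HDF5-based file.
    adata.write_h5ad(path)
    return path


# Sender (assumes `new_adata` is the object passed to STREAM):
#     save_for_sharing(new_adata)
#
# Recipient:
#     import anndata as ad
#     new_adata = ad.read_h5ad("new_adata.h5ad")
```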

MingBit commented 3 years ago

Here is the processed adata: https://figshare.com/s/5acb1155a5df274783d2

huidongchen commented 3 years ago

Sorry about the delay. It has been quite hectic for a while.

I looked into the issue and it seems to work fine on my end. Please see the attached screenshots. [two screenshots attached, taken 2021-02-01]

One thing I would like to point out, though, is that the colors are initialized when you make static plots, not interactive Plotly plots.

Plotly plots check whether the colors are already stored; if not, they generate colors on their own without storing them. Static plots also check whether the colors are already stored; if not, they generate colors on their own and store them.

The current color palette is extracted from the seaborn static plots, which is why it has to work this way. The suggested plotting order is therefore to make the static plots first and then visualize them in Plotly, to get consistent colors.
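The plotting order described above can be sketched as follows. This is a minimal sketch, assuming `st` is the imported `stream` module and `adata` has already been through dimension reduction; the wrapper name `plot_static_then_plotly` is hypothetical, and only the `color` and `plotly` arguments shown earlier in this thread are used.

```python
def plot_static_then_plotly(st, adata, color):
    """Draw the static figure first so colors get stored, then the Plotly one."""
    # Static (seaborn/matplotlib) plot: per the explanation above, this is the
    # step that generates the colors and stores them.
    st.plot_dimension_reduction(adata, color=color, plotly=False)
    # Interactive plot: expected to find and reuse the colors stored above.
    st.plot_dimension_reduction(adata, color=color, plotly=True)


# Usage (not run here, needs STREAM and a prepared AnnData object):
#     import stream as st
#     plot_static_then_plotly(st, new_adata, color=['leiden_name'])
```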

But we will restructure it in STREAM v2.

MingBit commented 3 years ago

Hey! Thanks a lot. I should have explained the problem better. What I meant is that the colors don't match the ones from scanpy.

[two images attached]

Although I could also use st.plot_dimension_reduction to illustrate the clusters, I still suspect the colours were reordered incorrectly. :-) [image attached]

huidongchen commented 3 years ago

It is not trivial to auto-generate a color palette that suits different numbers of annotations, and Scanpy and STREAM construct their color palettes in totally different ways.

So you should not expect exactly the same order of colors.

MingBit commented 3 years ago

The point I wanted to make is that the colors were not auto-generated but rather taken from Scanpy in a mismatched order. Nevertheless, thanks again for giving it a try. :-)

huidongchen commented 3 years ago

I totally understand what you meant. As I said, STREAM and Scanpy work differently internally when it comes to coloring. The coloring in Scanpy is based on a categorical variable, while in STREAM it is not.
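To illustrate the difference: Scanpy keeps the colors of a categorical column `key` as a list in `adata.uns[key + '_colors']`, ordered like `adata.obs[key].cat.categories`, so a color is tied to a label only by position. The toy sketch below (plain Python, not STREAM's or Scanpy's actual code; the labels and the helper name `label_color_map` are invented) turns such a positional palette into an explicit label-to-color mapping, which stays correct even when labels are encountered in a different order.

```python
def label_color_map(categories, palette):
    """Pair each category with the color at the same position."""
    categories, palette = list(categories), list(palette)
    if len(categories) != len(palette):
        raise ValueError("number of categories and colors must match")
    return dict(zip(categories, palette))


# Hypothetical labels in categorical order, with colors in the same order.
categories = ["B cell", "Monocyte", "T cell"]
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]
colors = label_color_map(categories, palette)

# Labels seen in a different order (e.g. order of first appearance in the data).
reordered = ["T cell", "B cell", "Monocyte"]
# Reusing the palette by position would now give "T cell" the "B cell" color;
# looking colors up by label does not have that problem.
colors_by_label = [colors[label] for label in reordered]

# With a real Scanpy object this would be (assumes the column is categorical):
#     label_color_map(adata.obs['leiden_name'].cat.categories,
#                     adata.uns['leiden_name_colors'])
```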

If you follow the STREAM tutorial and add the colors for the different annotations properly (as in the following steps), I am pretty sure you will get the same order of colors:

st.add_cell_labels(adata,file_name='./cell_label.tsv.gz')
st.add_cell_colors(adata,file_name='./cell_label_color.tsv.gz')
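A color file for the second call could be produced along these lines. This is only a sketch: it assumes the file is a header-less, tab-separated table with one `label<TAB>color` row per label, which should be checked against the tutorial's own cell_label_color.tsv.gz; the helper name `write_label_colors` and the example labels are hypothetical.

```python
import csv
import gzip


def write_label_colors(label_to_color, path):
    """Write one `label<TAB>color` row per label to a gzip-compressed TSV file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for label, color in label_to_color.items():
            writer.writerow([label, color])


# Usage (labels and hex codes are placeholders):
#     write_label_colors({"B cell": "#1f77b4", "T cell": "#2ca02c"},
#                        "./cell_label_color.tsv.gz")
```

Because the mapping is keyed by label rather than by position, the colors written this way do not depend on the order in which the labels are listed.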

MingBit commented 3 years ago

Sure! Thanks a lot. You can close the issue. :-)