lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

umap.plot.connectivity issue with plotting colours for categories #362

Open amjass12 opened 4 years ago

amjass12 commented 4 years ago

Hello,

I am having some trouble plotting a UMAP with different colours for categories. I am using umap.plot.connectivity because I want to plot with hammer edge bundling.

My code to generate the plot is as follows:

`meta = pd.read_csv("metadata.csv")

mapper = umap.UMAP().fit(counts.transpose())

# plot points
x = umap.plot.points(mapper, labels=meta["Organ"])
# please note plot.points DOES colour all organs appropriately,
# but not when I plot connectivity

# with connectivity
umap.plot.connectivity(mapper, edge_bundling="hammer", cmap="rainbow")`

This produces the plot attached to this post

hammer distance.pdf

I have tried different combinations of arguments, including edge_cmap, show_points, color_key and color_key_cmap, with no success!

Any help would be much appreciated!!

Additionally, are there ways of further customising the plot (adding a title, removing borders, etc.)? I am having trouble passing the returned object to matplotlib functions downstream.

many thanks!

lmcinnes commented 4 years ago

I think the catch is that you are not passing the labels to the connectivity plot and telling it to show the points. Something more like this should work:

meta = pd.read_csv("metadata.csv")

mapper = umap.UMAP().fit(counts.transpose())
umap.plot.connectivity(
    mapper, 
    edge_bundling="hammer",
    cmap="rainbow",
    labels=meta["Organ"],
    show_points=True,
)

amjass12 commented 4 years ago

Hi @lmcinnes ,

Thank you so much for the really fast response! I forgot to mention that I had also tried this. It does colour the points, but not clearly. I am attaching a umap.plot.points output to show the grouping in the UMAP (distinct islands that belong to individual organs); this is still not clearly reflected in the .connectivity plot, and the legend does not show either.

Both plots are attached. It also seems like the colours are muddled or mixed in the .connectivity plot. Is there a way of making the dots larger, or of passing other arguments such as border size (or removing the border)? Please note that I also tried the edge_cmap argument at the same time, which does not help: it looks like a pseudo-colour effect rather than distinct colouring by group.

many thanks!

umap.plot.connectivity WITH plot_point.pdf umap.plot.points.pdf

lmcinnes commented 4 years ago

I see the problem. The catch is that the connectivity graph is geared to give smaller points so you can see the connections among them in a cluster. I don't think you can override this by default. What you can do is keep the matplotlib axis object and write to that. Something like this might work:

meta = pd.read_csv("metadata.csv")

mapper = umap.UMAP().fit(counts.transpose())
ax = umap.plot.connectivity(
    mapper, 
    edge_bundling="hammer",
 )
ax.scatter(*mapper.embedding_.T, s=5, c=meta["Organ"], cmap="rainbow")

where you can vary the value of the s= parameter to change the size of the points. It's possible that this won't work if meta["Organ"] is not numeric data. You can get around that by doing:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

meta = pd.read_csv("metadata.csv")

mapper = umap.UMAP().fit(counts.transpose())
ax = umap.plot.connectivity(
    mapper,
    edge_bundling="hammer",
)
# Assign each distinct label an evenly spaced colour from the colormap
unique_labels = np.unique(meta["Organ"])
num_labels = unique_labels.shape[0]
color_map = plt.get_cmap("rainbow")(np.linspace(0, 1, num_labels))
new_color_key = {k: color_map[i] for i, k in enumerate(unique_labels)}
legend_elements = [
    Patch(facecolor=color_map[i], label=k)
    for i, k in enumerate(unique_labels)
]
# One RGBA colour per sample, in the same order as the embedding rows
colors = [new_color_key[k] for k in meta["Organ"]]
ax.scatter(*mapper.embedding_.T, s=5, c=colors)
ax.legend(handles=legend_elements)

Sorry that there isn't an easy built in solution at this time.
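
As a rough sketch of the "keep the matplotlib axis object and write to that" idea, the same approach also covers the earlier questions about titles, borders and point size. The example below runs without umap: `embedding` and `organs` are made-up stand-ins for `mapper.embedding_` and `meta["Organ"]`, and the `ax` created by `plt.subplots()` stands in for the axes that `umap.plot.connectivity` is assumed to return. The title text and point size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

# Stand-ins (assumptions): replace with mapper.embedding_ and meta["Organ"].
rng = np.random.default_rng(0)
embedding = rng.normal(size=(60, 2))
organs = np.array(["liver", "heart", "lung"] * 20)

# Stand-in for: ax = umap.plot.connectivity(mapper, edge_bundling="hammer")
fig, ax = plt.subplots()

# Map each category to one evenly spaced colour from the colormap.
unique_labels = np.unique(organs)
palette = plt.get_cmap("rainbow")(np.linspace(0, 1, len(unique_labels)))
color_key = dict(zip(unique_labels, palette))
point_colors = np.array([color_key[label] for label in organs])

# Overplot larger, categorically coloured points on the existing axes.
ax.scatter(embedding[:, 0], embedding[:, 1], s=20, c=point_colors)

# Ordinary matplotlib customisation: title, no border, no ticks, legend.
ax.set_title("UMAP connectivity, coloured by organ")
for spine in ax.spines.values():
    spine.set_visible(False)
ax.set_xticks([])
ax.set_yticks([])
ax.legend(
    handles=[Patch(facecolor=color_key[k], label=str(k)) for k in unique_labels],
    title="Organ",
)
```

Once the axes look right, `ax.figure.savefig(...)` can write the result to a file.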

amjass12 commented 4 years ago

Hi @lmcinnes ,

Very sorry for the late reply! Thank you so much for your response; it makes perfect sense and worked wonderfully! Is there any documentation explaining the distance lines drawn in the connectivity plot?

I looked at the datashader and umap-learn documentation, and it isn't very clear to me. Thank you once again!