mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

Heatmaps? #41

Closed olgabot closed 10 years ago

olgabot commented 10 years ago

Hello there,

I'm working on a heatmap PR for pandas (https://github.com/pydata/pandas/pull/5646), but it's been suggested that all visualizations be worked on in separate packages. Seaborn already supports pandas internally and does a lot of the "run some algorithm and then show me the result" kind of stuff (violinplot, kde fitting, linear fitting, etc.).

How do you think this fits with seaborn?

Olga

PS I made prettyplotlib which is a small matplotlib wrapper and I'm down to merge efforts but only if the nogrid also automatically despines the top and right axes. :)

PPS I'm also working on a PR for seaborn to accept a bw_method kwarg for violin because I need narrower bandwidths for my research.
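For reference, a minimal sketch of what a narrower bandwidth does, using `scipy.stats.gaussian_kde` directly. The sample data is made up, and how seaborn's violin function would forward a `bw_method` kwarg is not shown here; that part is still open.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Made-up bimodal sample: a wide bandwidth tends to blur the two modes together.
sample = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])

default_kde = gaussian_kde(sample)                # Scott's rule is the default
narrow_kde = gaussian_kde(sample, bw_method=0.1)  # a scalar is used as the bandwidth factor

grid = np.linspace(-5, 5, 201)
default_density = default_kde(grid)
narrow_density = narrow_kde(grid)

print("default factor:", default_kde.factor)
print("narrow factor:", narrow_kde.factor)
```

A smaller factor gives a bumpier estimate that follows the data more closely, which is the behaviour wanted for the violins.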

PPPS THANK YOU for making paper/poster/notebook/talk contexts. Seriously one of the best things ever.

mwaskom commented 10 years ago

Hey, this looks really cool! I'm not going to get a chance to give it a closer look until tonight (busy day of actual obligations... :/), but definitely seems like something that would be great to have. More soon.

mwaskom commented 10 years ago

OK sorry for the delay. This looks like a cool function, and I agree it sounds like it would fit in nicely to seaborn. But I have to admit...I'm not sure I understand what it's showing? It's not a visualization I've run into before in my field. But those are exactly the kind of contributions I'd welcome!

I don't think heatmap is the best name for this kind of plot, though? It looks like it's doing something much more complicated than just mapping values to colors in a matrix -- originally when I saw the title of this PR I expected it to be something different. I gather that's the name of the function in R, although I'm not sure "follow what R does" is the best policy in general for naming things :)

mwaskom commented 10 years ago

PS, Re nogrid spines: is there a way to set that by default? I couldn't find anything about it in the rcParams. Or do you mean just call despine() within all of the plotting functions?

olgabot commented 10 years ago

This heatmap is also called a clustergram or clustered heatmap. The way I use them is to look at gene expression or splicing across many genes (30-50k) and samples (100-200).

Currently, there's no way to control spines with rcParams, so it would have to be calling despine after all the plotting functions. That is, until I figure out how to add an rcParam :)
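A rough sketch of the "call despine after plotting" approach, using only matplotlib's spine API. The helper name `despine_top_right` is hypothetical and is not meant to describe seaborn's own `despine()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def despine_top_right(ax):
    """Hypothetical helper: hide the top and right spines and keep ticks off them."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.xaxis.set_ticks_position("bottom")
    ax.yaxis.set_ticks_position("left")
    return ax


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.grid(False)         # the "nogrid" part
despine_top_right(ax)  # has to run after (or inside) every plotting function
```

The downside is visible in the last line: every plotting function needs the extra call, which is why a default setting would be nicer.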

mwaskom commented 10 years ago

Ah I figured it was some kind of gene thing. What are the dendrograms showing? I guess I could just look at the code :)

I think an upstream PR for "Allow axis spines to be configurable" in matplotlib would garner lots of support.

mwaskom commented 10 years ago

In terms of names, for seaborn I'd like to have every function that draws something be called <something>plot (note issue #34 keeping track of the fact that I am going to rename violin for consistency). clustergramplot seems too wordy though. Hmm.

olgabot commented 10 years ago

clusterplot maybe?

olgabot commented 10 years ago

The dendrograms are showing the (pairwise) hierarchical clustering of associations between genes, which is really useful for finding sub-modules of expression of certain genes. I look at these things all day :)
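To make that concrete, here is a small sketch with SciPy on a made-up matrix. The Euclidean metric and average linkage are assumptions chosen for illustration, not necessarily what the PR uses.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Toy "expression" matrix: 6 genes (rows) x 4 samples (columns), two obvious
# groups of genes, deliberately interleaved so clustering has to untangle them.
low = rng.normal(0.0, 0.1, size=(3, 4))
high = rng.normal(5.0, 0.1, size=(3, 4))
data = np.vstack([low[0], high[0], low[1], high[1], low[2], high[2]])

# Pairwise distances between genes, then agglomerative (hierarchical) clustering.
row_distances = pdist(data, metric="euclidean")
row_linkage = linkage(row_distances, method="average")

# no_plot=True computes the tree layout without drawing anything; "leaves" is
# the row order the dendrogram would display next to the heatmap.
row_order = dendrogram(row_linkage, no_plot=True)["leaves"]
clustered = data[row_order]

print("row order:", row_order)
```

After reordering, rows from the same group sit next to each other, which is what makes sub-modules visible as blocks of color in the heatmap.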

A major TODO for this is to implement optimal leaf ordering which correctly orders the dendrogram leaf nodes after clustering. I wouldn't consider this finished until this is done.
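As a sketch of what leaf ordering changes: recent SciPy versions provide `scipy.cluster.hierarchy.optimal_leaf_ordering`, which flips subtrees so that neighbouring leaves are as close as possible without changing the clustering itself. The data below is made up, and `adjacent_distance_sum` is a hypothetical helper used only to compare the two orders.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage, optimal_leaf_ordering
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
data = rng.normal(size=(12, 5))  # made-up matrix: 12 rows to cluster

distances = pdist(data)
tree = linkage(distances, method="average")
ordered_tree = optimal_leaf_ordering(tree, distances)


def adjacent_distance_sum(order, condensed):
    """Sum of distances between neighbouring leaves for a given leaf order."""
    square = squareform(condensed)
    return sum(square[a, b] for a, b in zip(order[:-1], order[1:]))


default_order = leaves_list(tree)
optimal_order = leaves_list(ordered_tree)
default_cost = adjacent_distance_sum(default_order, distances)
optimal_cost = adjacent_distance_sum(optimal_order, distances)

print("default cost:", default_cost)
print("optimal cost:", optimal_cost)
```

Both orders are valid layouts of the same tree; the optimal one never has a larger adjacent-distance sum than the default.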

mwaskom commented 10 years ago

Hm very cool. I wonder if this kind of thing would be useful for fMRI data.

In terms of the algorithm to support it, is that something you'd imagine being wrapped in with the plotting interface? Or is it otherwise useful? When seaborn was just a personal project the way I had kept things organized was

(these are actually part of a broader ecosystem of packages I use for my research)

I'm starting to think this makes less sense, though, because it's annoying to have a separate package with core functions that seaborn depends on, especially if people are going to be contributing things that rely on algorithms I don't fully understand. So maybe going forward something like seaborn.algorithms would be better. Then again, I often use, e.g., bootstrapping outside the context of plotting, and it would be strange to import it from a plotting library.

Of course if you had plans to submit the algorithm to, e.g., scipy, this discussion could be punted for a while.

lbeltrame commented 10 years ago

Chiming in (I'm a pandas user with a few contributions and a day-to-day bioinformatician). I would decouple the algorithm from plotting, because you can visualize heatmaps using dendrograms generated from different algorithms (e.g. you may want to use "plain" algorithms, do some bootstrap resampling, ...).
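One way the decoupling could look, as a sketch only: the plotting side takes a precomputed SciPy-style linkage matrix and never runs a clustering algorithm itself. The function name `reorder_by_linkage` and the data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def reorder_by_linkage(data, row_linkage):
    """Hypothetical helper: reorder rows using a linkage computed elsewhere.

    It never clusters anything itself, so a plotting function built on top of
    it would work with any SciPy-style linkage matrix.
    """
    order = leaves_list(row_linkage)
    return data[order], order


rng = np.random.default_rng(2)
data = rng.normal(size=(8, 3))  # made-up data

# Two different clustering methods, same downstream code.
for method in ("average", "ward"):
    reordered, order = reorder_by_linkage(data, linkage(data, method=method))
    print(method, order.tolist())
```

A linkage produced some other way (for example after resampling) could be passed in just the same, as long as it has the usual linkage-matrix layout.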

@olgabot Somehow that publication slipped under my radar. Would be very nice to have in, indeed.

mwaskom commented 10 years ago

I'm going to close this as we have a WIP PR open on heatmaps (#73).