mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

clustermap col_colors not shown correctly #1173

Closed. zhaouu closed this issue 7 years ago

zhaouu commented 7 years ago

My clustermap does not show col_colors correctly:

g = sns.clustermap(data, col_cluster=False, metric='correlation', cmap=cmap,
                   col_colors=colors, linewidths=0, yticklabels=False,
                   xticklabels=False, robust=True)
In [418]: colors.value_counts()
Out[418]: 
white      5771
#a7d854     119
#8da0cb      40
#e68ac3      40
#66c2a5      40
#fa8e63      17
Name: , dtype: int64

In [426]: data.T.head()
Out[426]: 
       60617173  60617274  60617317  60617384  60617447  60617461  60617588  \
X10           2         2         2         2         2         2         2   
X100          2         2         2         2         2         2         2   
X1000         2         2         2         2         2         2         2   
X1002         2         2         2         2         2         2         2   
X1004         2         2         2         2         2         2         2   

       60617599  60617640  60617647    ...     61015393  61015454  61015467  \
X10           2         2         2    ...            2         2         2   
X100          2         2         2    ...            2         2         2   
X1000         2         2         2    ...            0         0         0   
X1002         2         2         2    ...            2         2         2   
X1004         2         2         2    ...            2         2         2   

       61015553  61015733  61015943  61015995  61016070  61016092  61016591  
X10           2         2         2         2         2         2         2  
X100          2         2         2         2         2         2         2  
X1000         0         0         0         0         0         0         0  
X1002         2         2         2         2         2         2         2  
X1004         2         2         2         2         2         2         2  

[5 rows x 6027 columns]

But my plot looks like this: [image]

mwaskom commented 7 years ago

I don't know what your data should look like, so just saying it's "not showing correctly" and then adding a screenshot does not help me diagnose the problem. What exactly is the issue?

zhaouu commented 7 years ago

Thanks for your reply. I plotted the color list with matplotlib, and it displayed correctly. My commands were:

In [471]: plt.vlines(range(len(colors)), ymin=0, ymax=1, color=colors)
In [472]: plt.show()

[image]

In [473]: colors.head()
Out[473]:
60617173    white
60617274    white
60617317    white
60617384    white
60617447    white
Name: , dtype: object

As you can see, the background color is white, and I marked some regions. But in the heatmap, the background color becomes orange, and the marked regions are not shown completely. Is this because there are so many columns (my data has 6027 columns)? I think it is a bug in clustermap. I look forward to your reply.

mwaskom commented 7 years ago

I'm sorry but I cannot help you without a reproducible example.

zhaouu commented 7 years ago

Here is my test data. Thank you very much!

data = dd.T.head()
with open('test_data.pickle', 'wb') as f:  # closes the file after writing
    pickle.dump([data, colors], f)

$ zip test_data.zip test_data.pickle

test_data.zip

mwaskom commented 7 years ago

Please do not share data as a pickle. It is insecure: if I were to load your dataset, it could execute arbitrary code on my system. The common format for sharing pandas DataFrames is .csv.
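A minimal sketch of sharing a DataFrame as CSV instead of a pickle, using a small hypothetical frame shaped like the data above (integer column labels, values 0 to 2). One detail worth knowing: reading the file back returns the column labels as strings, which is presumably why the fix later in this thread converts them back with astype(int), so they line up with the integer index of the colors Series.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data: integer column labels, as in the issue
df = pd.DataFrame(
    [[2, 2, 0], [2, 0, 0]],
    index=["X10", "X100"],
    columns=[60617173, 60617274, 60617317],
)

# Write to CSV text; passing a file path such as "test_data.csv" works the same way
buf = io.StringIO()
df.to_csv(buf)

# Read it back; index_col=0 restores the row labels
buf.seek(0)
loaded = pd.read_csv(buf, index_col=0)

# CSV headers always come back as strings...
print(loaded.columns.tolist())  # ['60617173', '60617274', '60617317']

# ...so convert them to int again to match an integer-indexed colors Series
loaded.columns = loaded.columns.astype(int)
```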

zhaouu commented 7 years ago

OK, I converted it to CSV format: test_data.zip

mwaskom commented 7 years ago

Your problem is using the keyword robust, which is interfering with the assignment of colors:

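import pandas as pd
import seaborn as sns

df = pd.read_csv("test_data.csv", index_col=0)
df.columns = df.columns.astype(int)
# pd.Series.from_csv was removed in pandas 1.0; this assumes colors.csv is a
# headerless two-column file (column label, color)
colors = pd.read_csv("colors.csv", header=None, index_col=0).iloc[:, 0]

g = sns.clustermap(df, col_cluster=False,
                   metric='correlation',
                   col_colors=colors, linewidths=0,
                   yticklabels=False, xticklabels=False,
                   robust=False)  # robust=True distorted the col_colors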

[image]
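To see how robust limits could scramble annotation colors, here is a minimal sketch. It is not seaborn's actual code: it assumes the color strip is drawn by encoding each color as an integer code and looking the codes up in a discrete colormap, and that robust=True sets the color limits to the 2nd and 98th percentiles of those codes. The color counts come from colors.value_counts() above; their order along the columns is an assumption.

```python
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap, Normalize, to_rgba

# Counts from colors.value_counts() in this issue; the order of the colors
# along the columns is an ASSUMPTION (the real order is not known).
counts = {"white": 5771, "#a7d854": 119, "#8da0cb": 40,
          "#e68ac3": 40, "#66c2a5": 40, "#fa8e63": 17}
colors = pd.Series([c for c, n in counts.items() for _ in range(n)])

# Assumed drawing scheme: encode each color as an integer code and look the
# codes up in a discrete colormap listing the colors in the same order.
codes, uniques = pd.factorize(colors)
cmap = ListedColormap(list(uniques))
expected = np.array([to_rgba(c) for c in uniques])
levels = np.arange(len(uniques))

# Limits from the full range of codes (0..5): each code gets its own color.
full = Normalize(vmin=codes.min(), vmax=codes.max())

# Assumed robust limits: 2nd and 98th percentiles of the codes.
lo, hi = np.percentile(codes, [2, 98])
robust = Normalize(vmin=lo, vmax=hi)


def color_name(rgba):
    """Return the listed color whose RGBA value is closest to `rgba`."""
    return uniques[int(np.abs(expected - rgba).sum(axis=1).argmin())]


full_names = [color_name(r) for r in cmap(full(levels))]
robust_names = [color_name(r) for r in cmap(robust(levels))]
print("robust limits:", lo, hi)
print("full range:", full_names)
print("robust:    ", robust_names)
```

With this ordering, about 96% of the codes are 0, so the 98th percentile falls on code 2. Codes at or above that limit all come out as the last listed color (#fa8e63, an orange), and code 1 is shifted to #e68ac3, so most marked regions lose their own colors. Keeping the limits at the full range, as robust=False does, maps every code back to its own color.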

As a note for future interactions with open source projects, it took me quite a bit of time to reconstruct your issue once you shared the data. In the future you should share a self-contained, reproducible example of the problem. That would be a script that I could copy and paste and immediately see the plot that you have. If you are asking people for free technical support, you should put in the effort to make it as easy as possible for them to help you.
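For reference, a self-contained script of the kind described above might look like the following minimal sketch. The random data, the integer column labels, and the positions of the marked regions are all made up for illustration; the point is that anyone can paste it and get a plot without downloading any files. It is not claimed to reproduce the original problem in current seaborn versions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real data (hypothetical): 20 rows, 60 columns
# of values in {0, 1, 2}, with integer column labels as in the issue.
rng = np.random.default_rng(0)
cols = np.arange(1000, 1060)
df = pd.DataFrame(rng.integers(0, 3, size=(20, 60)),
                  index=[f"X{i}" for i in range(20)], columns=cols)

# Column annotation colors indexed by the same labels: mostly white,
# with two hypothetical marked regions.
colors = pd.Series("white", index=cols)
colors.iloc[10:20] = "#a7d854"
colors.iloc[40:45] = "#fa8e63"

g = sns.clustermap(df, col_cluster=False, metric="correlation",
                   col_colors=colors, linewidths=0,
                   xticklabels=False, yticklabels=False,
                   robust=True)  # switch to robust=False to compare
plt.show()
```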