Closed khemlalnirmalkar closed 3 years ago
@khemlalnirmalkar, thanks for the question.
First of all, I just updated the dokdo.heatmap
method in the 1.11.0-dev
branch to accept pandas.DataFrame
(click here to see the documentation):
import dokdo
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
csv_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table.csv'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
df = pd.read_csv(csv_file, index_col=0)
dokdo.heatmap(df,
metadata=metadata_file,
normalize='clr',
hue1='body-site')
plt.savefig('heatmap-1.png')
Example CSV file for creating input pandas.DataFrame
: table.csv
Next, if you want to group the samples but do not want to cluster them, you can set row_cluster=False
:
dokdo.heatmap(df,
metadata=metadata_file,
normalize='clr',
hue1='body-site',
row_cluster=False)
plt.savefig('heatmap-2.png')
I understand these are not exactly what you wanted, but I hope this is a good starting point. What do you think?
@sbslee , Yes, this is great. Thanks for updating the code. I hope we can rotate the plot too, like keeping the taxa on the right top and samples on the bottom. Heatmap, ComplexHeatmap and some other R & py packages allow annotation and splitting of the heatmap. I like dokdo and It would be great if Dokdo can allow similar changes, Thanks again, Khem
@khemlalnirmalkar,
If it's something that's not too difficult to implement, I'd be more than happy to make updates do Dokdo. So please keep letting me know :) That being said, I updated the dokdo.heatmap
method to allow flipping of the x and y axes:
import dokdo
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
csv_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table.csv'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
df = pd.read_csv(csv_file, index_col=0)
dokdo.heatmap(df,
metadata=metadata_file,
flip=True,
normalize='clr')
plt.savefig('heatmap-3.png')
Next up is splitting a heatmap by the sample group. This one is trickier because each sample group will have a different clustering pattern because their microbiome composition will be different. In the example you provided, there are only 30 taxa labeled, which indicates to me that the clustering was performed earlier and you are just using the same order of those 30 taxa (I'm not sure how you got the exact order) for all four sample groups. Can you give me a link to the method's documentation so I can read about it? Or explain to me how you generated the example?
Hi @sbslee My apologies totally distracted by some other work. The abundance file I sent you, was generated through Kraken2 https://github.com/DerrickWood/kraken2/wiki/Manual As I know, there was no clustering method was applied. I just wanted to make a heatmap with my own order. Thanks
@khemlalnirmalkar,
Good news! Once I realized that some users like yourself don't need (or want) clustering of their samples and/or taxa, I decided to rename the heatmap
method to clustermap
(documentation) and devise a new method called heatmap
(documentation). This is implemented in the 1.11.0-dev
branch.
Below is a simple example:
import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
dokdo.heatmap(qza_file,
normalize='log10',
flip=True,
figsize=(10, 7))
plt.tight_layout()
We can display a heatmap for each sample group:
fig, [ax1, ax2, ax3, ax4, ax5] = plt.subplots(1, 5,
figsize=(20, 8),
gridspec_kw={'width_ratios': [1, 1, 1, 1, 0.1]})
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
gut_samples = ['L1S8', 'L1S57', 'L1S76', 'L1S105', 'L1S140', 'L1S208', 'L1S257', 'L1S281']
left_palm_samples = ['L2S155', 'L2S175', 'L2S204', 'L2S222', 'L2S240', 'L2S309', 'L2S357', 'L2S382']
right_palm_samples = ['L3S242', 'L3S294', 'L3S313', 'L3S341', 'L3S360', 'L3S378', 'L4S63', 'L4S112', 'L4S137']
tongue_sampels = ['L5S104', 'L5S155', 'L5S174', 'L5S203', 'L5S222', 'L5S240', 'L6S20', 'L6S68', 'L6S93']
kwargs = dict(normalize='log10',
flip=True,
linewidths=0.5,
xticklabels=True)
dokdo.heatmap(qza_file, ax=ax1, samples=gut_samples, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax2, samples=left_palm_samples, yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax3, samples=right_palm_samples, yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax4, samples=tongue_sampels, yticklabels=False, cbar_ax=ax5, **kwargs)
ax1.set_title('Gut')
ax2.set_title('Left palm')
ax3.set_title('Right palm')
ax4.set_title('Toungue')
plt.tight_layout()
Let me know what you think. I will probably make some additional updates to this method before releasing the official 1.11.0
version.
@sbslee, This is great, thank you so much. Quick question, can we group the samples based on category (like bar plot has an option "group order") instead of using each sample ID? can we normalize the abundance by z-score too? Once again, I appreciate your constant help,
Khem
@khemlalnirmalkar,
Those are great questions. I updated the heatmap
method to 1) support the Z score normalization and 2) accept the metadata
and where
arguments:
import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
fig, [ax1, ax2, ax3, ax4, ax5] = plt.subplots(1, 5,
figsize=(20, 8),
gridspec_kw={'width_ratios': [1, 1, 1, 1, 0.1]})
kwargs = dict(normalize='zscore',
flip=True,
linewidths=0.5,
metadata=metadata_file,
xticklabels=True)
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
dokdo.heatmap(qza_file, ax=ax1, where="[body-site] IN ('gut')", cbar=False, yticklabels=True, **kwargs)
dokdo.heatmap(qza_file, ax=ax2, where="[body-site] IN ('left palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax3, where="[body-site] IN ('right palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax4, where="[body-site] IN ('tongue')", yticklabels=False, cbar_ax=ax5, **kwargs)
ax1.set_title('Gut')
ax2.set_title('Left palm')
ax3.set_title('Right palm')
ax4.set_title('Toungue')
plt.tight_layout()
plt.savefig('heatmap-updated.png')
P.S. AFAIK, microbiome relative abundance data will give best results when normalized with log-based transformation such as log10 and CLR. This is because those data often contain lots of zeros and also extremely large values; therefore, the Z score transformation, which is linear and not log, may not be suitable for this kind of data. Please use this option carefully.
@sbslee Thanks for the quick action. Agree, log10 or CLR would be better for microbial abundance. I was thinking to use Z score for metabolites data :). Thank you so much for everything, Khem
@sbslee, I have a question on heatmap code When i am splitting the heatmap with categories, i am losing the sample order, see the attached image. first, start with MS117 but when split, it changes. for paired samples, its important to keep the order. how can i fix it?
Dokdo needs "s__" for taxa, but is it possible to avoid it? i am thinking to use for metabolites too.
Thanks,
@khemlalnirmalkar,
As for the s__
prefix, you don't have to worry about that for the heatmap
and clustermap
methods. That concern is only for the taxa_abundance_bar_plot
method. For example, if you look at the very first plot I posted in this thread, you will see that I provided a CSV file with taxa names that do not have any prefix.
If it's truly important to keep the exact order of samples within each group level, you should definitely use the samples
argument instead of the where
argument. The latter uses QIIME 2 API which returns filtered sample IDs in a set
object which erases the original order of samples. Assuming your input data is pandas.DataFrame
and your samples (rows) are already perfectly ordered, you can simply do something like:
dokdo.heatmap(df, ax=ax1, ..., samples=df.index[0:5])
dokdo.heatmap(df, ax=ax2 ..., samples=df.index[0:2])
dokdo.heatmap(df, ax=ax3 ..., samples=df.index[0:18])
...
Hope this helps. Let me know if you need more help.
@khemlalnirmalkar,
I thought about this more, and I think the following might help in your case. Basically, I added the sort_samples
argument in the heatmap
method:
import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
dokdo.heatmap(qza_file,
normalize='log10',
flip=True,
figsize=(10, 7))
plt.tight_layout()
plt.savefig('heatmap-1.png')
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
fig, [ax1, ax2, ax3, ax4, ax5] = plt.subplots(1, 5,
figsize=(20, 8),
gridspec_kw={'width_ratios': [1, 1, 1, 1, 0.1]})
kwargs = dict(normalize='log10',
flip=True,
linewidths=0.5,
sort_samples=True,
metadata=metadata_file,
xticklabels=True)
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
dokdo.heatmap(qza_file, ax=ax1, where="[body-site] IN ('gut')", cbar=False, yticklabels=True, **kwargs)
dokdo.heatmap(qza_file, ax=ax2, where="[body-site] IN ('left palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax3, where="[body-site] IN ('right palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax4, where="[body-site] IN ('tongue')", yticklabels=False, cbar_ax=ax5, **kwargs)
ax1.set_title('Gut')
ax2.set_title('Left palm')
ax3.set_title('Right palm')
ax4.set_title('Toungue')
plt.tight_layout()
plt.savefig('heatmap-2.png')
Let me know what you think.
@sbslee Awesome, Thank you so much for the update :) I am sorry for keep asking, now i need one more suggestion, If you see this image, for Prevotella, the abundance level is very low. if i want to avoid making a separate heatmap for Prevotella, Is it possible to stretch more the scale from 0-0.1? or increase the color range for scale 0-0.1.
Thanks again,
@khemlalnirmalkar,
No worries, you are more than welcome to ask as many questions as you want!
Generally speaking, the dokdo.heatmap
method uses the seaborn.heatmap
method, and you can provide vmin
and vmax
arguments to both methods to play with scaling of the colormap. Those arguments weren't explicitly exposed to the dokdo.heatmap
method before (you were still able to provide them as **kwargs
), but I just updated so that they are now.
import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 7))
dokdo.heatmap(qza_file,
ax=ax1,
normalize='log10',
flip=True,
pretty_taxa=True,
figsize=(10, 7))
dokdo.heatmap(qza_file,
ax=ax2,
normalize='log10',
flip=True,
pretty_taxa=True,
vmin=-2,
figsize=(10, 7))
plt.tight_layout()
plt.savefig('heatmap-1.png')
That being said, I'm getting the impression that you want to adjust the coloring scale just for selected species (e.g. Prevotella species), and if that's the case, then it's not possible with the dokdo.heatmap
method. In the heatmap, once you apply the normalization (e.g. log10 or CLR) then everything becomes relative to one another.
You could selectively perform some sort of scaling in your input pandas.DataFrame
and then use the normalize=None
setting for the heatmap, but I don't recommend that either because I think it could be misleading to the readers.
I think the best bet is drawing another heatmap for Prevotella species, as you mentioned, but I personally find your plot above perfectly acceptable, so I wouldn't even bother drawing another plot. But that's just my take! Feel free to ask any follow-up questions.
@sbslee earlier, I did try with vmin and vmix but was not helpful, I will give it a try with a separate plot and will see, how it looks. Otherwise just one plot. Thanks for your help,
@khemlalnirmalkar,
Sounds good. One thing to note is that if you are going to plot a separate figure just for Prevotella species, you may have to create a pandas.DataFrame
containing only those species. Otherwise, your normalization will still be affected by the presence of other extremely high or low species. Hope it works.
@sbslee Agreed..yes, the plan is use panda.dataframe only for Prevotella sps. Thanks for the suggestion :)
Hi @sbslee , Is it possible to make a heatmap using a panda dataframe (not qza)? i am trying to replicate the same style like bar plot, splitted categories with and without dendrogram. Please can you help?
like this image
Thanks, Khem