sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2
https://dokdo.readthedocs.io
MIT License
43 stars 12 forks source link

Heatmap with prettified names? #24

Closed mishkb closed 3 years ago

mishkb commented 3 years ago

Hi @sbslee , Thanks for your very recent updates to heatmap (from clustermap) - just in time for what I need. Just wondering how I could incorporate your 'prettified names' function - to get taxa names as the y-axis labels such as is possible in the legend of the taxa_abundance_bar_plot output?

Thanks, Michelle

sbslee commented 3 years ago

@mishkb,

Thanks for the great suggestion! I just added the pretty_taxa argument to the heatmap method in the 1.11.0-dev branch (the official release of 1.11.0 will be made within 10 days; I aim to make a Dokdo release every month). When you set pretty_taxa=True it will display only the smallest taxonomic rank like how it's done in the taxa_abundance_bar_plot method with the legend_short argument:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 7))

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'

dokdo.heatmap(qza_file,
              normalize='log10',
              flip=True,
              ax=ax1)

dokdo.heatmap(qza_file,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              ax=ax2)

plt.tight_layout()
plt.savefig('heatmap.png')

heatmap

Is this what you had in your mind? Let me know if you need more help.

mishkb commented 3 years ago

This is wonderful - thanks for your prompt response. Based on your figures, this is just what I was looking for. I'll give it a whirl today.

mishkb commented 3 years ago

OK @sbslee ,I've been able to implement that new function - thank you! Now I'm wondering...

  1. is possible to change the x-label to another label from the metadata file (e.g. at the moment the heatmap uses the first column as the label [#Sample_ID or similar], but I'd like to use an alternate column i.e. Sample_name.
  2. Is it possible to control the number of taxa that are plotted in the heatmap? I'd really only like to visualize to top taxa (20-50 say), as is possible for the taxa_abundance_bar_plot

Michelle

sbslee commented 3 years ago

@mishkb,

Thanks for the great ideas. I think I can implement both fairly easily, but let's take one step at a time. I just added the label_columns parameter to the dokdo.heatmap method. It works the same way as it does for the dokdo. taxa_abundance_bar_plot method.

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 7))

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'

dokdo.heatmap(qza_file,
              metadata=metadata_file,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              ax=ax1)

dokdo.heatmap(qza_file,
              metadata=metadata_file,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              label_columns=['sample-id', 'body-site'],
              ax=ax2)

plt.tight_layout()

heatmap

If you are happy with above, we can move on to the next (i.e. showing top taxa). Let me know what you think.

mishkb commented 3 years ago

Works exactly as I was hoping - thanks @sbslee! Does the heatmap function automatically align labels vertically or horizontally depending on the figure size/space available? (I think I have to look more at the plt.tight_layout() function and matplotlib.

sbslee commented 3 years ago

@mishkb,

Glad to hear that works for you. As for the rotation of tick labels, AFAIK, it is automatically determined by the seaborn.heatmap method (which is used by dokdo.heatmap). BTW, you can easily change the rotation of tick labels in the following way:

ax = dokdo.heatmap(...)

for ticklabel in ax.get_xticklabels():
    ticklabel.set_rotation(90)

It sounds like we are good to move on to the next task: showing only the top taxa. Stay tuned.

mishkb commented 3 years ago

@sbslee I have just realised the 'prettified names" function actually truncates species names to just show the sname, not gGenus; s__species name. I just noticed this in the taxa_abundance_bar_plot plot and obviously, it also is the same now in heatmaps if I use that option.

Is there a way to return the last two ranks - so in your example,

>>> a = 'd__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__Schaalia_radingae'
>>> print(dokdo.pname(a))
g__Actinomyces;s__Schaalia_radingae

I am using taxonomy from NCBI that separates the genus names and species names - so with the 'prettified names', it is impossible to tell which genus the species come from.

(BTW, I can make this a separate issue if that is better).

sbslee commented 3 years ago

@mishkb,

No worries about opening another issue. And thanks for the great suggestion. I myself have come across that issue before. The good news is I just updated the pname method to allow returning of more than one taxonomic rank:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

a = 'd__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__Schaalia_radingae'
dokdo.pname(a)

Will produce the usual:

's__Schaalia_radingae'

However, if you provide the levels option:

a = 'd__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__Schaalia_radingae'
dokdo.pname(a, levels=[6, 7])

It will give what you want:

'g__Actinomyces;s__Schaalia_radingae'

I extended this to the heatmap and taxa_abundance_bar_plot methods by adding the pname_kws argument to both methods:

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 7))

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'

dokdo.heatmap(qza_file,
              normalize='log10',
              flip=True,
              ax=ax1)

dokdo.heatmap(qza_file,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              pname_kws=dict(levels=[2, 3]),
              ax=ax2)

plt.tight_layout()

heatmap

qzv_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/taxa-bar-plots.qzv'
dokdo.taxa_abundance_bar_plot(qzv_file,
                              figsize=(10, 7),
                              level=6,
                              count=8,
                              legend_short=True,
                              pname_kws=dict(levels=[5, 6]),
                              artist_kwargs=dict(show_legend=True,
                                                 legend_loc='upper left'))
plt.tight_layout()

barplot

Updated documentations:

pname: https://dokdo.readthedocs.io/en/1.11.0-dev/dokdo_api.html#module-dokdo.api.pname heatmap: https://dokdo.readthedocs.io/en/1.11.0-dev/dokdo_api.html#heatmap taxa_abundance_bar_plot: https://dokdo.readthedocs.io/en/1.11.0-dev/dokdo_api.html#taxa-abundance-bar-plot

Does this work for you?

mishkb commented 3 years ago

That looks great to me @sbslee - I hope yourself and others will find it useful as well. I really appreciate you making these adjustments and answering my questions.

sbslee commented 3 years ago

@mishkb,

No problem! I also added the count argument to the heatmap method; it's similar to the count argument in the taxa_abundance_bar_plot method:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 7))

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'

dokdo.heatmap(qza_file,
              metadata=metadata_file,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              ax=ax1)

dokdo.heatmap(qza_file,
              metadata=metadata_file,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              count=20,
              ax=ax2)

plt.tight_layout()

heatmap

Let me know if this is what you wanted.

mishkb commented 3 years ago

Fantastic @sbslee! This has worked perfectly with my data. Thank you again for adding those options to your code. I'm so grateful that I found Dokdo and that you are still actively working on it.