sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2
https://dokdo.readthedocs.io
MIT License

[taxa_abundance_box_plot] Deletion of notation of species not observed in samples #55

Closed yonghyun09 closed 1 year ago

yonghyun09 commented 1 year ago

@sbslee

Hello Steven Lee,

Thank you for your kind answers to my persistent questions! Your answers are very helpful for visualization analysis. I have a question regarding taxa box plot visualization.

Of the 28 samples I analyzed by NGS, only 4 contained reads classified as Salmonella, so it appears in only those 4 samples in the taxa bar plot.

However, when I drew the box plot shown below, boxes appeared slightly above 0% relative abundance for all of the other samples as well. The same happened with other taxa: every sample was drawn on the box plot even though the taxon was present in only some of them. Is there a way to adjust the plot so that boxes appear only for samples in which the taxon was actually observed?

I looked through the Parameters section of the API docs but could not work it out, so I would appreciate it if you could tell me which option to use.

Thank you for all your assistance.

taxa_names = ['Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella',]

ax=dokdo.taxa_abundance_box_plot(
    qzv_file,
    level=6,
    hue='sample-id',
    hue_order=['Sample-1', 'Sample-2', 'Sample-3', 'Sample-4', 'Sample-5', 'Sample-6', 'Sample-7', 'Sample-8', 'Sample-9', 
               'Sample-10', 'Sample-11', 'Sample-12', 'Sample-13', 'Sample-14', 'Sample-15', 'Sample-16', 'Sample-17', 'Sample-18', 
               'Sample-19', 'Sample-20', 'Sample-21', 'Sample-22', 'Sample-23', 'Sample-24', 'Sample-25', 'Sample-26', 'Sample-27', 
               'Sample-28'],
    taxa_names=taxa_names,
    show_others=False,
    pretty_taxa=True,
    pseudocount=True,
    palette='flare',
    figsize=(10, 7),
)

plt.legend(bbox_to_anchor=(1,1))
plt.tight_layout()

[Image: box plot showing a box slightly above 0% relative abundance for every sample]

sbslee commented 1 year ago

@yonghyun09,

This is because you used the pseudocount=True option, which adds a pseudocount of 1 to every taxon in every sample so that the feature table doesn't contain any zeros. This option is useful when you intend to plot the y-axis in log scale, because the log of 0 is undefined. In your case the y-axis is not in log scale, so there is no need to add a pseudocount. Below is what happens when you remove the option:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

taxa_names = ['Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella',]
hue_order = [
    'Sample-1', 'Sample-2', 'Sample-3', 'Sample-4', 'Sample-5', 'Sample-6', 'Sample-7', 'Sample-8', 'Sample-9', 
    'Sample-10', 'Sample-11', 'Sample-12', 'Sample-13', 'Sample-14', 'Sample-15', 'Sample-16', 'Sample-17', 'Sample-18', 
    'Sample-19', 'Sample-20', 'Sample-21', 'Sample-22', 'Sample-23', 'Sample-24', 'Sample-25', 'Sample-26', 'Sample-27', 
    'Sample-28'
]

qzv_file = 'taxa-bar-plots.qzv'

fig, ax = plt.subplots(figsize=(15, 10))

dokdo.taxa_abundance_box_plot(
    qzv_file,
    level=6,
    taxa_names=taxa_names,
    pretty_taxa=True,
    show_others=False,
    hue='sample-id',
    hue_order=hue_order,
    ax=ax,
)

plt.tight_layout()
plt.savefig('test.png')

[Image: test.png, the box plot produced without the pseudocount option]
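
To illustrate why a pseudocount of 1 shows up as a small non-zero box, here is a minimal sketch in plain Python. It is not dokdo's internal code: the read counts and the `relative_abundance` helper are hypothetical, and it assumes the pseudocount is added to the raw counts before they are converted to percentages.

```python
import math

# Hypothetical read counts (not taken from this thread): sample -> taxon -> count.
counts = {
    "Sample-1": {"Salmonella": 50, "Escherichia": 950},
    "Sample-2": {"Salmonella": 0, "Escherichia": 1000},
}


def relative_abundance(sample_counts, pseudocount=False):
    """Return percent abundance per taxon, optionally adding 1 to every count first."""
    adjusted = {
        taxon: (count + 1 if pseudocount else count)
        for taxon, count in sample_counts.items()
    }
    total = sum(adjusted.values())
    return {taxon: 100 * count / total for taxon, count in adjusted.items()}


without = {s: relative_abundance(c) for s, c in counts.items()}
with_pc = {s: relative_abundance(c, pseudocount=True) for s, c in counts.items()}

# Sample-2 has no Salmonella reads, so its abundance is exactly zero...
print(without["Sample-2"]["Salmonella"])
# ...but with a pseudocount it becomes 1 / 1002 of the reads, slightly above 0%.
print(with_pc["Sample-2"]["Salmonella"])

# The pseudocount exists for log-scale axes, where a zero cannot be plotted.
try:
    math.log10(without["Sample-2"]["Salmonella"])
except ValueError:
    print("log10(0) is undefined")
```

On a linear y-axis that tiny value is what appears as a box just above 0% for samples where the taxon was never observed.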

yonghyun09 commented 1 year ago

@sbslee

Thank you very much. It was right under my nose, but I did not notice it. 😂 Thank you for your kind explanation!