sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2
https://dokdo.readthedocs.io
MIT License

Use of taxa_names in taxa_abundance_bar_plot #27

Open mishkb opened 3 years ago

mishkb commented 3 years ago

Hi @sbslee,

I'm trying to use the 'taxa_names' flag in 'taxa_abundance_bar_plot'. Without 'taxa_names' I can show different taxa levels from my barplot.qzv file without issue. However, I would like to show just the class level within a specific taxon, e.g. show the % abundance of the classes within k__Bacteria;p__Proteobacteria only.

I have tried a few different things with the syntax (including putting taxon='k__Bacteria;p__Proteobacteria' just below the file inputs), but I get errors including:

KeyError: "['k__Bacteria;p__Proteobacteria'] not found in axis"

Could you please advise if what I am trying to achieve is possible? Or correct my syntax? I can upload the files if that is helpful.

Here is what I'm running in Jupyter:

import dokdo
import matplotlib.pyplot as plt

qzv_file = '/home/nano/Alert4_MVP/taxa-bar-plots-no-Unassigned.qzv'
metadata_file = '/home/nano/Alert4_MVP/TFP_C4BD_metadata.tsv'
taxon = 'k__Bacteria;p__Proteobacteria'

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(14, 7), gridspec_kw={'width_ratios': [9, 1]})
ax = dokdo.taxa_abundance_bar_plot(qzv_file,
                                   metadata=metadata_file,
                                   ax=ax1,
                                   level=3,
                                   taxa_names=['k__Bacteria;p__Proteobacteria'],
                                   label_columns=['SampleName'],
                                   cmap_name='tab20')

for ticklabel in ax.get_xticklabels():
    ticklabel.set_rotation(45)
ax.set_xlabel("Sample", fontsize=12)

dokdo.taxa_abundance_bar_plot(qzv_file,
                              ax=ax2,
                              level=3,
                              taxa_names=['k__Bacteria;p__Proteobacteria'],
                              cmap_name='tab20',
                              legend_short=True,
                              pname_kws=dict(levels=[3])
                              )

handles, labels = ax2.get_legend_handles_labels()
ax2.clear()
ax2.legend(handles[::-1], labels[::-1], loc='center left')
ax2.axis('off')

plt.tight_layout()

Thanks!

sbslee commented 3 years ago

@mishkb,

Good question! The issue was caused because you used level=3 (Kingdom-Phylum-Class) while specifying a Kingdom-Phylum name (k__Bacteria;p__Proteobacteria). Setting level=2 should fix the issue. Try it and let me know if it doesn't solve the problem.
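A quick way to spot this kind of mismatch before plotting is to count the ranks in the name and compare the count with level. This is only a minimal sketch: taxon_depth is a hypothetical helper, not part of dokdo, and it assumes names are semicolon-delimited lineage strings like the one above.

```python
def taxon_depth(name):
    """Count the ranks in a semicolon-delimited lineage string."""
    return len([rank for rank in name.split(';') if rank])


taxon = 'k__Bacteria;p__Proteobacteria'
depth = taxon_depth(taxon)   # kingdom + phylum
level = 3                    # value used in the original call

# A name with fewer ranks than `level` will not be among that level's taxa.
matches = depth == level
print(f"{taxon!r} has depth {depth}; matches level={level}: {matches}")
```

Here the depth and the level disagree, which is consistent with the KeyError reported above.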

mishkb commented 3 years ago

Thanks as always for your quick responses @sbslee! OK - I have just got that to work. What I was hoping for was to select by the Kingdom-Phylum name (k__Bacteria;p__Proteobacteria) but display level=3 (Kingdom-Phylum-Class) in the graph :-) But I can provide the class names to display easily enough.

sbslee commented 3 years ago

@mishkb,

Oh I see! That's an interesting idea. I agree that it'd be handy to be able to select all the downstream taxa by providing their parent taxon. If it's something that you foresee yourself doing a lot, I'd be more than happy to implement something that does that.
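In the meantime, one possible workaround is to build the taxa_names list from the parent prefix. The sketch below assumes the level-3 names are already available as a list of strings; the list shown is made up for illustration, and select_descendants is a hypothetical helper, not a dokdo function.

```python
def select_descendants(taxa, parent):
    """Return the taxa whose lineage string sits directly under `parent`."""
    # Appending ';' avoids matching siblings that merely share a prefix.
    prefix = parent.rstrip(';') + ';'
    return [t for t in taxa if t.startswith(prefix)]


# Made-up level-3 (Kingdom-Phylum-Class) names for illustration only.
level3_taxa = [
    'k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria',
    'k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria',
    'k__Bacteria;p__Firmicutes;c__Bacilli',
]

class_names = select_descendants(level3_taxa, 'k__Bacteria;p__Proteobacteria')
for name in class_names:
    print(name)
```

The resulting list could then be passed as taxa_names together with level=3.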

mishkb commented 3 years ago

I wasn't sure it would be possible given the underlying structure of the barplots.qzv file. But I would definitely use it, and would find it easier than supplying a taxa_names list. Thanks for considering the idea.