sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2
https://dokdo.readthedocs.io
MIT License

orders and group mismatch #18

Closed khemlalnirmalkar closed 3 years ago

khemlalnirmalkar commented 3 years ago

Hi @sbslee, I found a small mismatch in the grouping and ordering of bacteria. It is probably due to how my data is structured, but I thought you might have a solution that avoids assigning a color to each bacterium by hand. If you compare the two figures, one small difference is Prevotella versus Lactobacillaceae. I wanted to keep Prevotella in both figures but couldn't control that. I guess it comes down to how the grouping starts: the first figure starts with the controls/donors, but the second one starts with the samples.

scripts for the first figure:


import dokdo
import matplotlib.pyplot as plt

fig, [ax1, ax2, ax3, ax4] = plt.subplots(1, 4, figsize=(16, 7), gridspec_kw={'width_ratios': [2, 1.40, 2.5, 2.5]})
kwargs = dict(level=6, count=13, sort_by_mean2=False)

qzv_file = '/media/scebmeta/raw_backup/DoD_fastqs/finalRun_analysis/3932480_Nirmalkar/part4/mergd_GrpAnB/taxa-bar-plots.qzv'
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax1,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Donors_Fin6', 'Donors_Fin7', 'Donors_Fin8'],
                              legend_short=True,
                              artist_kwargs=dict(title='Finch Donors',title_fontsize=16, legend_loc='lower right', xticklabels_fontsize=14, yticklabels_fontsize=12, ylabel_fontsize=14))

fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30)
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax2,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Donors_UMN1', 'Donors_UMN2'],
                              figsize=(10, 7),
                              legend_short=True,
                              artist_kwargs=dict(title='UMN Donors',title_fontsize=16,show_legend=False, legend_loc='lower right', xticklabels_fontsize=14, hide_ylabel=True,
                                                hide_yticks=True))
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30),
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax3,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Base_B_Finch', 'Vanco_B_Finch', 'End_01_B_Finch', 'End_02_B_Finch'],
                              figsize=(10, 7),
                              legend_short=True,
                              artist_kwargs=dict(title='GroupB: Finch recipients',title_fontsize=16, xticklabels_fontsize=14, hide_ylabel=True,
                                                hide_yticks=True))
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30),
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax4,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Base_B_UMN', 'Vanco_B_UMN', 'End_01_B_UMN', 'End_02_B_UMN'],
                              legend_short=True,
                              artist_kwargs=dict(title='GroupB: UMN recipients',title_fontsize=16, xticklabels_fontsize=14, hide_ylabel=True,
                                                hide_yticks=True))
plt.legend(bbox_to_anchor=(1, 1), loc=2, fontsize=14,facecolor='white')
plt.tight_layout()
#plt.xticks(fontsize=13)
#plt.yticks(fontsize=13)
#set_xlabel("", fontsize=25, fontweight='bold')
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30)
fig1.savefig('GrpB_DONsepGenus_barpltT13.svg')
fig1.savefig('GrpB_DONsepGenus_barpltT13.png', dpi=500)

[image: first figure]

Script for the second figure:

qzv_file = '/media/scebmeta/raw_backup/DoD_fastqs/finalRun_analysis/3932480_Nirmalkar/part4/mergd_GrpAnB/taxa-bar-plots.qzv'
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              #ax=ax1,
                              count=13,
                              figsize=(50,22),
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              by=['order'],
                              label_columns=['order','DonNew'],
                              include_samples={'DonNew':['Donors_UMN1', 'Donors_UMN2', 'Base_A_UMN','Vanco_A_UMN', 'End_01_A_UMN','End_02_A_UMN']},
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              #orders={'DonNew':['Donors_Fin6', 'Donors_Fin7', 'Donors_Fin8','Base_A_Finch','Vanco_A_Finch', 'End_01_A_Finch','End_02_A_Finch']},
                              #orders={'DonNew':['Base_A_Finch','Vanco_A_Finch', 'End_01_A_Finch','End_02_A_Finch','Base_B_Finch','Vanco_B_Finch', 'End_01_B_Finch','End_02_B_Finch']},
                              #group='DonNew',
                              #group_order=['Donors_Fin6', 'Donors_Fin7', 'Donors_Fin8','Base_A_Finch','Vanco_A_Finch', 'End_01_A_Finch','End_02_A_Finch'],
                              legend_short=True,
                              artist_kwargs=dict(title='', legend_loc='lower right', xticklabels_fontsize=34, yticklabels_fontsize=48, ylabel_fontsize=48))

plt.legend(bbox_to_anchor=(1, 1),fontsize=45, loc=2, facecolor='white')
plt.tight_layout()
#plt.xticks(fontsize=13)
#plt.yticks(fontsize=13)
#set_xlabel("", fontsize=25, fontweight='bold')
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30)
fig1.savefig('GrpA_UMN_genus_barpltT13.svg')
fig1.savefig('GrpA_UMN_genus_barpltT13.png', dpi=500)

[image: second figure]

Note: Prevotella is more abundant in bars 1 and 2 of the first figure and in the last two bars of the second figure. Those are my controls, so I want to keep it.

Any suggestions?

sbslee commented 3 years ago

@khemlalnirmalkar,

Hmmm. The fix I made yesterday regarding the group_order option (#17) shouldn't allow this to happen. Are you sure you reinstalled the latest version of Dokdo (1.10.0-dev)? Also, can you send me your .qzv file so I can test from my end? If you can't, no worries.

khemlalnirmalkar commented 3 years ago

Yes, I reinstalled it and I guess it's working well. I cross-checked with different subsets of the data using specific groupings/categories. If you look at both scripts, in the first one I used the group "DonNew" with group_order, but in the second one there is no grouping. I am not sure whether that is the reason. The reason for making the second plot is to see the abundance of bacteria at specific time points (longitudinally). This is the step where I am getting stuck with one bacterium: I wanted to keep my donors/controls as the first two bars, where Prevotella is high. Could you please share your email so I can send you the .qzv file? Sorry, I can't share it here. I just saw your email on your profile page, so I am sending it there in 10 minutes.

khemlalnirmalkar commented 3 years ago

sent....

sbslee commented 3 years ago

I got your file. Will get back to you ASAP!

khemlalnirmalkar commented 3 years ago

Okay, thank you!

sbslee commented 3 years ago

@khemlalnirmalkar,

Thank you for your patience. This was indeed a bug with the group option (sorry, still getting used to having this option). Below are before and after the fix:

Before:

[image: original]

After:

[image: fixed]

Code used:

import dokdo

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set()

import numpy as np
np.random.seed(1)

qzv_file = 'taxa-bar-plots.qzv'

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 10))

kwargs = dict(count=8, level=7, legend_short=True, sort_by_mean2=False, sort_by_mean3=False)

dokdo.taxa_abundance_bar_plot(qzv_file,
                              ax=ax1,
                              group='body-site',
                              group_order=['gut'],
                              **kwargs,
                              artist_kwargs=dict(show_legend=True))
dokdo.taxa_abundance_bar_plot(qzv_file,
                              ax=ax2,
                              include_samples={'body-site': ['tongue', 'gut']},
                              **kwargs,
                              artist_kwargs=dict(show_legend=True))

plt.tight_layout()

Please reinstall the latest version with the fix. Please let me know if this doesn't fix your problem.

khemlalnirmalkar commented 3 years ago

Hi @sbslee, thank you so much for fixing it. One question: when we set sort_by_mean2=False and sort_by_mean3=False, how does the sorting of taxa work? Is it still by mean, and if so, is it the mean of each taxon's abundance across all samples, in one sample, or in the first sample of the list/group? I am asking because Prevotella's abundance in my two donors is ~27% and ~17%, yet I can't see it with count=20, which is strange. It only shows up with count=38.

sbslee commented 3 years ago

@khemlalnirmalkar,

When you set sort_by_mean2 and sort_by_mean3 as False, the taxa will be sorted by their mean abundance across all samples. This ensures that all of your subplots have the identical color schemes for all taxa regardless of sample grouping/filtering.
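
To illustrate the idea, here is a minimal pandas sketch of ranking taxa by their mean abundance across all samples. It is not Dokdo's actual implementation; the table, sample names, and numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical relative abundances (%): rows are samples, columns are taxa.
# These values are invented for illustration only.
abundance = pd.DataFrame(
    {
        "Bacteroides": [30.0, 35.0, 40.0, 45.0, 50.0, 40.0],
        "Faecalibacterium": [10.0, 12.0, 30.0, 30.0, 25.0, 20.0],
        "Prevotella": [27.0, 17.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=["Donor1", "Donor2", "S1", "S2", "S3", "S4"],
)

# Rank taxa by their mean over ALL samples, regardless of which
# samples end up in a given subplot.
global_rank = abundance.mean(axis=0).sort_values(ascending=False)

# For comparison: rank taxa using only the two donor samples.
donor_rank = (
    abundance.loc[["Donor1", "Donor2"]].mean(axis=0).sort_values(ascending=False)
)

print(list(global_rank.index))
print(list(donor_rank.index))
```

In this toy table, Prevotella ranks second among the donors but last overall, because the samples where it is absent pull its global mean down. The same effect can push a taxon that is abundant in only a few samples far down a global ranking.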

If I understand your question correctly, you want Prevotella to be included in the plots, but it gets merged with 'Others' because its rank in mean abundance is 38th, right? If that's the case, see if the taxa_names option can help in your situation. You can manually specify the names of taxa you want.
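
Conceptually, specifying taxa by name amounts to something like the pandas sketch below: keep the named columns and sum everything else into 'Others'. This is only an illustration under assumed data; `keep_named_taxa` is a hypothetical helper, not part of Dokdo, and the exact behavior and name format expected by the taxa_names option should be checked in the Dokdo documentation.

```python
import pandas as pd


def keep_named_taxa(abundance, names):
    """Keep the named taxa and sum all remaining taxa into 'Others'.

    Hypothetical helper for illustration; not a Dokdo function.
    """
    names = list(names)
    missing = [name for name in names if name not in abundance.columns]
    if missing:
        raise KeyError(f"Taxa not found in table: {missing}")
    kept = abundance[names].copy()
    # Everything that was not requested by name is merged into one column.
    kept["Others"] = abundance.drop(columns=names).sum(axis=1)
    return kept


# Hypothetical relative abundances (%), invented for illustration.
abundance = pd.DataFrame(
    {
        "Bacteroides": [30.0, 40.0],
        "Faecalibacterium": [10.0, 30.0],
        "Lactobacillus": [5.0, 10.0],
        "Prevotella": [27.0, 0.0],
    },
    index=["Donor1", "S1"],
)

result = keep_named_taxa(abundance, ["Bacteroides", "Prevotella"])
print(result)
```

Because the selection is by name rather than by rank, a taxon such as Prevotella stays visible even when its overall mean abundance is low.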

P.S. If my previous answer solved the original issue, could you let me know so I can close this issue? If you have other questions about the taxa_names option, please open a different issue so other users can benefit from your insightful questions :)

khemlalnirmalkar commented 3 years ago

Yes, please. You can close the issue. Thank you so much for everything. I will open a new issue if I get any errors.