sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2
https://dokdo.readthedocs.io
MIT License

alpha and beta diversity without qiime file #35

Closed: khemlalnirmalkar closed this issue 2 years ago.

khemlalnirmalkar commented 2 years ago

Hi @sbslee, could you add some code to compute alpha diversity (Shannon, observed features, and evenness) and beta diversity (Bray-Curtis and Jaccard index), with some plots, from a plain text file?

This would be useful for people who aren't using QIIME 2, or who have shotgun data that comes as a plain text file. Of course, phylogenetic information can't be included this way, but it would still be useful. The vegan and phyloseq packages exist, but they don't offer straightforward code with proper explanations. Thanks, Khem
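For reference, the three alpha metrics asked about here can be computed from a plain count table in a few lines of NumPy/pandas. This is a minimal sketch, not part of Dokdo, and it assumes: samples as rows and taxa as columns of raw counts; Shannon entropy with the natural log (some tools default to log base 2, which divides H by ln 2); observed richness as the number of taxa with a nonzero count; and evenness as Pielou's J = H / ln(S). The function name `alpha_diversity` and its output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def alpha_diversity(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: observed richness, Shannon index and Pielou's evenness.

    `counts` is assumed to have one row per sample and one column per taxon.
    """
    values = counts.to_numpy(dtype=float)
    # Observed richness: number of taxa with a nonzero count in each sample.
    observed = (values > 0).sum(axis=1)
    totals = values.sum(axis=1, keepdims=True)
    # Relative abundances (all zeros for an empty sample).
    props = np.divide(values, totals, out=np.zeros_like(values), where=totals > 0)
    # Shannon entropy with the natural log; zero proportions contribute nothing.
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(props > 0, props * np.log(props), 0.0)
    shannon = -terms.sum(axis=1)
    # Pielou's evenness J = H / ln(S); undefined when fewer than two taxa are observed.
    with np.errstate(divide='ignore', invalid='ignore'):
        evenness = np.where(observed > 1, shannon / np.log(observed), np.nan)
    return pd.DataFrame(
        {'observed_features': observed, 'shannon': shannon, 'pielou_evenness': evenness},
        index=counts.index,
    )


# Toy count table: two samples (rows) by three taxa (columns).
table = pd.DataFrame(
    {'taxonA': [10, 5], 'taxonB': [10, 0], 'taxonC': [0, 0]},
    index=['s1', 's2'],
)
result = alpha_diversity(table)
print(result)
```

Pielou's J is left as NaN for samples with fewer than two observed taxa, where ln(S) is zero or undefined.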

sbslee commented 2 years ago

@khemlalnirmalkar,

Thanks for the suggestion! This looks similar to your previous request in #21, correct? As you can see in 1fd24aa, I just updated the dokdo.alpha_diversity_plot method to also accept a pandas.DataFrame. You'd still need to import your text file (CSV, TSV, etc.) into a DataFrame yourself, but I assume you're already familiar with that, since it's the same solution I implemented for #21.
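Importing the text file is ordinary pandas. A minimal sketch, assuming the first column holds sample IDs (hence `index_col=0`, as in the examples in this thread); the helper `load_table` is hypothetical, and guessing a tab separator for `.tsv`/`.txt` files is an assumption about how such files are usually written.

```python
import io

import pandas as pd


def load_table(path_or_buffer, sep=None):
    """Hypothetical helper: read a CSV/TSV file into a DataFrame indexed by sample ID.

    If `sep` is not given, guess it from the file extension:
    tab for .tsv/.txt, comma otherwise.
    """
    if sep is None:
        name = str(path_or_buffer).lower()
        sep = '\t' if name.endswith(('.tsv', '.txt')) else ','
    # The first column (sample IDs) becomes the index.
    return pd.read_csv(path_or_buffer, sep=sep, index_col=0)


# In-memory text standing in for a real TSV file; the separator is passed
# explicitly because a buffer has no file extension to inspect.
tsv = io.StringIO('sample-id\tshannon\nS1\t3.2\nS2\t2.7\n')
df = load_table(tsv, sep='\t')
print(df)
```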

The example below shows how it works:

```python
import dokdo
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# %matplotlib inline  # uncomment when running inside a Jupyter notebook
sns.set()

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/faith_pd_vector.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
text_file = '/Users/sbslee/Desktop/test-alpha-diversity.csv'
```

Plot the regular way:

```python
dokdo.alpha_diversity_plot(qza_file, metadata_file, 'body-site')
plt.savefig('with-qza-file.png')
```

[Figure: with-qza-file.png]

Now with a text file:

```python
# The first column of the CSV (sample IDs) becomes the DataFrame index.
df = pd.read_csv(text_file, index_col=0)
dokdo.alpha_diversity_plot(df, metadata_file, 'body-site')
plt.savefig('with-text-file.png')
```

[Figure: with-text-file.png]

If you're happy with the above, I'll also update the dokdo.beta_2d_plot and dokdo.beta_3d_plot methods. I just wanted to hear your feedback before committing.

Lastly, please note that these changes live only in the 1.12.0-dev branch until the official version is released:

```shell
$ git clone https://github.com/sbslee/dokdo
$ cd dokdo
$ git checkout 1.12.0-dev
$ pip install -e .
```

khemlalnirmalkar commented 2 years ago

Hi @sbslee, thanks for the quick action! This looks great, and I think it's ready for beta diversity too. I'm curious about the structure of the text file you used for this figure; could you share it? Thanks again, cheers, Khem

sbslee commented 2 years ago

Oops, I forgot to attach the test input file (test-alpha-diversity.csv):

test-alpha-diversity.csv

The regular QZA file is included in the dokdo repository (dokdo/data/moving-pictures-tutorial/faith_pd_vector.qza).
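The attached CSV isn't reproduced here, so the following only illustrates a plausible layout, assuming one row per sample, sample IDs in the first column, and a single column of metric values (named `faith_pd` here to mirror the QZA file). The values and the `body-site` labels are placeholders, not real data. The join at the end shows why the sample IDs presumably need to match those in the metadata file: the plot groups values by a metadata column.

```python
import io

import pandas as pd

# Placeholder values for illustration only; NOT the contents of the attached file.
alpha = pd.DataFrame(
    {'faith_pd': [1.5, 2.0, 3.25]},
    index=pd.Index(['sample-1', 'sample-2', 'sample-3'], name='sample-id'),
)

# Round-trip through CSV text: the first column holds the sample IDs.
buffer = io.StringIO()
alpha.to_csv(buffer)
print(buffer.getvalue())
buffer.seek(0)
df = pd.read_csv(buffer, index_col=0)

# Hypothetical metadata; grouping by 'body-site' relies on matching sample IDs.
metadata = pd.DataFrame(
    {'body-site': ['gut', 'gut', 'tongue']},
    index=pd.Index(['sample-1', 'sample-2', 'sample-3'], name='sample-id'),
)
merged = df.join(metadata, how='inner')
print(merged.groupby('body-site')['faith_pd'].mean())
```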

I'll also leave a link to the development documentation in case you want to check it out (there you'll see that the method now accepts a pandas.DataFrame as input):

https://dokdo.readthedocs.io/en/1.12.0-dev/dokdo_api.html#module-dokdo.api.alpha_diversity_plot

Finally, thanks for your feedback. I will update the other methods ASAP and get back to you.

sbslee commented 2 years ago

@khemlalnirmalkar,

The update is complete! Please let me know if there are other methods you'd like updated.

Input test files: test-beta-diversity-2d.csv, test-beta-diversity-3d.csv

```python
import dokdo
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# %matplotlib inline  # uncomment when running inside a Jupyter notebook
sns.set()

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/unweighted_unifrac_pcoa_results.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'

dokdo.beta_2d_plot(qza_file, metadata=metadata_file, hue='body-site', figsize=(5, 5))
plt.savefig('beta-2d-plot-qza.png')
# Explained proportions computed by QIIME 2:
# 33.94% for Axis 1
# 25.90% for Axis 2
```

[Figure: beta-2d-plot-qza.png]

```python
df = pd.read_csv('test-beta-diversity-2d.csv', index_col=0)
dokdo.beta_2d_plot(df, metadata=metadata_file, hue='body-site', figsize=(5, 5))
plt.savefig('beta-2d-plot-csv.png')
```

[Figure: beta-2d-plot-csv.png]

```python
dokdo.beta_3d_plot(qza_file, metadata=metadata_file, hue='body-site', figsize=(7, 7))
plt.savefig('beta-3d-plot-qza.png')
# Explained proportions computed by QIIME 2:
# 33.94% for Axis 1
# 25.90% for Axis 2
# 6.63% for Axis 3
```

[Figure: beta-3d-plot-qza.png]

```python
df = pd.read_csv('test-beta-diversity-3d.csv', index_col=0)
dokdo.beta_3d_plot(df, metadata=metadata_file, hue='body-site', figsize=(7, 7))
plt.savefig('beta-3d-plot-csv.png')
```

[Figure: beta-3d-plot-csv.png]
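The percentages in the comments above are the proportions of variance explained by each PCoA axis. As a rough sketch of where such numbers come from, here is classical PCoA (Gower double-centring followed by an eigendecomposition) in NumPy. Assumptions: the input is a square, symmetric distance matrix with sample IDs as index and columns; negative eigenvalues are clipped to zero, and each proportion is an eigenvalue divided by the sum of the positive eigenvalues, which is one common convention (QIIME 2's numbers may differ if negative eigenvalues occur). The function name `pcoa` and the column labels `Axis 1`, `Axis 2`, ... are hypothetical; whether `dokdo.beta_2d_plot` accepts this exact layout has not been checked against test-beta-diversity-2d.csv.

```python
import numpy as np
import pandas as pd


def pcoa(distances: pd.DataFrame, n_axes: int = 3):
    """Hypothetical helper: classical PCoA of a square, symmetric distance matrix.

    Returns (coordinates, proportion_explained) for the first `n_axes` axes.
    """
    d = distances.to_numpy(dtype=float)
    n = d.shape[0]
    # Gower double-centring of -0.5 * squared distances: B = -0.5 * J D^2 J.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0, None)         # drop negative eigenvalues
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    columns = [f'Axis {i + 1}' for i in range(n_axes)]  # hypothetical labels
    coordinates = pd.DataFrame(coords, index=distances.index, columns=columns)
    proportion = pd.Series(positive[:n_axes] / positive.sum(), index=columns)
    return coordinates, proportion


# Toy example: points 0, 3 and 4 on a line, so one axis explains everything.
ids = ['s1', 's2', 's3']
dist = pd.DataFrame([[0, 3, 4], [3, 0, 1], [4, 1, 0]], index=ids, columns=ids)
coordinates, proportion = pcoa(dist, n_axes=2)
print(coordinates)
print(proportion)
```

Because the toy distances are exactly one-dimensional, the coordinates reproduce them and the first axis carries essentially all of the variance; real Bray-Curtis or Jaccard matrices spread it over many axes.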

khemlalnirmalkar commented 2 years ago

Hi @sbslee, thank you so much for the update, this is great! What I had in mind, though, was code to calculate alpha diversity (Shannon, observed, and evenness) and beta diversity (Bray-Curtis and Jaccard index) and then plot the results. Is that possible here? In R, this is usually done with vegan, and also with phyloseq (which doesn't compute evenness). Thanks,
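Outside Dokdo, both requested beta-diversity measures are short to write down. Bray-Curtis dissimilarity between samples i and j is sum_k |x_ik - x_jk| / sum_k (x_ik + x_jk), and Jaccard distance on presence/absence is 1 - |A ∩ B| / |A ∪ B|, where A and B are the sets of taxa present in each sample. A minimal NumPy/pandas sketch, assuming samples as rows and taxa as columns; the function names are hypothetical. It builds an n × n × taxa intermediate array, so it only suits small tables.

```python
import numpy as np
import pandas as pd


def bray_curtis(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pairwise Bray-Curtis dissimilarity between samples (rows)."""
    x = counts.to_numpy(dtype=float)
    # Sum of absolute differences for every pair of samples.
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    row_sums = x.sum(axis=1)
    total = row_sums[:, None] + row_sums[None, :]
    # Two empty samples are treated as identical (distance 0).
    with np.errstate(divide='ignore', invalid='ignore'):
        d = np.where(total > 0, diff / total, 0.0)
    return pd.DataFrame(d, index=counts.index, columns=counts.index)


def jaccard(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pairwise Jaccard distance on presence/absence of taxa."""
    present = counts.to_numpy() > 0
    shared = (present[:, None, :] & present[None, :, :]).sum(axis=2)
    union = (present[:, None, :] | present[None, :, :]).sum(axis=2)
    with np.errstate(divide='ignore', invalid='ignore'):
        d = np.where(union > 0, 1.0 - shared / union, 0.0)
    return pd.DataFrame(d, index=counts.index, columns=counts.index)


# Toy count table: three samples (rows) by three taxa (columns).
table = pd.DataFrame(
    {'taxonA': [10, 5, 0], 'taxonB': [10, 0, 3], 'taxonC': [0, 5, 3]},
    index=['s1', 's2', 's3'],
)
bc = bray_curtis(table)
jac = jaccard(table)
print(bc.round(3))
print(jac.round(3))
```

For larger tables, `scipy.spatial.distance.pdist` supports the `'braycurtis'` and `'jaccard'` metrics (for Jaccard, pass a boolean presence/absence array such as `table > 0`), and `scipy.spatial.distance.squareform` turns its output into a square matrix.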

sbslee commented 2 years ago

Unfortunately, what you're describing here (i.e., performing diversity analyses on non-QIIME 2 data) is beyond the scope of Dokdo. Dokdo can be, and has been, extended to visualize output from non-QIIME 2 software, but supporting the analysis of such data is a whole different story. Hope this makes sense!

khemlalnirmalkar commented 2 years ago

Yes, I understand. I was wondering whether vegan or phyloseq could be imported into Dokdo to run diversity analyses on taxonomy tables from non-QIIME 2 data, such as metagenomic or metatranscriptomic data. If that's not possible, no worries; this new update is more than I was hoping for. Thanks again, cheers