Closed khemlalnirmalkar closed 2 years ago
Hi @khemlalnirmalkar,
Thanks for the suggestion! If it's a simple analysis that can be readily implemented, I'm glad to update Dokdo to support it.
That being said, to be honest, I'm still not 100% sure if I understand the type of analysis you're talking about ("correlation (spearman/Pearson) analysis with FDR-correction"). Could you please give me some concrete examples (e.g. figures from a published research article, cartoon drawings of yours)?
P.S. If you are a programmer yourself, you are more than welcome to open a PR :)
Hi @sbslee, Thanks, This is what I use in R for correlation with FDR/p-correction https://microbiome.github.io/tutorials/Heatmap.html Scatter plot for correlations (see fig.2) https://journals.asm.org/doi/pdf/10.1128/Spectrum.00034-21 (Fig. 7 & 8) https://rupress.org/jem/article-pdf/218/1/e20200606/1405371/jem_20200606.pdf and attached a cartoon FDR-correction for p-values https://en.wikipedia.org/wiki/False_discovery_rate (a common method to correct false positive p-values)
Sorry, i am a microbiologist, not a programmer. I just to implement from available tutorials. Thought this could be a good thing to have in Dokdo, Please let me know if you need something Thanks
@khemlalnirmalkar,
Sorry, i am a microbiologist, not a programmer.
No worries at all. Your suggestions are already greatly appreciated.
As I understand it, it appears you want to perform cross-correlation analysis between microbiome vs. some matrices for shared samples, similar to the tutorial given by Leo Lahti and Sudarshan Shetty et al. So I looked at the tutorial and devised two methods: cross_association_table
and cross_association_heatmap
(see their documentations here and here).
Below are examples:
>>> import pandas as pd
>>> import dokdo
>>> otu = pd.read_csv('/Users/sbslee/Desktop/dokdo/data/miscellaneous/otu.csv', index_col=0)
>>> lipids = pd.read_csv('/Users/sbslee/Desktop/dokdo/data/miscellaneous/lipids.csv', index_col=0)
>>> df = dokdo.cross_association_table(
... otu, lipids, normalize='log10', nsig=1
... )
>>> df.head(10)
taxon target corr pval adjp
0 Ruminococcus gnavus et rel. TG(54:5).2 0.716496 4.516954e-08 0.002284
1 Uncultured Bacteroidetes TG(56:2).1 -0.698738 1.330755e-07 0.002345
2 Moraxellaceae PC(40:3e) -0.694186 1.733720e-07 0.002345
3 Ruminococcus gnavus et rel. TG(50:4) 0.691191 2.058030e-07 0.002345
4 Lactobacillus plantarum et rel. PC(40:3) -0.687798 2.493210e-07 0.002345
5 Ruminococcus gnavus et rel. TG(54:6).1 0.683580 3.153275e-07 0.002345
6 Ruminococcus gnavus et rel. TG(54:4).2 0.682030 3.434292e-07 0.002345
7 Ruminococcus gnavus et rel. TG(52:5) 0.680622 3.709485e-07 0.002345
8 Helicobacter PC(40:3) -0.673201 5.530595e-07 0.003108
9 Moraxellaceae PC(38:4).1 -0.670050 6.530463e-07 0.003302
>>> import pandas as pd
>>> import dokdo
>>> otu = pd.read_csv('/Users/sbslee/Desktop/dokdo/data/miscellaneous/otu.csv', index_col=0)
>>> lipids = pd.read_csv('/Users/sbslee/Desktop/dokdo/data/miscellaneous/lipids.csv', index_col=0)
>>> dokdo.cross_association_heatmap(
... otu, lipids, normalize='log10', nsig=1,
... figsize=(15, 15), cmap='vlag', marksig=True,
... annot_kws={'fontsize': 6, 'ha': 'center', 'va': 'center'}
... )
Compare above results with that from the tutorial:
If you want to give this a try, please install the development version of Dokdo:
$ git clone https://github.com/sbslee/dokdo
$ cd dokdo
$ git checkout 1.13.0-dev
$ pip install -e .
Please let me know if you have any questions/suggestions/concerns. Thanks for the feature request!
@sbslee, This is great, thank you so much for making this function available, My apologies for the late response, Is it possible to pick one bacteria and make a scatter plot with p-values on it? (i had added an image earlier) e.g. for the first bacteria from above results
taxon target corr pval adjp
0 Ruminococcus gnavus et rel. TG(54:5).2 0.716496 4.516954e-08 0.002284
Thanks
I added a new method cross_association_regplot
that I think gives you what you want.
import pandas as pd
import dokdo
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
otu = pd.read_csv('/Users/sbslee/Desktop/dokdo/data/miscellaneous/otu.csv', index_col=0)
lipids = pd.read_csv('/Users/sbslee/Desktop/dokdo/data/miscellaneous/lipids.csv', index_col=0)
df = dokdo.cross_association_table(otu, lipids, normalize='log10', nsig=1)
df.head(10)
To output:
taxon target corr pval adjp
0 Ruminococcus gnavus et rel. TG(54:5).2 0.716496 4.516954e-08 0.002284
1 Uncultured Bacteroidetes TG(56:2).1 -0.698738 1.330755e-07 0.002345
2 Moraxellaceae PC(40:3e) -0.694186 1.733720e-07 0.002345
3 Ruminococcus gnavus et rel. TG(50:4) 0.691191 2.058030e-07 0.002345
4 Lactobacillus plantarum et rel. PC(40:3) -0.687798 2.493210e-07 0.002345
5 Ruminococcus gnavus et rel. TG(54:6).1 0.683580 3.153275e-07 0.002345
6 Ruminococcus gnavus et rel. TG(54:4).2 0.682030 3.434292e-07 0.002345
7 Ruminococcus gnavus et rel. TG(52:5) 0.680622 3.709485e-07 0.002345
8 Helicobacter PC(40:3) -0.673201 5.530595e-07 0.003108
9 Moraxellaceae PC(38:4).1 -0.670050 6.530463e-07 0.003302
Next, draw a scatter plot that shows association between Ruminococcus gnavus et rel.
and TG(54:5).2
:
ax = dokdo.cross_association_regplot(otu, lipids, 'Ruminococcus gnavus et rel.', 'TG(54:5).2')
ax.text(200, 1.7, 'P = 0.002284\nr = 0.72')
Let me know what you think!
Hi @sbslee, This is great, thank you so much for adding the method, Do i need to install the new version or v1.13.0 will work?
The latest official version is 1.12.0
, and the new methods are implemented in the 1.13.0-dev
branch. Therefore, you would need to install the development version:
$ git clone https://github.com/sbslee/dokdo
$ cd dokdo
$ git checkout 1.13.0-dev
$ pip install .
The latest official version is
1.12.0
, and the new methods are implemented in the1.13.0-dev
branch. Therefore, you would need to install the development version:$ git clone https://github.com/sbslee/dokdo $ cd dokdo $ git checkout 1.13.0-dev $ pip install .
Okay, thanks @sbslee
Hi @khemlalnirmalkar,
Have you had a chance to test the new methods (cross_association_table
, cross_association_heatmap
and cross_association_regplot
)? I'd like to publish the official release for 1.13.0 soon and I just wanted to make sure the new methods behave as they should for end users. Please let me know :)
@sbslee , Yes, i tried, and it looks great to me. Sorry, i didn't mention it in my previous comment. Thanks again for adding these new functions.
Lovely, thanks for letting me know and for this PR!
Hi @sbslee, Does correlation (spearman/Pearson) analysis with FDR-correction fall under Dokdo's scope of work? If yes, is it possible to have a heatmap with correlation values for many factors/features or some scatter plots for individual features?
Thanks Khem