jsilve24 / philr

Phylogenetic Isometric Log Ratio
http://bioconductor.org/packages/philr/

Is the philr tutorial equally applicable to feature table containing ASVs? #8

yanxianl closed 5 years ago

yanxianl commented 5 years ago

Hi,

Thanks for developing PhILR, a great tool for analyzing amplicon data in a CoDA manner.

I was reading the PhILR tutorial, but the example data set was based on OTU clustering. Is the tutorial equally applicable to a feature table containing ASVs?

Cheers

jsilve24 commented 5 years ago

Works the exact same way. I don’t think it needs another tutorial.

Does that make sense?

Justin


yanxianl commented 5 years ago

Hi Justin,

I have 3 questions:

1. Preprocessing of the feature table. In the PhILR paper, different OTU table filtering methods were applied to different datasets. For example, in the tutorial dataset, taxa were filtered if they were not seen with more than 3 counts in at least 20% of samples or had a coefficient of variation ≤ 3. However, it is argued that these prefiltering steps are not necessary for ASVs since they are free of sequencing errors. Do you think it is necessary to apply these "hard filtering thresholds" to an ASV table as well? If so, what are your recommendations for prefiltering the feature table? Is "soft-thresholding (taxon weighting)" a better alternative?

2. Phylogenetic tree. Sequence placement into a reference tree is now recommended for building the phylogeny for amplicon data analysis. Is a tree built by sequence placement more suitable for PhILR than a de novo tree?

3. How to identify balances that distinguish categorical variables with more than 2 levels? The sparse logistic regression was used to identify balances that distinguished human/nonhuman samples. What if I have a categorical variable with 3 different outcomes? What statistical method do you recommend to perform this task?

Thanks in advance.

jsilve24 commented 5 years ago

All of these questions are difficult to answer, but I will do my best to be concise.

Re Preprocessing: There are two reasons to do preprocessing/filtering: (1) you think some taxa are spurious and want to remove them, or (2) some taxa are so low in abundance that you really don't have enough information to analyze them or to say anything interesting about them (i.e., you are focusing your statistical power intelligently). I would say that ASVs are not perfect; nothing is. I still preprocess, but I like to think about it in terms of the second reason: try to focus your attention where you have data. I realize I am falling short of telling you how to do your analysis, but there are really no hard and fast rules here. That said, if you have a tremendous number of zeros, even with taxa weighting, this can strongly influence your modeling results.
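For illustration only, here is a minimal Python sketch of the hard-threshold filter as it is described in the question above (keep a taxon only if it is seen with more than 3 counts in at least 20% of samples and its coefficient of variation exceeds 3). The function names `keep_taxon` and `filter_table` and the toy table are hypothetical and not part of PhILR; the thresholds are exposed as parameters precisely because there are no hard and fast rules for choosing them.

```python
import statistics


def keep_taxon(counts, min_count=3, min_prevalence=0.2, min_cv=3.0):
    """Return True if one taxon's per-sample counts pass both filters.

    Hypothetical helper: thresholds default to the values quoted in the
    question and should be adapted to the dataset at hand.
    """
    n_samples = len(counts)
    # Fraction of samples where the taxon is seen with more than min_count reads.
    prevalence = sum(1 for c in counts if c > min_count) / n_samples
    mean = statistics.mean(counts)
    if mean == 0:
        return False  # taxon never observed, nothing to analyze
    # Coefficient of variation: sample standard deviation over the mean.
    cv = statistics.stdev(counts) / mean
    return prevalence >= min_prevalence and cv > min_cv


def filter_table(table, **thresholds):
    """Keep only the taxa (dict keys) whose count vectors pass keep_taxon."""
    return {
        taxon: counts
        for taxon, counts in table.items()
        if keep_taxon(counts, **thresholds)
    }


# Toy feature table (made-up numbers): taxon -> counts across 20 samples.
toy_table = {
    "ASV_spiky": [1000, 4, 4, 4, 4] + [0] * 15,  # prevalent enough, highly variable
    "ASV_steady": [8, 12] * 10,                  # prevalent but nearly constant
    "ASV_rare": [50] + [0] * 19,                 # seen in a single sample
}

kept = filter_table(toy_table)
print(sorted(kept))  # prints ['ASV_spiky']
```

Loosening or tightening `min_prevalence` and `min_cv` is one way to see how sensitive the retained set of taxa is to these choices before committing to a threshold.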

Re Phylogenetic Tree - You pick the tree that you think is meaningful. PhILR doesn't care beyond that. That's your choice.

Re Balances for categorical variables with more than 2 levels: You can use multinomial regression (i.e., multiclass logistic regression). There is an implementation of this in the glmnet package as well.
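To make the idea concrete, here is a rough, self-contained Python sketch of L1-penalized multinomial (softmax) regression fitted by proximal gradient descent. It is not glmnet's algorithm and not part of PhILR; the function names, the penalty and step-size values, and the toy "balance" data are all assumptions chosen for illustration. In R, the corresponding glmnet option is `family = "multinomial"`.

```python
import math


def softmax(scores):
    """Convert class scores to probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def class_scores(W, b, x):
    """Linear score of sample x for every class."""
    return [b[k] + sum(wkj * xj for wkj, xj in zip(W[k], x)) for k in range(len(W))]


def fit_sparse_multinomial(X, y, n_classes, lam=0.01, lr=0.2, n_iter=300):
    """Fit softmax regression with an L1 penalty on the weights (not the intercepts)."""
    n, p = len(X), len(X[0])
    W = [[0.0] * p for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        grad_W = [[0.0] * p for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for xi, yi in zip(X, y):
            probs = softmax(class_scores(W, b, xi))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                grad_b[k] += err / n
                for j in range(p):
                    grad_W[k][j] += err * xi[j] / n
        for k in range(n_classes):
            b[k] -= lr * grad_b[k]
            for j in range(p):
                # Gradient step followed by soft-thresholding (the L1 proximal step).
                W[k][j] = soft_threshold(W[k][j] - lr * grad_W[k][j], lr * lam)
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (made up): two hypothetical balances per sample, three outcome levels.
X = [[2.0, 0.1], [1.8, -0.2], [-2.0, 0.0], [-1.9, 0.3], [0.0, 2.0], [0.1, 2.2]]
y = [0, 0, 1, 1, 2, 2]

W, b = fit_sparse_multinomial(X, y, n_classes=3)
predictions = [predict(W, b, x) for x in X]
print(predictions)
```

Balances whose weights are shrunk to exactly zero for every class are the ones the penalty treats as uninformative; the remaining nonzero weights point to balances that help separate the outcome levels.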

yanxianl commented 5 years ago

Thanks for sharing your thoughts on these questions.