Closed JB-8 closed 10 years ago
Hi Julia There are several bootstrap packages in R but I think that you could do this very easily just by reassigning groups at random to the samples and recomputing a within group/between group distance statistic using the unifrac distances you have already computed. If you have 20 samples you get a 20 by 20 unifrac distance matrix between them, with the first 10 in one group( call gr1 the grouping variable equal to 1 or 2) and the next 10 in the other say
mat2=as.matrix(dist2) within12=sum(mat2[gr1==1,gr1==1])+sum(mat2[gr1==2,gr1==2]) computes a within group statistic then permute the groups for as many simulations (=S) as you need perm=rep(0,S) for (s in (1:S)){ gr1=gr1[sample(20,20,T)]
perm[s]=sum(mat2[gr1==1,gr1==1])+sum(mat2[gr1==2,gr1==2]) } length(which(within12<perm))/S
Hope this makes sense, Susan
Hi Susan,
thanks for you reply. I'm not very experienced using these kinds of statistics in R and don't fully understand your suggestion - could you help me to clarify what you mean?
My UniFrac matrix looks like this (part of the matrix):
D/ND and DIL/NDIL are the 4 cross-treatments and the numbers are the blocks (each treatment was replicated 6x).
This is what the cluster dendrogram from unweighted UniFrac values, clustered with "Ward's minimum variance" looks like:
To implement your suggestion, I have added the grouping variable as a column as well as a row named "gr1":
and then tried to implement your code, but I'm not sure if I'm doing this right:
This is the output:
I have to say that don't really understand this result and also don't know how I would get from this to bootstrap values for the hierarchical clustering tree…
I hope my reply is not too confusing and would be very thankful for further help! Julia
Julia,
You say "4 cross-treatments", is this two treatments, and their respective controls? Do you have paired samples? Is there a relevant time sequence (e.g. a "crossover" design)? It sounds like you have more structure to this experimental design, and might be able to use that to ask more precise questions.
joey
Hi, After some more trying I think I got the way Susan suggested to test it working.
I calculated the within-statistic for each treatment using: mat2=as.matrix(masterUF) rownames(mat2)<-c(rep(c(1,2),each=12)) colnames(mat2)<-c(rep(c(1,2),each=12)) within12=sum(mat2[rownames(mat2)==1,colnames(mat2)==1])+sum(mat2[rownames(mat2)==2,colnames(mat2)==2])
Then I performed 1000 random permutations:
S=1000
perm=rep(0,S)
for (s in 1:1000){
gr1=c(rep(c(1,2),each=12))
tempgr<-gr1[sample(24,24,T)]
rownames(mat2)<-tempgr
colnames(mat2)<-tempgr
perm[s]=sum(mat2[rownames(mat2)==1,colnames(mat2)==1])+sum(mat2[rownames(mat2)==2,colnames(mat2)==2])
}
and finally checked how often the within group statistic was larger than the random permutation: 1-length(which(within12_Daph<perm))/S
I think this should do it for now. If, however, you have any further ideas and suggestions of how to calculate UniFrac statistics in phyloseq/R let me know!
Thanks for your help, Julia
The reason I asked about your experimental design is because adonis
in the vegan package provides some useful testing framework for this sort of thing, but you have to provide it the experimental design using the formula interface (e.g. ~Subject + Period + Treatment
). I believe we've some examples of this floating around. Try searching for "adonis" and "phyloseq"
Glad it sounds like you've solved your issue, though. I'll close this for now.
Hi,
I would like to use UniFrac to find out if the microbial communities in my experiment are affected by treatment. I have managed to calculate a UniFrac matrix (weighted and unweighted) and performed and PCoA as well as hierarchical clustering on the data. This shows me that the data does in fact split quite well by treatment.
However, I would like to strengthen my analysis using statistics on the UniFrac matrices, such as bootstrapping and the P-test (described here: http://bmf2.colorado.edu/unifrac/help.psp#phylo_test).
Is it possible to do this in phyloseq or are you aware of any other package which could do so?
I would be very thankful for any help and can provide you with more details about my question if needed.
Best, Julia