joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Issue with import_qiime #481

Closed: locon833 closed this issue 9 years ago.

locon833 commented 9 years ago

I had to remove some samples from my mapping file so I could run adonis on only a particular group of samples. However, the import does not seem to accept the edited mapping file. I know the problem can't be that the metadata and mapping files don't match exactly, because I previously cut the mapping file down from 35 samples to 9 and that worked.

I get the following error when running the import:

map = import_qiime_sample_data(mapfile)
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘’

Or is there some way to run adonis with a script that excludes a particular group from the comparison? ANY help is appreciated.

audy commented 9 years ago

The error suggests that there are duplicate row names. It's difficult to diagnose exactly without a sample input that reproduces the error.

> Or is there some way to run adonis with a script that excludes a particular group from the comparison? ANY help is appreciated.

It's probably easier to load the original data, then subset the samples you want to compare in R and run adonis. Look into the subset_samples function.
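A minimal sketch of that approach. It assumes the GlobalPatterns example dataset that ships with phyloseq as a stand-in for the poster's data (the original files aren't available), drops one sample group with subset_samples(), and runs a PERMANOVA on what remains. It uses vegan's adonis2(), since the older adonis() used elsewhere in this thread has been deprecated in newer vegan releases.

```r
library(phyloseq)
library(vegan)

# Example dataset shipped with phyloseq (stands in for the poster's data)
data(GlobalPatterns)

# Keep every sample except the group to leave out of the comparison
gp_sub <- subset_samples(GlobalPatterns, SampleType != "Mock")

# Sample metadata as a data.frame; droplevels() removes the now-empty "Mock" level
df_sub <- droplevels(as(sample_data(gp_sub), "data.frame"))

# Bray-Curtis distances between the retained samples
bray_sub <- phyloseq::distance(gp_sub, method = "bray")

# PERMANOVA on the remaining groups (permutation p-value, so fix the seed)
set.seed(1)
res <- adonis2(bray_sub ~ SampleType, data = df_sub, permutations = 999)
print(res)
```

Subsetting the phyloseq object, rather than editing the mapping file, keeps the OTU table, tree, and metadata aligned automatically.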

CarlyMuletzWolz commented 9 years ago

As audy said, the error indicates duplicate row.names. Make sure there are no duplicate sample names in your mapping file.
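The `‘’` in the warning suggests that at least two sample IDs are empty strings. One plausible cause (a guess, not confirmed in this thread) is rows that were cleared in a spreadsheet rather than deleted, which leaves tab-only lines in the file. The base-R sketch below reads a hypothetical mapping file and lists empty and duplicated IDs before anything is imported. The file contents and variable names are illustrative only.

```r
# Hypothetical mapping-file contents: S2 appears twice, and the last two
# lines contain only tabs (as can happen when spreadsheet rows are cleared)
map_text <- paste(
  "#SampleID\tTreatment\tSite",
  "S1\tA\tNorth",
  "S2\tB\tSouth",
  "S2\tB\tSouth",
  "\t\t",
  "\t\t",
  sep = "\n"
)

# Read every column as character so IDs are compared exactly as written;
# check.names = FALSE keeps the "#SampleID" header unchanged
map_df <- read.delim(text = map_text, check.names = FALSE,
                     colClasses = "character")

ids <- map_df[["#SampleID"]]

# Row positions whose sample ID is missing or blank
empty_rows <- which(is.na(ids) | trimws(ids) == "")

# Non-empty IDs that occur more than once
dup_ids <- unique(ids[duplicated(ids) & trimws(ids) != ""])

print(empty_rows)
print(dup_ids)

# Blank rows can simply be dropped; duplicated IDs need a manual decision
map_no_empty <- map_df[trimws(ids) != "", , drop = FALSE]
```

For a real file, replace `text = map_text` with the path to the mapping file.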

Here is some code that should do what you want if the row.names aren't duplicated. It comes from my own work, so you would have to substitute what is correct for your data. GM is a phyloseq object that I built following https://github.com/joey711/phyloseq/issues/480.

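# Keep only samples with at least 500 reads
GM_data = prune_samples(sample_sums(GM) >= 500, GM)

library(vegan)

# Keep only the samples flagged for this comparison
GM_data_compare <- subset_samples(GM_data, Species_compare == "Y")

# Sample metadata as a plain data.frame, plus the Jaccard distance matrix
df_compare <- as(sample_data(GM_data_compare), "data.frame")
jaccard_com <- distance(GM_data_compare, "jaccard")

# PERMANOVA (newer vegan releases deprecate adonis() in favour of adonis2())
group_adonis <- adonis(jaccard_com ~ Species*Run_side, data = df_compare)
group_adonis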

If you get a significant p-value, you should then run a PERMDISP to see whether differences in dispersion or in centroids are driving the significance.

Is the difference due to centroid or dispersion? Use betadisper to find out:

groups <- df_compare[["Species"]]
class(groups)
groups
mod <- betadisper(jaccard_com, groups)

anova(mod)

If dispersion differs between groups, plot the results to see which groups differ from which. If dispersion does not differ, you know the significant p-value from adonis (PERMANOVA) indicates differences in centroids. If both adonis and betadisper are significant, the difference can be due to both. Currently there is no method other than visual inspection to tell whether it is due to dispersion alone.

plot(mod)
boxplot(mod)
mod.HSD <- TukeyHSD(mod)
plot(mod.HSD)

nidhi13 commented 9 years ago

Is there a way to run betadisper() and a Tukey test for multiple dependent factors? My Day, Treatment, and Condition factors overlap, and I don't understand whether testing group dispersion on just one factor makes sense. I also ran adonis with multiple factors using:

bdist2 = phyloseq::distance(merge2, "bray")
adonis.mixed = adonis(bdist2 ~ Day + Condition + Antibiotics + Day * Condition * Antibiotics, as(sample_data(sponge.scale786), "data.frame"))
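In R's formula syntax, `Day * Condition * Antibiotics` already expands to all three main effects plus every interaction, so the leading `Day + Condition + Antibiotics +` is redundant and does not change the model. A base-R check using `terms()` is below; the response `y` is a placeholder that is never evaluated. Separately, the distance above is computed from `merge2` while the sample data come from `sponge.scale786`, which presumably only works if both objects hold the same samples in the same order.

```r
# The response `y` is only a placeholder; terms() does not evaluate it
f_long  <- y ~ Day + Condition + Antibiotics + Day * Condition * Antibiotics
f_short <- y ~ Day * Condition * Antibiotics

# Expanded term labels: main effects first, then interactions by order
labels_long  <- attr(terms(f_long), "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

print(labels_short)
identical(labels_long, labels_short)
```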

CarlyMuletzWolz commented 9 years ago

I do not believe that you can do a PERMDISP (betadisper) with multiple dependent factors. You would have to run them individually to see what is driving what. I'm no expert on this, though, and I think it is advisable to read up on these statistical methods. That said, PERMANOVA (adonis) is similar to a MANOVA: if you find a significant p-value, you then have to run individual tests to find out which of your variables are driving the significance.

A good paper to read is "PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing?" by Marti Anderson and Daniel Walsh. Marti Anderson and colleagues developed some of these methods and have several good papers on the topic.

"The null hypothesis tested by PERMANOVA is that, under the assumption of exchangeability of the sample units among the groups, H0: “the centroids of the groups, as defined in the space of the chosen resemblance measure, are equivalent for all groups.”"

"PERMDISP explicitly tests only H0: “the average within-group dispersion (measured by the average distance to group centroid and as defined in the space of the chosen resemblance measure), is equivalent among the groups.”"
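For reference, here is a sketch of the test statistics behind those two hypotheses in the usual one-way formulation; the notation is ours, not the paper's. There are N samples in g groups, group k has n_k samples, and d_ij is the chosen dissimilarity between samples i and j. For PERMDISP, z_i is the distance from sample i to its group's centroid in the principal-coordinate space of d (betadisper uses the spatial median by default). Both p-values come from permutations rather than F tables.

```latex
\begin{aligned}
SS_T &= \frac{1}{N}\sum_{i<j} d_{ij}^2, \qquad
SS_W = \sum_{k=1}^{g}\frac{1}{n_k}\sum_{\substack{i<j \\ i,j \in k}} d_{ij}^2, \qquad
SS_A = SS_T - SS_W,\\[4pt]
F_{\mathrm{PERMANOVA}} &= \frac{SS_A/(g-1)}{SS_W/(N-g)},\\[4pt]
F_{\mathrm{PERMDISP}} &= \frac{\sum_{k=1}^{g} n_k\,(\bar z_k - \bar z)^2/(g-1)}
{\sum_{k=1}^{g}\sum_{i \in k}(z_i - \bar z_k)^2/(N-g)}.
\end{aligned}
```

Because within-group spread enters the PERMANOVA statistic through SS_W, unequal dispersions can affect its result even when centroids coincide. This is why adonis is paired with betadisper here.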

You could make a new column in your dataset that combines day, treatment, and condition into one term (each combination then forms a separate group) and run betadisper on that one column. Again, this is just a suggestion, and I give a full disclaimer that I do not know whether it is statistically appropriate. Also, be aware that differences in sample size among groups can produce a significant p-value that may be driven by those sample-size differences alone. See the paper I mentioned above.
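A minimal sketch of the combined-column idea, using hypothetical simulated data (a balanced 2 days × 2 conditions design with 5 replicates per cell and Poisson counts for 30 taxa) in place of real samples. `interaction()` builds one factor whose levels are the Day × Condition cells, and betadisper then tests whether dispersion differs among those cells. This only illustrates the mechanics; it does not settle whether the approach is statistically appropriate for a given design.

```r
library(vegan)

set.seed(42)

# Hypothetical balanced design: 2 days x 2 conditions, 5 replicates per cell
design <- expand.grid(rep = 1:5, Day = c("D1", "D2"), Condition = c("Ctrl", "Trt"))

# Hypothetical community matrix: one row per sample, 30 taxa of Poisson counts
comm <- matrix(rpois(nrow(design) * 30, lambda = 5), nrow = nrow(design))

# One combined factor whose levels are the Day x Condition cells
design$Group <- interaction(design$Day, design$Condition, sep = "_")

bray <- vegdist(comm, method = "bray")

# Distance of each sample to its group's spatial median (betadisper's default),
# then an ANOVA-style test of equal dispersion across the combined groups
mod <- betadisper(bray, design$Group)
disp_anova <- anova(mod)
print(disp_anova)

# Permutation-based version of the same dispersion test
perm <- permutest(mod, permutations = 999)
print(perm)
```

With real data, small or unequal cell sizes after combining factors are exactly the situation the sample-size caution above refers to.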

It might be worthwhile to post such a question on ResearchGate if you are affiliated with a university. It is a good forum for such questions.

joey711 commented 9 years ago

Thanks for the long answer @CarlyRae

Meanwhile, it would seem that this issue has been resolved. Was it duplicate row names? We may never know (but it probably was).

Cheers, and thanks for the feedback and discussions!

joey