hputnam / Moorea_Sym

Symbiodinium communities of specialist and generalist corals

Symbiodinium community metrics #11

Closed jrcunning closed 7 years ago

jrcunning commented 7 years ago

Diversity, evenness, beta diversity (distance to centroid), and amount of specific taxa (e.g., D1a, A types...)

daniclaar commented 7 years ago

To start with the most basic question (do coral colonies cluster based on their Symbiodinium communities?), I ran a basic NMDS using phyloseq's ordinate command:
ord <- ordinate(phy.f, "NMDS", "bray", try=100)
When I do this, I get a warning that seems wrong:
Warning messages:
1: In metaMDS(veganifyOTU(physeq), distance, ...) : Stress is (nearly) zero - you may have insufficient data
So then I decided to look at just the distance matrix, computed in vegan with:
vd <- vegdist(t(otu_table(phy.f.p)),method="bray")
It returns a distance matrix of all ones, that is, every sample is perfectly dissimilar from every other sample.
This can't be right: at the very least, 1) there should be some differences between species, and 2) there should be some similarity within species.
I have looked through everything and don't see anything obviously wrong with the phy.f object, so I am not sure what else could be causing such a strange result.
Do either of you have any ideas? I'll keep working on it, but if you have any thoughts, I'd be glad to try them out!
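
For reference, one known way to get an all-ones Bray-Curtis matrix is for no two samples to share a single OTU: the numerator sum(|x - y|) then equals the denominator sum(x + y). A minimal base R sketch with made-up counts (the toy table and the bray_curtis helper are illustrative assumptions, not taken from this repository):

```r
# Toy OTU table (made-up counts): rows are samples, columns are OTUs.
# By construction, no OTU is present in more than one sample.
otu <- matrix(c(10, 5, 0, 0, 0, 0,
                 0, 0, 8, 2, 0, 0,
                 0, 0, 0, 0, 7, 3),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("sample", 1:3), paste0("otu", 1:6)))

# Bray-Curtis dissimilarity between two count vectors:
# sum(|x - y|) / sum(x + y)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise dissimilarities between all samples
n <- nrow(otu)
d <- matrix(0, n, n, dimnames = list(rownames(otu), rownames(otu)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(otu[i, ], otu[j, ])
  }
}
print(d)
```

Every off-diagonal entry is 1 here, so checking whether any OTU is shared between samples is a quick way to tell a real result from a bug in the distance call.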

daniclaar commented 7 years ago

So it looks like the problem is with the creation of the phyloseq object in commit d328ac4551b1882a10c3a429125f6693904dabc7
Hollie was able to ordinate the data the day before our meeting, so it seems likely that the changes to include 100 OTUs somehow broke the phyloseq output.
The most likely suspect is the change to lines 7-8:
-# Build phyloseq object
-RAnalysis/Data/Moorea.RData: Bioinf/clust/all_rep_set_rep_set_nw_tophits.tsv Bioinf/clust/97_otus_bysample.tsv RAnalysis/Data/Moorea_sample_info.tsv Bioinf/clust100/100_otus_rep_set_nw_tophits.tsv
+# Build phyloseq object for 97% OTUs
+RAnalysis/Data/Moorea.RData: Bioinf/clust/all_rep_set_rep_set_nw_tophits.tsv RAnalysis/Data/Moorea_sample_info.tsv
It looks like the most likely problem is that Bioinf/clust/97_otus_bysample.tsv is missing. I can try to run this again with the Makefile updated, but I'm working on my new computer and don't have everything set up yet, so I can't run the Makefile right away. @jrcunning can you take a look and see what you think? Thanks!

daniclaar commented 7 years ago

UPDATE to above: I've gone back and downloaded the old versions of the .RData file (because I realized that the Makefile changed last week, but it doesn't look like the .RData file actually did). When I try to ordinate phy.f from the previous versions of Data/Moorea_sym.RData, I run into the same problem.
I'm not sure how @hputnam could have ordinated before (maybe a local copy on your machine?), but it looks like this problem may go back further than we previously thought.

jrcunning commented 7 years ago

I'm not sure what the problem was, but I think I've fixed it. I updated the filter_notsym.R script in the SymITS2 repository and reran the Makefile for Moorea_sym. The RData files seem to be intact now, and a quick ordination worked (I didn't save the code, though), so yours should work now. I will work on producing a document comparing the 100% OTUs to the 97% bysample OTUs to make sure we are all set with the sequence analysis, so we can then focus on stats. Let me know if the problem is resolved on your end. Run git pull here and in SymITS2 just to have the latest of everything.

daniclaar commented 7 years ago

Hey! That's great. So I've realized a problem - .gitignore just ignored your new .RData file, so I fixed that. Could you please commit your repo again, so I can see the new output file you created? Sorry and thanks!!

jrcunning commented 7 years ago

it says "everything is up to date" even though the .RData files on GitHub are not what are in my local repo... any ideas?

jrcunning commented 7 years ago

So maybe it's because the files are actually unchanged... I just added a couple of lines of code to the end of data_exploration.Rmd that do an NMDS ordination, and it executes successfully on my machine. Does it on yours?

daniclaar commented 7 years ago

Hm... no. It doesn't look like the .RData file is getting pushed for some reason (I can see the changes in data_exploration.Rmd, though). Have you tried explicitly adding your Data/Moorea_sym.RData file with git add and then commit/push? If that doesn't work, you can try git rm -r --cached . and then add/commit/push again and see if it works. If neither works, we may have to think about it some more...

jrcunning commented 7 years ago

ok did git rm -r --cached . and then add/commit/push....

daniclaar commented 7 years ago

Still no changes to Data/Moorea_sym.RData on my end, or on GitHub as far as I can see. That's very weird, especially if the .RData files on GitHub are different than your local repo.

jrcunning commented 7 years ago

I'm not sure they are different though... are you sure the problem is not on your end somewhere in the Rmd document with caching? Can you clear your environment, make a new script, and run:

load("Data/Moorea_sym.RData")
ord <- ordinate(phy.f, "NMDS", distance="bray")
plot_ordination(phy.f, ord)

hputnam commented 7 years ago

It looks like the 97_otus_bysample.tsv file could be causing the distance matrix problem. The OTUs do not seem to be reintegrated after clustering by sample, which leads to what looks like separate OTUs per sample, creating a dissimilarity of 1 and a crazy ordination. @daniclaar, @jrcunning and I talked on the phone today about the discussion we had yesterday.

jrcunning commented 7 years ago

OK, so digging a bit deeper into the OTU table... The complete dissimilarity among all samples, which I thought may have resulted from a bug in the SymITS2 code, may actually be real. This is happening because after clustering at 97% within each sample and picking representative sequences, none of the representative sequences from different samples are the same. Therefore, no OTU merging occurs across samples, and all samples have completely different sets of OTUs. Indeed, when you look at the 100% OTUs, you find that out of 6197 OTUs (i.e. unique sequences), 6109 occur in only one sample, 88 occur in two samples, and no sequences occur in more than two samples! If this is correct, it means that all of the samples are composed almost entirely of sequences that do not occur in any other sample, which is very unexpected. Can you guys dig into this independently and confirm whether you find the same thing? If this is real, could it be due to PCR and/or 454 sequencing errors? Or alternatively, some artifact introduced in the construction of the Moorea_seqs.fasta file?
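
One way to check the occupancy counts independently is to tabulate how many samples each OTU appears in. The sketch below uses a made-up matrix with OTUs in rows and samples in columns. With the real data the matrix would presumably come from something like as(otu_table(phy.f), "matrix"), but that step and the row/column orientation are assumptions to verify.

```r
# Toy abundance table (made-up): rows are OTUs, columns are samples.
otu <- matrix(c(4, 0, 0,
                0, 6, 0,
                0, 0, 9,
                2, 3, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("otu", 1:4), paste0("sample", 1:3)))

# Number of samples in which each OTU has a non-zero count
n_samples_per_otu <- rowSums(otu > 0)

# How many OTUs occur in exactly 1, 2, ... samples
occupancy <- table(n_samples_per_otu)
print(occupancy)
```

If nearly every OTU falls in the "1" column, as reported above, the samples share almost nothing and a Bray-Curtis matrix of ones follows directly.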

jrcunning commented 7 years ago

...or maybe it's because the BARCODES are still on the sequences.... DOH!!!!
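
A small illustration of why leftover barcodes would have this effect: two reads of the same underlying sequence look unique as long as each carries its own sample barcode. The reads, the 4-base barcode length, and the 5' position below are all made up for the sketch; the real barcode layout would need to be checked against the sequencing setup.

```r
# Made-up reads: the same 12-base sequence carrying two different
# 4-base sample barcodes at the 5' end (length and position are assumptions).
reads <- c(sampleA = "ACGTGGATCCTTAGCA",
           sampleB = "TGCAGGATCCTTAGCA")
barcode_length <- 4

# With barcodes attached, the two reads count as different sequences
n_unique_with_barcodes <- length(unique(reads))

# Strip the leading barcode and compare again
trimmed <- substring(reads, barcode_length + 1)
n_unique_trimmed <- length(unique(trimmed))

print(c(with_barcodes = n_unique_with_barcodes, trimmed = n_unique_trimmed))
```

Once the barcode is removed the two reads collapse into one sequence, which is the merging across samples that was missing from the OTU table.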

daniclaar commented 7 years ago

:-O That would definitely do it!