weecology / LDA-kratplots


LDA on all plots together? #5

Closed emchristensen closed 6 years ago

emchristensen commented 6 years ago

What would happen if we ran an LDA on controls and krat removal plots together -- could it tell the difference and separate them into different topics? (may have done this already, years ago at the beginning of the Extreme-events-LDA project)

diazrenata commented 6 years ago

I just pushed .pdfs of the controls + exclosures, controls, and exclosures LDA compositions.

Looking at them visually, I do see similarities between topics among them. They don't shake out 100% control/exclosure, but there are more similar-looking ones between the controls and the combined than between the exclosures and the combined.

Here is how I see it:

emchristensen commented 6 years ago

Hmm interesting. Have you plotted out the time series?

diazrenata commented 6 years ago

Just pushed them with the timeseries plots added. At first glance, the best I can say is that the combined time series shows the strong signals from both (i.e. DS dominance early; PB in the late 90s-2000s) - probably because when one community (exclosures or controls) is strongly dominated by a topic, and the other is mixed, the strong signal comes through? Did you have something in mind to look at in the time series?

emchristensen commented 6 years ago

Oh I see what it is now. Sorry, just going through the steps of your code now. I was picturing doing a combined control/exclosure LDA as keeping the two plot types as separate samples but throwing all the samples together into the model: essentially doing a rbind(rodent_data_control, rodent_data_exclosures) and running the LDA on that. See if the topics shake out as topics that only show up in control samples or only exclosure samples or if there are topics that show up in both. It will take a little more work to disentangle the output, but it should work.
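A minimal sketch of the stacking idea above, written in Python rather than the project's R. All names and counts here are hypothetical toy values, not Portal data. Each sample keeps a treatment label next to its species counts, the stacked count matrix is what would be handed to an LDA, and the labels are what make it possible to split per-sample output back out by treatment afterwards.

```python
# Hypothetical toy data: one row per census period per treatment.
# Species codes and counts are illustrative only.
control_samples = [
    {"period": 1, "counts": {"DM": 12, "DS": 9, "PP": 1}},
    {"period": 2, "counts": {"DM": 14, "DS": 7, "PP": 2}},
]
exclosure_samples = [
    {"period": 1, "counts": {"DM": 0, "DS": 0, "PP": 6}},
    {"period": 2, "counts": {"DM": 1, "DS": 0, "PP": 8}},
]


def stack_samples(groups):
    """Row-bind samples from several treatments, tagging each row with its treatment."""
    stacked = []
    for treatment, samples in groups.items():
        for sample in samples:
            stacked.append({"treatment": treatment, **sample})
    return stacked


def count_matrix(stacked, species):
    """Build the samples-by-species count matrix an LDA would take as input."""
    return [[row["counts"].get(sp, 0) for sp in species] for row in stacked]


def split_by_treatment(stacked, per_sample_output):
    """Regroup one output row per sample (e.g. topic proportions) by treatment."""
    result = {}
    for row, out in zip(stacked, per_sample_output):
        result.setdefault(row["treatment"], []).append((row["period"], out))
    return result


species = ["DM", "DS", "PP"]
stacked = stack_samples({"control": control_samples, "exclosure": exclosure_samples})
matrix = count_matrix(stacked, species)

# Stand-in for model output: here the count rows themselves are regrouped,
# only to show that row order and labels stay aligned.
regrouped = split_by_treatment(stacked, matrix)
```

The design choice is to carry the treatment label alongside the rows rather than inside the count matrix, so the model never sees it and the disentangling step is a simple regrouping.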

diazrenata commented 6 years ago

Is this more what you mean? I put controls and exclosures in the same data frame as different samples, gave that to the LDA, and then made plots of just-the-controls/just-the-exclosures. https://github.com/weecology/LDA-kratplots/blob/master/reports/lda-controls-and-ex-simultaneously.pdf

for this one, the color scheme is fixed (so dark blue is always Topic 1). You can see some topics are rare in the controls (2 and 3) but dominant in the exclosures for some time.
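One way to put numbers on "rare in the controls but dominant in the exclosures" is to average each topic's proportion within each treatment. The sketch below is Python with made-up proportions and a hypothetical helper name; it assumes one row of topic proportions per sample, in the same order as the treatment labels.

```python
def mean_topic_proportions(labels, proportions):
    """Average per-sample topic proportions within each treatment label."""
    sums = {}
    counts = {}
    for label, row in zip(labels, proportions):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


# Illustrative values only (two topics, two samples per treatment).
labels = ["control", "control", "exclosure", "exclosure"]
proportions = [
    [0.75, 0.25],
    [0.25, 0.75],
    [0.0, 1.0],
    [0.5, 0.5],
]

means = mean_topic_proportions(labels, proportions)
```

A whole-series average hides timing, so it complements the time series plots rather than replacing them.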

emchristensen commented 6 years ago

ooo that's interesting! The topics still look pretty similar to what they were before, which makes sense. And it's a good thing the controls are dominated by the krat topics and those don't show up much on the krat exclosures... I like this a lot actually. There are some interesting similarities, like topic 4 shows up in both treatments in the late 80s/early 90s (although it looks like it hangs on a lot longer on exclosures), and of course the PB topic in the 2000s. I think it's worth continuing along with this version, see what the changepoint models come up with. I like that this allows us to talk about the topic species compositions directly, and highlight that there really are a lot of similarities between the two treatments, but the timing of changes seems to be different (maybe). I think that's one of our main messages with this project: similar kind of changes happen on both treatments, but timing is different because of competition from krats. What do you think?

diazrenata commented 6 years ago

Just pushed results from running the changepoint model on the control/exclosure samples, with the LDA from all the samples at once.

Interestingly, this gives us similar results as models with the weights proportional to sample size: two changepoints for both plot types, but at different times.

It finds a changepoint in the mid-1980s for the controls, which looks to me like Topic 2 (mostly DM/DO) taking over from Topic 5 (DS/DM). The second one for controls is in the late 1990s, and it looks to me like Topic 1/2 (PP, DM/DO) become dominant and Topic 5 (DS/DM) pops up every now and again. This is in contrast to Topics 2 (DM/DO) and 3 (PB) really being on top for a while.

For the exclosures, the changepoints are in the late 1990s and in 2010. Like before, they seem to bracket a period when topic 3 (PB) was dominant. Before, it's mixed; after, it's still mixed but with topic 1 (PP) mostly dominant.

I think it's funny that we keep getting this 2 changepoints result. Other than that confusion, I agree that this looks to me like the controls and exclosures are not synchronized WRT when they are dominated by a single topic/a mixture, and they can undergo shifts independent of each other, but environmental conditions can force changepoints across plot types (late 1990s).

emchristensen commented 6 years ago

Hmmmmmmm. While it feels a little disconcerting that it keeps coming up with 2 changepoints instead of 4 like we found before, these changepoints do seem to make sense. I think you're right, the changepoint model seems to be grabbing on to the points where the main (dominant) topic changes, or when dominance by a single topic changes to mixed-dominance. This seems different from what the old version of the changepoint model was coming up with (in the paper, it picked up that weird changepoint in 1995 that involved just the rare topics), though I'm ok with this new result too because the differences it's focusing on are ecologically meaningful and interesting. I'll ask if Juniper has any insight on that...
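As a rough cross-check on that reading, the times at which the dominant topic switches can be listed directly from the topic proportions. This is a simple heuristic in Python with made-up values and a hypothetical function name. It is not the changepoint model, and it ignores shifts between single-topic dominance and mixed dominance.

```python
def dominant_topic_switches(times, proportions):
    """Return (time, old_topic, new_topic) wherever the highest-weight topic changes.

    Topics are numbered from 1. Ties go to the lowest-numbered topic.
    """
    switches = []
    previous = None
    for t, row in zip(times, proportions):
        current = max(range(len(row)), key=row.__getitem__) + 1
        if previous is not None and current != previous:
            switches.append((t, previous, current))
        previous = current
    return switches


# Illustrative values only: topic 2 overtakes topic 1 in the third time step.
times = [1984, 1985, 1986, 1987]
proportions = [
    [0.7, 0.3],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.1, 0.9],
]

switches = dominant_topic_switches(times, proportions)
```

If the switch times from a listing like this line up with the fitted changepoints, that supports the idea that the model is tracking changes in the dominant topic.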

I'm hesitant to point to environmental conditions for the late 1990s changepoint. It definitely has to do with the invasion of Baileys, which we're pretty sure was facilitated by drought in the 90s and the flood in 1999, but that part is hard to prove. I think it's a compelling piece of the story that we see that invasion changepoint happen earlier on exclosures (1997) than controls (1999).

Here's how I see the story unfolding: if our main question is "does the presence/absence of keystone species (krats) make dynamics different on exclosures compared to controls" I think our results point to yes. While the controls are of course dominated by the 2 krat topics, there are also contributions from the other 4 topics, which suggests that there are enough similarities in species composition on the two types of plots that it would be entirely possible for dynamics to be synchronized. I.e. both types of plots have the same players involved (which is obvious to us because we know the system, but I think it's important to be able to show we aren't dealing with two totally disjoint types of rodent communities). The one changepoint that is shared by the two types of plots in the late 1990s has to do with Baileys, which easily dominates on krat exclosures and takes a good chunk out of the krat dominance on controls (here we can speculate about Baileys possibly stepping into the role of "keystone species"). Each of the plot types has a second changepoint that is not shared between them, 1985 on controls (which has entirely to do with DS decline) and 2010 on exclosures (which seems to be about Baileys decline). My feeling is that species interactions, and specifically the invasion/decline of dominant (keystone) species, seem to be driving rodent community dynamics. The environmental disturbances are likely explanations of those declines/invasions, but the fact that that one changepoint occurred 2 years earlier on exclosures suggests it wasn't simply due to a site-level disturbance.

emchristensen commented 6 years ago

I brought this up with Morgan this week (the question of whether we should go with the combined LDA or separate control and exclosure LDAs for the paper), and she made the point that by combining them we're losing some details. For example, in the LDA on just controls (I'm looking at the figure in comparing-weights.pdf), there are multiple topics with DM as a dominant (e.g. topic 5 is DM/PE/RM, topic 2 is DM/DO), and so the changepoint model picks up on changes over time between these topics. However, in the combined LDA, because half the samples are from the exclosure plots where krats aren't reliably associated with other species, we instead have a single DM/DO topic and the other species are off in their own topics. Then because the control plots have krats, that one DM/DO topic swamps out the other stuff going on. Essentially we're losing the fact that krats are associated with different species at different times on the controls.

So, for this project, I think those details are important. If we were asking if the two types of plot are undergoing the same change in species composition/dominance, I think the combined LDA would be the way to go. But the main question here is whether the timing of changes on the two types of plots is simultaneous, and for that I think we get more information by doing separate LDAs.

juniperlsimonis commented 6 years ago

hey y'all! great work on this! just been catching up on things and i agree with @emchristensen's most recent delineation about the question of interest and whether to combine or keep separate the LDAs. the associations are affected by the treatment (as intended), and so a combined LDA should really be a combined model that includes the treatment as a covariate, so it could be determined explicitly. this line of thinking can be somewhat evaluated through what you all are doing right now with the two kinds of models (separate and combined), but it's not the most robust approach. (although it's the best we have right now)

juniperlsimonis commented 6 years ago

as a sidebar, i'd also be curious to know what the changepoint model output looks like on the exclosure-only data and on the combined (without splitting) data.

i think it's interesting that the changepoint model for the controls from the split LDA does result in 2 change points, but i don't know (although i'm still thinking this through) if it's necessarily appropriate to compare that to the LDA on just the controls in terms of thinking about this producing a similar result to when the weights were still in... i think the 2 might be a red herring here. the topic structures are different and you're splitting the data, so you're introducing a few other key changes.

diazrenata commented 6 years ago

Thanks Juniper!

Based on your assessment and Morgan/Erica's concerns about obscuring important details, it sounds to me like, at a minimum, we would need to implement this differently to get appropriate results.

Also, I guess I'm not sure if this is quite the question we want to ask? The combined LDA helps build intuition for how the LDA performs on samples we know are mixed between fairly defined community types, but is that our focus? I'd been approaching it basically assuming that the two communities are parallel worlds and focusing on commonalities/differences wrt the changepoint timing/number, rather than trying to parse out whether we can identify distinct control and exclosure communities in the LDA. So I guess I'm looking at this as an interesting exercise in learning about how LDAs work in this context, but not so much as the necessary-next-step in clarifying our analysis.

I'd be inclined to explore adding treatment as a covariate in the LDA out of curiosity, but continue to focus on the results from the two separate LDAs + changepoint model with all weights = 1. If we're on board with this, I think I'd have a go running the changepoint for more iterations on the hipergator, to try and get better resolution on the timing of the changepoints.

emchristensen commented 6 years ago

I agree on all fronts here.

While Renata is working on the hipergator, I'll get to work writing it up based on what we have already from the separate LDAs. Just FYI, unless anyone has any objections this project will turn into my 3rd chapter, since the other project I was hoping would be a chapter (LDA on plant data) is really not working out and I'm defending in less than 2 months.