kaitlyngaynor / gorongosa-mesocarnivores

what models to run? #118

Open klg-2016 opened 3 years ago

klg-2016 commented 3 years ago

I believe I am now at a point where I can actually run some models. The data may be too rough to actually pull anything, but I think I can play with it some and see what's there.

We've talked about different methods for building up a model (I'm thinking when we talked about starting with a single variable and adding, versus starting with everything and subtracting). I also just looked back at our notes from the meeting with Kendall and he said they used forward selection for detection and then occupancy, and then held those static for extinction and colonization formulas.

I think I'm going to try to follow that general format and see what happens. I don't have any specific questions right now, but if you have any general guidance for this step please let me know! Otherwise I'll just keep you posted
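
For reference, the forward-selection step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `forward_select`, the `score` callable (lower is better, as with an AIC-style criterion), and the toy scores are all hypothetical stand-ins, not the actual model-fitting code or any real results.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_select(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedy forward selection: add one covariate at a time while the score improves.

    `score` is a hypothetical callable returning a lower-is-better value
    (for example an AIC) for a given set of covariates.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = score(frozenset(selected))  # start from the intercept-only model
    while remaining:
        # Score every model that adds exactly one more covariate.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        trial_best, choice = min(trials)
        if trial_best >= best:
            break  # no single addition improves the model, so stop
        selected.append(choice)
        remaining.remove(choice)
        best = trial_best
    return selected


# Toy, made-up scores purely to exercise the function.
toy_scores = {
    frozenset(): 100.0,
    frozenset({"detect.obscured"}): 96.0,
    frozenset({"cover.ground"}): 98.0,
    frozenset({"detect.obscured", "cover.ground"}): 97.0,
}
detection_covs = forward_select(
    ["detect.obscured", "cover.ground"], toy_scores.__getitem__
)
```

The same function would then be run again for the occupancy covariates, with the chosen detection covariates held fixed inside `score`, before moving on to the extinction and colonization formulas.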

kaitlyngaynor commented 3 years ago

exciting!! I'd start forward, and keep in mind that given the complexity of the models and the small camera sample size, you'll likely have issues with convergence if they get too complicated. so try out some super basic models & we can see what shakes out!!

(Sent by email in reply to klg-2016's comment of Fri, Mar 26, 2021, 10:55 AM.)

klg-2016 commented 3 years ago

Sounds good! I started a spreadsheet in the shared folder to keep track of model output.

klg-2016 commented 3 years ago

okay, preliminary results are in. I've put everything in this spreadsheet: https://docs.google.com/spreadsheets/d/1P6WaMEbk8YxRD8QK_ZiDNihb5Deetl8ORkyS3BmENmY/edit?usp=sharing

I started building models for genets, civets and honey badgers (3 of the 4 focal species from my thesis; I didn't do anything with marsh mongooses because the mongoose data from 2019 hasn't been gone through, so right now it only has "mongoose").

Civets show interesting results, with "dog" as a helpful covariate for the extinction formula. They also have two occupancy covariates that help improve the model.

For genets, "dog" is better than "year" for extinction and colonization, but the best model I built today only used detection covariates and everything else was constant.

And for honey badgers, "dog" was again useful for extinction, "year" was helpful for colonization (very different from the other two), and there were no helpful occupancy covariates from the ones I tested (which surprised me, because they had a pretty solid relationship with distance to lake from my thesis, though I realize that's only from one season--oh! if year is a significant factor for them, does that make it make sense if the multi-year and single season results are different?)

I'm going to take a break from it for a bit, but I'm thinking next I'd like to test out a few other species and see what comes out. I'm also wondering if I should pull in some other potential occupancy covariates that I discarded from my thesis, to see if there are things that might not have been so important in the first season but do have an effect across seasons--thoughts?

kaitlyngaynor commented 3 years ago

Cool!! Nice job getting the models running! this is a big step :)

I just heard from Paola today that the wild dog paper was accepted, so they'll be sharing the official data soon. I think that will be huge in helping to make this analysis more realistic.

I'm still trying to wrap my head around what "year" as a colonization/extinction covariate signifies... can you explain it?

I am wary of including too many different covariates in the model, given the limited number of cameras, but I do think that perhaps a bit more complexity in the models may be feasible and wise.

A note about detection covariates: I would actually think a bit critically about using cover.ground for multiseason models, because I took those measurements in 2016 and they only really applied for that first season in 2016. Ground cover was VERY different in later years due to local greenup/fire/flooding/etc dynamics and I never recorded it again. Have you looked into models with just detect.obscured as a covariate? I think that might be more appropriate.

just some initial thoughts... happy to continue to revisit this!

(Sent by email in reply to klg-2016's comment of Fri, Mar 26, 2021, 1:07 PM.)

klg-2016 commented 3 years ago

Cool!! Nice job getting the models running! this is a big step :) I just heard from Paola today that the wild dog paper was accepted, so they'll be sharing the official data soon. I think that will be huge in helping to make this analysis more realistic.

Thank you! It was nice to run some without obvious errors. That's really exciting about the wild dog data--I agree about the realistic-ness. Is there an average time between a paper being accepted and available more broadly? I'm wondering if it might make sense to pause on the models for a bit until we have that data, because the models currently aren't really defensible even if something cool comes out with the dog covariate.

I'm still trying to wrap my head around what "year" as a colonization/extinction covariate signifies... can you explain it?

Not well, unfortunately. It seems to me to just be some marker of time passing? Because on its own, it's just a number that increases by one every season. So it's just some way to incorporate years going by? But then I'm not really sure how to finish the statement "It represents the effect of X on extinction probability", for example.
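
One possible reading, sketched below, assumes the extinction formula uses a logit link (common for dynamic occupancy models, but an assumption here). Under that assumption a numeric "year" covariate adds the same amount to the log-odds of extinction every season, which amounts to a linear trend over time. The coefficient values are invented for illustration only.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients, chosen only for illustration.
intercept = -1.0   # log-odds of extinction when year == 0
year_slope = 0.5   # change in log-odds of extinction per additional season

years = [1, 2, 3]
log_odds = [intercept + year_slope * y for y in years]
extinction_prob = [inv_logit(v) for v in log_odds]

for y, p in zip(years, extinction_prob):
    print(f"year {y}: extinction probability = {p:.3f}")
```

Under this reading, the sentence would finish as "the effect of one more season passing on the log-odds of extinction", which describes a trend rather than a mechanism.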

I am wary of including too many different covariates in the model, given the limited number of cameras, but I do think that perhaps a bit more complexity in the models may be feasible and wise.

You mean across the board? Or for a specific part/formula of the model? For the detection, colonization and extinction formulas, we currently only have 2 possible variables to include, so I tried X, Y and X+Y. For initial occupancy, I used the covariates from my thesis, and tried the pairs of covariates that had provided best fit in that project. Very happy to try more complex models as well, just want to see what you're thinking.
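
The "X, Y and X+Y" candidate set can be written down mechanically, which may help keep the spreadsheet and the fitted models in sync. A minimal sketch, assuming additive terms only and formula strings written in an R-like `~ a + b` style; `candidate_formulas` is a hypothetical helper, not part of any package:

```python
from itertools import combinations
from typing import List, Sequence


def candidate_formulas(covariates: Sequence[str], max_terms: int) -> List[str]:
    """List right-hand sides with up to `max_terms` additive covariates."""
    formulas = ["~ 1"]  # constant (intercept-only) model
    for k in range(1, max_terms + 1):
        for combo in combinations(covariates, k):
            formulas.append("~ " + " + ".join(combo))
    return formulas


extinction_candidates = candidate_formulas(["dog", "year"], max_terms=2)
```

Capping `max_terms` is one simple way to keep the candidate set small when the number of cameras is limited.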

A note about detection covariates: I would actually think a bit critically about using cover.ground for multiseason models, because I took those measurements in 2016 and they only really applied for that first season in 2016. Ground cover was VERY different in later years due to local greenup/fire/flooding/etc dynamics and I never recorded it again. Have you looked into models with just detect.obscured as a covariate? I think that might be more appropriate.

OK, I know that the extinction and colonization formulas look at between-year changes, and the occupancy formula only considers the first year. It would make most sense if the detection formula was applied to each year's data, right? (This is me thinking out loud about whether that fact of the cover.ground only being relevant for 2016 matters, and coming to agree with you that it definitely seems to). I did try only detect.obscured for genets, and it was actually a worse model than assuming constant for detection. For civets and honey badgers, my default was both.

If the detection formula happily accepts a single column for the detection covariates (so only one value for each site over the whole study period), then it's assuming any detection covariates you include are constant throughout the study period, right? Which may just inherently be a problem for something like ground cover?
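
To illustrate the concern: if the model does reuse the single site-level value in every season (an assumption that should be checked against the package documentation), the detection covariate effectively looks like the table built below. The site names, values, and list of seasons are made up for illustration.

```python
from typing import Dict, List

# Made-up site names and 2016 ground-cover values, for illustration only.
cover_ground_2016: Dict[str, float] = {"site_A": 0.2, "site_B": 0.7}
seasons: List[int] = [2016, 2017, 2018, 2019]

# One value per site is repeated for every season, so the covariate cannot
# reflect later changes from green-up, fire, or flooding.
cover_by_season = {
    site: {season: value for season in seasons}
    for site, value in cover_ground_2016.items()
}
```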

kaitlyngaynor commented 3 years ago

Cool!! Nice job getting the models running! this is a big step :) I just heard from Paola today that the wild dog paper was accepted, so they'll be sharing the official data soon. I think that will be huge in helping to make this analysis more realistic.

Thank you! It was nice to run some without obvious errors. That's really exciting about the wild dog data--I agree about the realistic-ness. Is there an average time between a paper being accepted and available more broadly? I'm wondering if it might make sense to pause on the models for a bit until we have that data, because the models currently aren't really defensible even if something cool comes out with the dog covariate.

It's coming out in PLoS One, so that means it will be a very fast turn-around (a few weeks?) but who knows. Paola's words yesterday "our mabeco paper just got accepted for publication in https://collections.plos.org/call-for-papers/rewilding-restoration/ and we will soon share the link for the data (dogs spatial, lions spatial, diet) we used for the ms. Hope this will help!" I think maybe it does make sense to pause and wait??

I'm still trying to wrap my head around what "year" as a colonization/extinction covariate signifies... can you explain it?

Not well, unfortunately. It seems to me to just be some marker of time passing? Because on its own, it's just a number that increases by one every season. So it's just some way to incorporate years going by? But then I'm not really sure how to finish the statement "It represents the effect of X on extinction probability", for example.

Hmm okay. can keep thinking on this one.

I am wary of including too many different covariates in the model, given the limited number of cameras, but I do think that perhaps a bit more complexity in the models may be feasible and wise.

You mean across the board? Or for a specific part/formula of the model? For the detection, colonization and extinction formulas, we currently only have 2 possible variables to include, so I tried X, Y and X+Y. For initial occupancy, I used the covariates from my thesis, and tried the pairs of covariates that had provided best fit in that project. Very happy to try more complex models as well, just want to see what you're thinking.

Yeah, across the board. I think that what you have is totally fine, but I'm just wary of including too many things like tree cover + fire frequency + lake distance + etc

A note about detection covariates: I would actually think a bit critically about using cover.ground for multiseason models, because I took those measurements in 2016 and they only really applied for that first season in 2016. Ground cover was VERY different in later years due to local greenup/fire/flooding/etc dynamics and I never recorded it again. Have you looked into models with just detect.obscured as a covariate? I think that might be more appropriate.

OK, I know that the extinction and colonization formulas look at between-year changes, and the occupancy formula only considers the first year. It would make most sense if the detection formula was applied to each year's data, right? (This is me thinking out loud about whether that fact of the cover.ground only being relevant for 2016 matters, and coming to agree with you that it definitely seems to). I did try only detect.obscured for genets, and it was actually a worse model than assuming constant for detection. For civets and honey badgers, my default was both.

If the detection formula happily accepts a single column for the detection covariates (so only one value for each site over the whole study period), then it's assuming any detection covariates you include are constant throughout the study period, right? Which may just inherently be a problem for something like ground cover?

I think so... hmm. But if only occupancy is modeled for the first year, I think that maybe only detection is modeled for the first year, too (they are modeled together in a hierarchical way). I actually think that using ground cover would be fine in this case. So maybe ignore this statement?

klg-2016 commented 3 years ago

It's coming out in PLoS One, so that means it will be a very fast turn-around (a few weeks?) but who knows. Paola's words yesterday "our mabeco paper just got accepted for publication in https://collections.plos.org/call-for-papers/rewilding-restoration/ and we will soon share the link for the data (dogs spatial, lions spatial, diet) we used for the ms. Hope this will help!" I think maybe it does make sense to pause and wait??

Fantastic! Very exciting. Sounds good to me.

Hmm okay. can keep thinking on this one.

Please let me know if you have any breakthroughs and I'll do the same!

Yeah, across the board. I think that what you have is totally fine, but I'm just wary of including too many things like tree cover + fire frequency + lake distance + etc

I've now gotten myself confused--are you recommending trying it with more or fewer variables?

I think so... hmm. But if only occupancy is modeled for the first year, I think that maybe only detection is modeled for the first year, too (they are modeled together in a hierarchical way). I actually think that using ground cover would be fine in this case. So maybe ignore this statement?

Maybe something to ask Kendall about? I could also ask if he has any intuition on what the "year" variable represents.

kaitlyngaynor commented 3 years ago

Yeah, across the board. I think that what you have is totally fine, but I'm just wary of including too many things like tree cover + fire frequency + lake distance + etc

I've now gotten myself confused--are you recommending trying it with more or fewer variables?

Hahah sorry.... I think your initial point/question was "I am going to try more variables maybe?" and my response is "okay, a good idea, but not TOO many!" if that makes sense

I think so... hmm. But if only occupancy is modeled for the first year, I think that maybe only detection is modeled for the first year, too (they are modeled together in a hierarchical way). I actually think that using ground cover would be fine in this case. So maybe ignore this statement?

Maybe something to ask Kendall about? I could also ask if he has any intuition on what the "year" variable represents.

Yeah I'd get back in touch with him!

klg-2016 commented 3 years ago

Gotcha, makes sense! I'll keep in mind not to go too complex with the model.

I'll reach out to Kendall and copy you!

klg-2016 commented 3 years ago

with Kendall's response:

regarding "year" as a covariate:

it sounds like it could almost be a representation of the random effect of year

What does that mean? I've done some googling to help but I'm still not sure what a random effect/random variable is.

regarding ground cover:

I think having just the initial ground cover measure is alright as long as there weren't significant changes to these measures across the years observed

We obviously can't include ground cover as a covariate over time, because we don't have the data for more than the first season. It doesn't sound like Kendall is confident that we shouldn't use it if it does change over time (which I don't have an intuitive grasp of, but you said was the case), and I still don't have a strong enough understanding of how the model is actually using it. Thoughts? I'm going to look for a paper that uses this model and read how they describe what it's actually doing and see if that helps

kaitlyngaynor commented 3 years ago

with Kendall's response:

regarding "year" as a covariate:

it sounds like it could almost be a representation of the random effect of year

What does that mean? I've done some googling to help but I'm still not sure what a random effect/random variable is.

A random effect is, in essence, something random that introduces noise in the data that you need to "control for" but aren't necessarily interested in studying in and of itself (those are the "fixed effects"). The idea here might be that there is some interannual variability that drives, for example, lots of extinction in 2017 vs colonization in 2018. But I'd just exclude it as Kendall suggests.
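
As a rough sketch of the difference from a numeric "year" slope: with a random effect, each year gets its own deviation drawn from a shared distribution, with no built-in direction over time. The simulation below assumes a logit link and uses invented values for the mean, the spread, and the list of years; it does not fit anything to the camera data.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(42)  # fixed seed so the sketch is repeatable

mean_log_odds = -1.0   # hypothetical average log-odds of extinction
year_sd = 0.8          # hypothetical spread of the year-to-year deviations

years = [2017, 2018, 2019]
# Each year gets its own random deviation drawn from a shared distribution.
year_effect = {y: rng.gauss(0.0, year_sd) for y in years}
extinction_prob = {y: inv_logit(mean_log_odds + year_effect[y]) for y in years}
```

Estimating `year_sd` from only a handful of seasons is difficult, which is one practical reason to leave year out.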

regarding ground cover:

I think having just the initial ground cover measure is alright as long as there weren't significant changes to these measures across the years observed

We obviously can't include ground cover as a covariate over time, because we don't have the data for more than the first season. It doesn't sound like Kendall is confident that we shouldn't use it if it does change over time (which I don't have an intuitive grasp of, but you said was the case), and I still don't have a strong enough understanding of how the model is actually using it. Thoughts? I'm going to look for a paper that uses this model and read how they describe what it's actually doing and see if that helps

I think it's probably fine to just include, if it makes the model better and if the model accords with predictions (i.e., higher ground cover leads to lower detection probability). Otherwise drop it.

kaitlyngaynor commented 3 years ago

Oops didn't mean to close!

klg-2016 commented 3 years ago

Thank you for your random variable explanation! Sounds good, I'll move forward without year in the models.

Also sounds good for ground cover.

So I'm going to pause for a bit now while we wait for Paola's data to improve the dog covariate!