Stats_dec_2022 - Githubissues

zenrabbit commented 1 year ago

Hi Chris, I'm still stuck. The main idea we decided on was to evaluate the impact of fires on species recovery. My main models are:

1) Compare occurrences reported before and after fire between the two treatment groups (burned vs control); the year of the reported observations, desert, and taxa are included as random effects.

stzdObs ~ pre_or_post + treatmentGroup + (1|obsYear) + (1|desert) + (1|simplified_taxa)

this is a brutal model. very complex and beyond what I usually do. Remove the year effect and test mean(stdzObs) pre and post, thereby dropping one model term.
How many taxa are there? I thought we were doing birds only, then all other taxa separately? Dropping the taxa term would further increase the power of the model.
I would not treat taxa as random. I know we did not 'pick' them or control them; however, their variation is likely not random but fixed.

2) Compare occurrences reported to the number of years since fire.

stzdObs ~ yearsSinceFire + treatmentGroup + (1|obsYear) + (1|desert) + (1|simplified_taxa)

again, simplify the model by at least removing taxa and just run birds, then all others? We know the design is strongly unbalanced.

What I'm struggling with are the following issues with the data:

1) Due to the structure of the data, the number of occurrences reported for birds versus other taxa differs in scale and distribution.
treat birds separately
2) Not all taxa have occurrences in the burned treatment group. I could address this by just grouping them into broader groups (e.g. aves and 'other') like I did above, but I think it would be inappropriate to treat all 'other' taxa as one group, considering they vary across animal classes.

yes good call.
OR test only taxa found in both

3) Not all simplified taxa are observed in all three deserts.

I'm struggling with how to treat this data. What I have been doing is not filtering out the taxa with no data, but I think this creates a problem when I try to fit a model. I've been trying to fit a zero-truncated negative binomial GLMM (since there is overdispersion but no zeros), but I run into issues validating the models.

Marina

zenrabbit commented 1 year ago

Questions from M

I will try to rerun the model with your suggestions.

Stephanie suggested including the observation year as a random effect, since we're treating it more as a sampling event than an actual variable.

Seems I misunderstood when we were talking about how to model the different taxa. Will just look at endangered birds first. What kind of consideration should I give to different avian families (e.g. passerines, raptors, woodpeckers, etc.) having different population sizes (i.e. a lot more passerines than raptors)?

Should I only test groups that are found in BOTH control and burned sites AND all three deserts?

Best, M

zenrabbit commented 1 year ago

Replies

the simpler the 'data model' you can build, the simpler the stats models, with more power.

I will try to rerun the model with your suggestions.

your call, whatever you want. your model is amazing too - do not worry about the model diagnostics too much. I just prefer simple data models.

Stephanie suggested including the observation year as a random effect, since we're treating it more as a sampling event than an actual variable.

seems too messy to me. different years, different durations. I would simplify time out of the data model to "mean after 5 years from a fire" in control sites and paired burnt sites, only for shared occurrences that both site-types have

Seems I misunderstood when we were talking about how to model the different taxa. Will just look at endangered birds first. What kind of consideration should I give to different avian families (e.g. passerines, raptors, woodpeckers, etc.) having different population sizes (i.e. a lot more passerines than raptors)?

ah, I misunderstood. sorry. I would keep it high-level for now. I misunderstood what you meant by taxa. I think this is genus? I would keep it simple for sure, with whatever ecology you can justify. so just do ES of birds for now, and see what you get for power, n, and outcomes. I thought you did this?

Should I only test groups that are found in BOTH control and burned sites AND all three deserts?

seems like a good starting point
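
A minimal sketch of that filtering step, written in Python purely for illustration (the analysis in this thread is done in R). The record layout, the desert labels, and the function name `shared_taxa` are all hypothetical, and the data are made up. It assumes "found in both" means a taxon appears at least once in each treatment group and at least once in each desert, not in every group-by-desert combination.

```python
from collections import defaultdict

# Hypothetical records: (taxon, treatment_group, desert). Values are made up.
records = [
    ("passerines", "burned", "desert_A"),
    ("passerines", "control", "desert_A"),
    ("passerines", "burned", "desert_B"),
    ("passerines", "control", "desert_C"),
    ("raptors", "control", "desert_A"),      # never seen in burned sites
    ("raptors", "control", "desert_B"),
    ("raptors", "control", "desert_C"),
    ("woodpeckers", "burned", "desert_A"),   # never seen in desert_C
    ("woodpeckers", "control", "desert_A"),
    ("woodpeckers", "burned", "desert_B"),
]


def shared_taxa(records, groups, deserts):
    """Return taxa seen in every treatment group and in every desert."""
    seen_groups = defaultdict(set)
    seen_deserts = defaultdict(set)
    for taxon, group, desert in records:
        seen_groups[taxon].add(group)
        seen_deserts[taxon].add(desert)
    return sorted(
        taxon
        for taxon in seen_groups
        if set(groups) <= seen_groups[taxon] and set(deserts) <= seen_deserts[taxon]
    )


kept = shared_taxa(
    records,
    groups=("burned", "control"),
    deserts=("desert_A", "desert_B", "desert_C"),
)
print(kept)
```

With these made-up records only the passerines survive the filter; a stricter version would require every group-by-desert cell to be non-empty.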

zenrabbit commented 1 year ago

OPTIONS

a. accept no model is perfect, move on, summary stats, highlight post hoc contrasts for key differences in paper. to do this: table 1, main summary effects with p-values, df. then when a main or interaction effect is sig, do a post hoc contrast, even a simpler one without random effects, ie like t-tests for key differences, and cite (table 1, main model stats with post hoc contrasts) in results.

b. build a much simpler data model - ie drop year, just take mean or cumulative mean for 5 years post fire, in species/feeding functional groups that were at a site before fire, 5yr window too, and do stats on differences between burnt and control. so - stats are testing - is the mean 5-yr difference in functional groups of ES birds that were present before a fire and after different from control sites. Seems awesome.

OR any simple data aggregation to remove the random factor of year - that we do not need. same idea for functional group - either ignore, and then explore individual ff or species in individual models, or just make this a paper about ES bird changes - and move some nice plots of species, histograms to supplement.
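
One way the simpler data model could be aggregated, sketched in Python for illustration only (the thread's models are fit in R). The row layout, the site labels, and the function name `window_change` are hypothetical, and the numbers are made up. It assumes each control site has been assigned the fire year of its paired burnt site, so "years since fire" is defined for both groups, and it drops sites with no record on either side of the fire.

```python
from statistics import mean

# Hypothetical rows: (site, treatment_group, years_since_fire, stdz_obs).
# Negative years_since_fire means before the fire. All values are made up.
rows = [
    ("s1", "burned", -2, 4.0),
    ("s1", "burned", 1, 1.0),
    ("s1", "burned", 3, 3.0),
    ("s1", "burned", 8, 9.0),    # outside the 5-year window, ignored
    ("s2", "control", -1, 4.0),
    ("s2", "control", 2, 5.0),
    ("s2", "control", 4, 3.0),
    ("s3", "burned", 2, 2.0),    # no pre-fire record, so the site is dropped
]


def window_change(rows, window=5):
    """Per site: mean over the post-fire window minus mean over the pre-fire window.

    Observations from the fire year itself (years_since_fire == 0) are skipped.
    """
    pre, post, group_of = {}, {}, {}
    for site, group, years, obs in rows:
        group_of[site] = group
        if -window <= years < 0:
            pre.setdefault(site, []).append(obs)
        elif 0 < years <= window:
            post.setdefault(site, []).append(obs)
    return {
        site: (group_of[site], mean(post[site]) - mean(pre[site]))
        for site in group_of
        if site in pre and site in post
    }


changes = window_change(rows)

# Average the per-site changes within each treatment group.
by_group = {}
for group, delta in changes.values():
    by_group.setdefault(group, []).append(delta)
group_means = {group: mean(deltas) for group, deltas in by_group.items()}
print(changes)
print(group_means)
```

Year disappears as a model term because each site contributes a single pre-to-post change, and those per-site changes are what a simple burnt-versus-control comparison would then be run on.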

zenrabbit commented 1 year ago

found this https://www.statology.org/glm-fit-algorithm-did-not-converge/

https://www.geeksforgeeks.org/how-to-fix-in-r-glm-fit-algorithm-did-not-converge/

plus attached pdf: RJournal_2011-2_Marschner.pdf

zenrabbit commented 1 year ago

Haven't given the emmeans much consideration yet. I'm still unfamiliar with this tool. From my understanding it shows a significant difference in both post- and pre- of control and burned sites AND it shows that there is a difference between the control and burned sites.

yes, correct.

I'm still unsure of what arguments to use in emmeans. I'm either finding examples for more simple models, or examples that are difficult [for me] to follow with minimal explanation.

like a t-test for the se associated with each mean of each level of factor that you set up

Also, I'm not sure why the estimates for contrasts are the same (screenshot below). If either of you can explain this that would be great. I am planning on working more on this over the next week.

cannot compute, I've had the same thing many times. Not much way around it, iterations to do this:

a. The emmeans package automatically adjusts for multiple comparisons.
b. In models with covariates, EMMs are often called adjusted means. The emmeans function computes EMMs, accompanied by standard errors and confidence intervals.

zenrabbit commented 1 year ago

First, make sure you’re using glmmTMB (it is more robust for complex models without losing accuracy like some others). If you are, make sure you are using the development version of glmmTMB from GitHub (instead of CRAN); I’ve found installing this version alone can solve convergence issues. If you can’t solve the convergence, try running the model with a different optimizer. If you get the same results you can be confident in trusting them.

mgoldgisser / desert-fires-impact-biodiversity

Stats_dec_2022 #5
