rBatt / trawl

Analysis of scientific trawl surveys of bottom-dwelling marine organisms

Do substrata vary in effort? #84

Closed: rBatt closed this issue 8 years ago

rBatt commented 8 years ago

I think it's roughly safe to assume that effort is constant among hauls (in a given region-year). However, if I aggregate hauls into substrata/strata, then I'm probably altering the amount of effort that goes into each data point. For biomass, this is more or less accounted for by the good old CPUE (catch per unit effort), because I can take the average biomass of each species. However, the denominator is irrelevant once the average CPUEs are converted into 1s and 0s for the MSOM: a data point built from more hauls has more chances to catch each species, so the number of species recorded as present will just go up!
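A minimal Python sketch of why pooling inflates presence, under the hypothetical assumption that every haul independently catches a given species with the same probability p_detect. Averaging biomass divides by the number of hauls, so the expected CPUE does not depend on N, but the pooled 1/0 record is 1 whenever any haul caught the species, which happens with probability 1 - (1 - p)^N. The function name is illustrative, not something from the trawl repo.

```python
def prob_present_after_pooling(p_detect: float, n_hauls: int) -> float:
    """Probability that a pooled data point records a species as present (1).

    Hypothetical assumption: each of the n_hauls hauls independently catches
    the species with probability p_detect, and the pooled record is 1 if any
    haul caught it.
    """
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must be a probability")
    if n_hauls < 1:
        raise ValueError("a data point needs at least one haul")
    # P(no haul catches it) = (1 - p)^N, so P(at least one does) = 1 - (1 - p)^N.
    return 1.0 - (1.0 - p_detect) ** n_hauls


if __name__ == "__main__":
    # Same per-haul catchability, different numbers of pooled hauls.
    for n in (1, 2, 4, 8):
        print(f"{n} hauls: P(present) = {prob_present_after_pooling(0.2, n):.3f}")
```

With p = 0.2, a data point pooled from 8 hauls records the species as present about 83% of the time, versus 20% for a single haul, even though nothing about the species itself changed.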

I should probably introduce N as a detection-level covariate, where N is the number of hauls per data point (i.e., per data point entering the MSOM).
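One way to write that, sketched in Python with hypothetical coefficient names alpha0 and alpha1 (values the MSOM would estimate): put N on the logit scale of the detection probability. The second function is an alternative, not something proposed in this thread: a complementary log-log link with log(N) as an offset reproduces exactly the probability that at least one of N independent hauls, each with detection probability p_single, catches the species, i.e. 1 - (1 - p_single)^N, so N would not need its own coefficient.

```python
import math


def detection_prob_logit(n_hauls: int, alpha0: float, alpha1: float) -> float:
    """Detection probability with N (hauls per data point) as a covariate.

    logit(p) = alpha0 + alpha1 * N; alpha0 and alpha1 are hypothetical
    coefficients that would be estimated inside the MSOM.
    """
    eta = alpha0 + alpha1 * n_hauls
    return 1.0 / (1.0 + math.exp(-eta))


def detection_prob_cloglog(n_hauls: int, p_single: float) -> float:
    """Detection probability using a cloglog link with a log(N) offset.

    cloglog(p) = log(N) + cloglog(p_single), which is algebraically equal to
    1 - (1 - p_single)**N. Requires n_hauls >= 1 and 0 < p_single < 1.
    """
    if n_hauls < 1 or not 0.0 < p_single < 1.0:
        raise ValueError("need n_hauls >= 1 and 0 < p_single < 1")
    eta = math.log(n_hauls) + math.log(-math.log(1.0 - p_single))
    return 1.0 - math.exp(-math.exp(eta))


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, round(detection_prob_logit(n, -1.0, 0.3), 3),
              round(detection_prob_cloglog(n, 0.2), 3))
```

The logit version lets the data decide how strongly N matters; the cloglog offset bakes in the "independent hauls" assumption, which is stronger but uses one fewer parameter.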

To do this, I should probably add a sum(length(unique(lat))) (but see note) somewhere in these lines. Note: I need to be careful with how I do this, because there could be replicates here not just from aggregating across hauls, but also from aggregating across species IDs that refer to the same species. Also, be careful because this definition of haulid is basically just the definition of substratum, and it masks which rows come from different hauls. It might be a good idea to create a haulid2 based on the lat-lon without rounding.
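A minimal Python sketch of the haulid2 idea, assuming hypothetical row fields year, lat, lon and spp, and a hypothetical rounding to `digits` decimals standing in for the repo's rounding-based substratum definition. Hauls are identified by the unrounded (lat, lon) pair and collected in a set, so rows duplicated only because two species IDs refer to the same species are counted once.

```python
from collections import defaultdict


def hauls_per_datapoint(rows, digits=1):
    """Count distinct hauls behind each aggregated (year, substratum) data point.

    rows: iterable of dicts with hypothetical keys 'year', 'lat', 'lon', 'spp'.
    The substratum key rounds lat/lon to `digits` decimals (a stand-in for the
    rounding-based haulid); the haul key ('haulid2') uses the unrounded
    lat/lon pair. Hauls go into a set, so repeated rows for one haul
    (e.g. two species IDs for the same species) are counted once.
    """
    hauls = defaultdict(set)
    for r in rows:
        datapoint = (r["year"], round(r["lat"], digits), round(r["lon"], digits))
        haulid2 = (r["year"], r["lat"], r["lon"])
        hauls[datapoint].add(haulid2)
    return {k: len(v) for k, v in hauls.items()}


if __name__ == "__main__":
    rows = [
        {"year": 2000, "lat": 40.12, "lon": -70.34, "spp": "cod"},
        {"year": 2000, "lat": 40.12, "lon": -70.34, "spp": "Atlantic cod"},  # alias, same haul
        {"year": 2000, "lat": 40.08, "lon": -70.31, "spp": "cod"},  # second haul, same substratum
        {"year": 2001, "lat": 40.12, "lon": -70.34, "spp": "cod"},
    ]
    print(hauls_per_datapoint(rows))
```

Keying on the (lat, lon) pair rather than lat alone avoids undercounting when two hauls happen to share a latitude, which a count of unique(lat) would miss.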

rBatt commented 8 years ago

This is almost certainly an issue. A possible solution might be to cave and use hauls as the replicates.

When I explored using hauls as the replicates (instead of substrata), I remember being scared out of it by the highly skewed distribution of the number of hauls per stratum per year. I need to fix this.

mtingley commented 8 years ago

Hmmm... I'm not really sure what you're talking about, but I'm happy to chat it over.

(Email reply to Ryan Batt's comment of Wed, Sep 23, 2015 at 9:26 PM: https://github.com/rBatt/trawl/issues/84#issuecomment-142777413)

rBatt commented 8 years ago

@mtingley My replicates have non-constant effort. The "replicates" can (occasionally) contain different numbers of net sweeps (transects). I'm assuming this is a bigger problem for presence/absence data than for catch-per-unit-effort data.

Looking back, it seems obvious that my approach was a bad idea. I'm not sure how big the discrepancy is (in most cases this may actually be a non-issue; there's just the potential for it to be a problem), but my alternatives aren't very simple.

I will figure it out though.

rBatt commented 8 years ago

Yeah, that was probably a problem. I don't know the impacts of that assumption. At the least, I should have modeled the detections as binomial, accounting for K (the number of replicate samples) by supplying it as the binomial size.
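For reference, here is the standard single-species occupancy likelihood with a binomial detection process, as a minimal Python sketch rather than the MSOM code from this repo. K is taken to be the number of replicate samples at a site-year (the binomial size), y the number of those samples in which the species was detected, psi the occupancy probability and p the per-sample detection probability; all names are illustrative. Because K enters explicitly, a site-year with more samples is no longer treated as if it had the same effort as one with fewer.

```python
from math import comb


def site_likelihood(y: int, k: int, psi: float, p: float) -> float:
    """Likelihood of y detections in k samples at one site for one species.

    Occupancy-model form: with probability psi the site is occupied and
    y ~ Binomial(k, p); otherwise the species is absent and y must be 0.
    """
    if not 0 <= y <= k:
        raise ValueError("y must be between 0 and k")
    occupied = psi * comb(k, y) * p ** y * (1 - p) ** (k - y)
    # An all-zero history can also come from an unoccupied site.
    unoccupied = (1 - psi) if y == 0 else 0.0
    return occupied + unoccupied


if __name__ == "__main__":
    # Same occupancy and detection, different numbers of samples.
    for k in (1, 3, 10):
        print(k, round(site_likelihood(0, k, 0.5, 0.2), 4))
```

As K grows, an all-zero history becomes stronger evidence that the site is unoccupied, which is exactly the effort information that collapsing to a single 1/0 throws away.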

I've reorganized the data handling so this is dealt with more explicitly and more cleanly. I'm currently reprogramming the MSOM, and we'll see whether it can handle the potentially large number of samples per site-year; that would reduce the need for aggregation and avoid this pitfall. Otherwise, I will have to include the number of samples in the model.