Investigate median widths (bed, section, region) for surface surveys

grinnellm commented 4 years ago

From @jaclyncleary: For the surface survey index calculations where the widths are replaced with mean width from five observations in the same area, is it possible for you to look at/pull out a couple examples of this happening? How variable are the dive width observations in a given area (over which this is averaged)? And similarly, how variable are the surface width observations in the same area?

grinnellm commented 4 years ago

I'd like some suggestions on how to approach this before I start. This is the relevant paragraph in the tech report:

grinnellm commented 4 years ago

I was thinking about a 3-panel plot, which I could create for each "pool". (@jaclyncleary if you know how the pools came to be, I'd like to know the background.)

(a) Histogram of observed surface spawn widths in the pool, showing the median pool width, median section width, and the median region width as vertical lines.
(b) Histogram of observed understory spawn widths (transects) within the pool.
(c) Histogram of observed Macrocystis spawn widths (transects) within the pool. I'm not sure if this one would be useful, since the Macrocystis survey protocol is a bit different.

Note: I decided to scrap (c) after consulting with MT.

grinnellm commented 4 years ago

Writing this idea down for later: if we don't know where/how the Pools came to be, one way to make this repeatable and more transparent would be to omit them from the calculations. We could instead use the mean width from dive surveys in that Location, Section, Statistical Area, or Region (in that order). It would be easy to re-calculate these mean widths every year.

grinnellm commented 4 years ago

Here's an example plot for WCVI, showing observed surface spawn widths as black dots, median pool widths as green dots, median section widths as blue dots, and median region width as a red line. This is just a small sample of the data to see if this display format works -- what do you think? There wasn't really enough data for histograms or boxplots, but these might work with the full data set, I'll take a look. I haven't started on (b) and (c) yet.

grinnellm commented 4 years ago

Same plot for WCVI, this time with all the data. [Figure: PoolWidth]

grinnellm commented 4 years ago

Latest version. I think this might give us the info we need, and we can run it for different areas. It's a busy plot, so small areas work better (i.e., one Stat Area at a time, like in this example). Red boxplots indicate spawn width for understory surveys (dives), and teal boxplots are surface surveys. Green dots are median pool widths, blue lines are median section widths, and red lines are median region widths. I still need to make sure this is correct. What do you think about the format/presentation? I think Pool "99" means unknown, but I'm not sure yet. [Figure: PoolWidth]
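
As a numeric counterpart to the boxplots, the spread of widths within each Pool and survey type could be summarised with a count, median, and interquartile range. This is only a rough Python sketch, not the code behind the plot. The `(pool, survey, width)` record layout, the function name `width_summary`, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median, quantiles


def width_summary(records):
    """Summarise widths per (pool, survey) group.

    records: iterable of (pool, survey, width) tuples (hypothetical layout).
    """
    groups = defaultdict(list)
    for pool, survey, width in records:
        groups[(pool, survey)].append(width)

    summary = {}
    for key, widths in sorted(groups.items()):
        if len(widths) >= 2:
            # Quartiles by linear interpolation between observed values
            q1, _, q3 = quantiles(widths, n=4, method="inclusive")
            iqr = q3 - q1
        else:
            iqr = None  # a single observation has no measurable spread
        summary[key] = {"n": len(widths), "median": median(widths), "iqr": iqr}
    return summary


# Toy data only (not real survey widths)
toy = [("P1", "dive", w) for w in (10, 20, 30, 40, 50)]
toy += [("P1", "surface", 35)]

for key, stats in width_summary(toy).items():
    print(key, stats)
```

Groups with very few observations (which seemed common in the small WCVI sample) would show up here as a low `n` and a missing or unstable IQR.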

grinnellm commented 4 years ago

Some more background on Pools (Schweigert et al. 1993). Read pages 5 and 6 (Modified escapement model, and Analysis of width data). SchweigertEtal1993 - Herring spawn index analysis.pdf

grinnellm commented 4 years ago

MT agrees that Pool code "99" means unknown. In addition, MT says the Pool codes are not maintained. So, new Locations are created without Pool info. As far as he knows, there is no look-up table describing the Pool codes and how to apply them. Sounds like this information was not passed on from whoever started the Pool work.

grinnellm commented 4 years ago

The following table shows the "level of spatial aggregation" used to fill in surface spawn survey widths for each SAR. For example, 95.8% of spawns (i.e., spawn numbers) in HG get their width as the median width from understory dives within the Pool, 4.2% of spawns use the median Section width, and 0% use the median Region width. This is the way things are currently done. I know it's a bit silly to make groups with only 7 observations, but it seems like there are three main groups of SARs:

HG, PRD, and A2W get 90% or more of widths from Pools,
CC, WCVI, and A27 get about 15% of widths from Sections, and
SoG gets about 15% of widths from Sections, and 12% from Region.

SAR	Pool	Section	Region
HG	95.8	4.2	0.0
PRD	95.7	3.0	1.3
CC	85.9	14.1	0.0
SoG	73.5	14.9	11.6
WCVI	82.8	14.4	2.8
A27	80.3	14.4	5.4
A2W	90.0	5.1	4.9
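
For reference, percentages like those in the table above could be tallied as follows once each spawn has been labelled with the level that supplied its width. This is a minimal Python sketch under assumptions: the `(sar, level)` input layout and the function name `percent_by_level` are hypothetical, and the toy counts are invented (chosen only so that the output matches the HG row; the real spawn counts are not given here).

```python
from collections import Counter, defaultdict


def percent_by_level(assignments, levels=("Pool", "Section", "Region")):
    """Percent of spawns per SAR whose width came from each level.

    assignments: iterable of (sar, level) pairs, one per spawn
    (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for sar, level in assignments:
        counts[sar][level] += 1

    table = {}
    for sar, c in counts.items():
        total = sum(c.values())
        table[sar] = {lvl: round(100 * c[lvl] / total, 1) for lvl in levels}
    return table


# Invented counts: 23 spawns filled from Pools, 1 from a Section
example = [("HG", "Pool")] * 23 + [("HG", "Section")] * 1
print(percent_by_level(example))
```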

Compare this to a potential alternative for filling in surface survey widths that doesn't rely on Pools. If there are understory dives in the same Location, use the median width from dives at that Location; otherwise use the median width from dives within the Section, then dives within the Statistical Area, then dives within the Region. Again, there seem to be three groups:

CC and A2W get about 90% of widths from Locations,
HG, PRD, WCVI, and A27 get about 20% of widths from Sections, and
SoG gets about 25% of widths from Sections, and 12% from Statistical Areas.

With this method, no Region widths are used, although this could be a bit deceiving because some of these SARs only have one Statistical Area (so the Region and Statistical Area width are the same).

SAR	Location	Section	StatArea
HG	79.0	21.0	0.0
PRD	78.6	20.1	1.3
CC	89.5	10.5	0.0
SoG	63.1	25.3	11.6
WCVI	75.3	21.9	2.8
A27	77.2	17.5	5.4
A2W	88.3	11.7	0.0
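
The Location, Section, Statistical Area, Region fallback described above can be sketched in Python as below. This is an illustration under assumptions, not the implementation used for the table: the dictionary keys, the function names `build_medians` and `fill_width`, and all codes and widths in the toy data are hypothetical, and the sketch assumes that codes are unique within each level.

```python
from collections import defaultdict
from statistics import median

# Finest to coarsest level of spatial aggregation
LEVELS = ("Location", "Section", "StatArea", "Region")


def build_medians(dives):
    """Median dive width for every (level, code) seen in the dive records.

    dives: list of dicts with the LEVELS keys plus 'Width' (hypothetical).
    """
    widths = defaultdict(list)
    for d in dives:
        for lvl in LEVELS:
            widths[(lvl, d[lvl])].append(d["Width"])
    return {key: median(vals) for key, vals in widths.items()}


def fill_width(spawn, medians):
    """Return (level, width) from the finest level that has dive data."""
    for lvl in LEVELS:
        key = (lvl, spawn[lvl])
        if key in medians:
            return lvl, medians[key]
    return None, None  # no dive data anywhere, not even in the Region


# Toy dive records (invented codes and widths)
dives = [
    {"Region": "WCVI", "StatArea": "23", "Section": "231", "Location": "A", "Width": 20.0},
    {"Region": "WCVI", "StatArea": "23", "Section": "231", "Location": "A", "Width": 30.0},
    {"Region": "WCVI", "StatArea": "23", "Section": "232", "Location": "B", "Width": 50.0},
]
medians = build_medians(dives)

# A surface spawn at a Location with no dives falls back to its Section
spawn = {"Region": "WCVI", "StatArea": "23", "Section": "231", "Location": "C"}
print(fill_width(spawn, medians))
```

Because the medians are rebuilt from whatever dive records are passed in, re-running this each year would pick up new dives without any look-up table to maintain.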

If there is interest in pursuing this alternative method, I could compare the widths based on the current method vs the alternative. Or, I could compare the spawn index based on the current method vs the alternative.
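
One possible shape for the width comparison, sketched in Python: given the filled-in width for each spawn under both methods, summarise the paired differences. The function name `compare_widths`, the choice of summary statistics, and the toy widths are assumptions for illustration only.

```python
from statistics import median


def compare_widths(current, alternative):
    """Summarise paired differences (alternative minus current) per spawn."""
    if len(current) != len(alternative):
        raise ValueError("Both methods must give one width per spawn")
    diffs = [a - c for c, a in zip(current, alternative)]
    # Relative change in percent; spawns with a zero current width are skipped
    rel = [100 * (a - c) / c for c, a in zip(current, alternative) if c > 0]
    return {
        "n": len(diffs),
        "n_changed": sum(d != 0 for d in diffs),
        "median_diff": median(diffs),
        "median_pct_diff": median(rel),
    }


# Toy widths (invented), one pair per spawn
current = [20.0, 40.0, 50.0, 100.0]
alternative = [25.0, 40.0, 45.0, 110.0]
print(compare_widths(current, alternative))
```

The same paired summary could be applied to the spawn index instead of the widths, which would show whether the width differences matter once they are carried through the index calculation.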

grinnellm commented 4 years ago

Email from Jake S on this subject:

I've had a look through the GitHub and found the plots pretty interesting. Seems there isn't a lot to choose between beds/pools and locations in terms of adjusting historical surface widths. I also went and had a look at the SurfaceEggsandFishCompile table in the spawn.mdb file. It looks to me like '99' indicates a bed that just hasn't been assigned a code yet, so I suppose it is unknown in that sense. It looks like they are mostly in areas that either aren't surveyed regularly or at least haven't been dive surveyed, so there wasn't any data that could be used to define the widths for that bed.

As far as the beds go, our thinking at the time was that the physical extent of locations was poorly described and often confusing; in other words, different people used the same location name for two different physical locations or used different names for the same location. I think that is not uncommon in the historical surface survey database. Therefore, we felt that it made more sense to develop the beds that described a known physical stretch of beach that was similar in its geomorphology and should have a similar width of vegetation. In most instances, this would also encompass a number of locations and so would partly get around this issue of inconsistent use of a location name for a particular physical location. In the end, it probably doesn't matter too much which approach is used to map dive width to surface width; both will be wrong, but the use of beds with at least an underlying physical mapping to the geography seems more accurate than applying the more nebulous location.

Jake also mentioned that he would ask Doug Hay to elaborate on the fuzziness in Location names.

grinnellm / SpawnIndex

Investigate median widths (bed, section, region) for surface surveys #9