Size composition - Githubissues

smartell commented 9 years ago

The input size compositions for the survey sum to 1 within each year across both sexes and shell conditions combined. The code, however, assumes each line in the input data is independent. We need to decide how to handle this.

jimianelli commented 9 years ago

One approach would be to add a flag with which the practitioner specifies whether or not to normalize with respect to each line. In cases where the flag indicates normalizing over all sexes and shell conditions, the sample sizes could either be summed or only the first one used.
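
As a rough illustration of the two sample-size options mentioned above, here is a minimal Python sketch. The function name `combined_sample_size`, the `rule` argument and the numbers are made up for illustration; none of this is gmacs code.

```python
def combined_sample_size(sample_sizes, rule="sum"):
    """Return one sample size for lines that are normalised together.

    sample_sizes : sample sizes of the individual lines
                   (e.g. one per sex / shell condition)
    rule         : "sum" adds them up, "first" keeps only the first value
    """
    if not sample_sizes:
        raise ValueError("need at least one sample size")
    if rule == "sum":
        return sum(sample_sizes)
    if rule == "first":
        return sample_sizes[0]
    raise ValueError(f"unknown rule: {rule!r}")


# Made-up sample sizes for three lines that share one normalisation.
n_lines = [100, 60, 40]
n_summed = combined_sample_size(n_lines, "sum")
n_first = combined_sample_size(n_lines, "first")
```

Which rule is appropriate presumably depends on whether the input file repeats the same total sample size on every line or splits it across lines.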

wStockhausen commented 9 years ago

Normalizing DATA is a quick calculation which can be done once, so you don't really need to specify whether you need to normalize or not--just do it (as the old Nike commercial said). That said, you might need to specify the TYPE of aggregation/normalization by line (by sex, by shell condition, etc.).

Also, what about maturity? Male maturity characterization may be available (it is currently for survey data in the Tanner model, but it's based on an ogive), but female maturity certainly is for Tanner/snow crab (again, in the survey data).



smartell commented 9 years ago

The issue, as I understand it, is that you want to normalize over sex and shell condition, and perhaps even maturity for terminal molt, so that the composition data are informative not only about recruitment and mortality but also about the sex ratio, the ratio of new to old shell conditions, etc.
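
A small worked example may help here. The sketch below uses made-up counts and a hypothetical `normalise` helper (not gmacs code) to show that normalising each sex separately throws away the sex ratio, while normalising over both sexes together keeps it.

```python
def normalise(counts):
    """Scale a list of non-negative counts so that it sums to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("cannot normalise an empty or all-zero composition")
    return [c / total for c in counts]


# Made-up counts by size bin for one survey year.
males = [30, 50, 20]     # 100 males in total
females = [10, 10, 5]    # 25 females in total

# Per-line normalisation: each sex sums to 1, so the 100:25 ratio is lost.
male_props = normalise(males)
female_props = normalise(females)

# Joint normalisation: both sexes together sum to 1.
joint = normalise(males + females)
male_share = sum(joint[:len(males)])   # proportion of males in the sample
```

With these numbers the jointly normalised vector puts about 80% of its mass in the male bins, which is information the two separately normalised vectors cannot carry.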

Buck also made the point above.

quantifish commented 9 years ago

So what I have done here is add another line to the control file under the size comps section:

 1   1   1   1   1   1   1   1   1   # Type of likelihood
 0   0   0   0   0   0   0   0   0   # Auto tail compression (pmin)
-4  -4  -4  -4  -4  -4  -4  -4  -4   # Phz for estimating effective sample size
 1   2   3   4   4   5   5   5   6   # Composition aggregator

This vector is named the Composition aggregator. It lets the user specify any combination of size compositions you can imagine: size comps that share the same index are aggregated (i.e. stuck together horizontally), normalised by row, then passed to the appropriate likelihood along with the model-predicted vector built from the same combination.

For example, the snippet above is taken from the bbrkc/TwoSex example. Here we have 9 different size comps (pot retained males, pot discard males, pot discard fem, trawl bycatch male, trawl bycatch fem, survey male newshell, survey male oldshell, survey fem, BSFRF). The Composition aggregator specifies that the male and female trawl bycatch size comps (index 4) should be fitted together, and likewise the three survey comps (index 5), each group in a single multinomial (or whatever likelihood is chosen).
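
The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the gmacs (ADMB) implementation: the function name `aggregate_comps` and the toy data are assumptions, and the sketch assumes every comp in a group has the same years in the same row order.

```python
def aggregate_comps(comps, aggregator):
    """Group size comps by aggregator index, join them horizontally,
    then normalise each row (year) to sum to 1.

    comps      : list of matrices (lists of rows), one per size comp
    aggregator : list of integer group labels, one per size comp
    Returns a dict {group label: aggregated, row-normalised matrix}.
    """
    if len(comps) != len(aggregator):
        raise ValueError("need one aggregator entry per size comp")

    groups = {}
    for comp, label in zip(comps, aggregator):
        groups.setdefault(label, []).append(comp)

    result = {}
    for label, members in groups.items():
        merged = []
        for rows in zip(*members):                    # same year from each member
            row = [x for part in rows for x in part]  # stick together horizontally
            total = sum(row)
            if total <= 0:
                raise ValueError(f"group {label} has an all-zero row")
            merged.append([x / total for x in row])   # normalise by row
        result[label] = merged
    return result


# Nine toy size comps (2 years x 2 size bins each) with the aggregator
# vector from the control file snippet.
toy_comps = [[[1.0, 3.0], [2.0, 2.0]] for _ in range(9)]
aggregated = aggregate_comps(toy_comps, [1, 2, 3, 4, 4, 5, 5, 5, 6])
```

Here `aggregated` has six entries: groups 1, 2, 3 and 6 keep two columns, group 4 has four columns (male plus female trawl bycatch) and group 5 has six (the three survey comps), with every row summing to 1. The model-predicted comps would need to go through the same grouping before entering the likelihood.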

These changes are in the size-comp-changes branch. I am testing the code at the moment and once I am happy I can't break it I will merge into the develop branch. Does this all sound like a good approach?

quantifish commented 9 years ago

I did runs with the old version and the new version that is aggregating size-comps for some speed comparisons and testing the differences:

old:
--Runtime: 0 hours, 1 minutes, 0 seconds
--Number of function evaluations: 1805

new:
--Runtime: 0 hours, 0 minutes, 59 seconds
--Number of function evaluations: 1805

jimianelli commented 9 years ago

Yep, seems like quite a reasonable approach. Sorry for the delay in responding.


jimianelli commented 9 years ago

Well, it ran on my old, old Windows desktop in 2 minutes 15 seconds...


quantifish commented 9 years ago

Thanks Jim, no worries on response time. Glad that it's running on Windows too. I've got a few minor details to work on now. When aggregating blocks of size comp data, the code currently relies on the blocks having the same dimensions and covering the same years. The way forward could be to (1) throw an error if this is not the case, or (2) allow for different dimensions/years between blocks (this would need to cross-check all of the years in one block against another block and then fit multivariate distributions of different dimensions in different years).

quantifish commented 9 years ago

Checks I have done

[x] Run old and new and check that speeds are similar (i.e. that my code is not inefficient)
[x] Run original two sex model and new twosex with the composition aggregator 1 2 3 4 5 6 7 8 9. This should yield exactly the same result as the old twosex model.
[x] That an error is thrown if two size-comps are specified to be aggregated but they are of different dimensions
[x] That an error is thrown if two size-comps are specified to be aggregated and they are of the same dimensions but one of the years is different
[x] manually check the input files against the output files for 2 examples of merged size-comps
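
The two error checks in the list above could look roughly like the following Python sketch. It is not the gmacs code: the name `check_conformable`, the dict keys `years` and `data`, and the reading of "dimensions" as the (rows, columns) shape of each comp matrix are all assumptions made for illustration.

```python
def shape(matrix):
    """Return (number of rows, number of columns) of a list-of-rows matrix."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def check_conformable(comps, aggregator):
    """Raise ValueError if comps sharing an aggregator index differ in
    dimensions or in the years they cover.

    comps : list of dicts with hypothetical keys
            'years' (list of int) and 'data' (matrix, one row per year)
    """
    reference = {}
    for index, (comp, label) in enumerate(zip(comps, aggregator), start=1):
        if label not in reference:
            reference[label] = (index, comp)   # first comp seen in this group
            continue
        ref_index, ref = reference[label]
        if shape(comp["data"]) != shape(ref["data"]):
            raise ValueError(
                f"size comps {ref_index} and {index} (group {label}) "
                "have different dimensions"
            )
        if comp["years"] != ref["years"]:
            raise ValueError(
                f"size comps {ref_index} and {index} (group {label}) "
                "cover different years"
            )


# Toy comps: b matches a, c has an extra size bin, d differs in one year.
a = {"years": [2001, 2002], "data": [[1, 2], [3, 4]]}
b = {"years": [2001, 2002], "data": [[5, 6], [7, 8]]}
c = {"years": [2001, 2002], "data": [[1, 2, 3], [4, 5, 6]]}
d = {"years": [2001, 2003], "data": [[1, 2], [3, 4]]}

check_conformable([a, b], [1, 1])   # passes silently
check_conformable([a, c], [1, 2])   # different groups, so no comparison is made
```

Calling `check_conformable([a, c], [1, 1])` or `check_conformable([a, d], [1, 1])` raises a `ValueError`, mirroring the third and fourth checks in the list.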

quantifish commented 9 years ago

I think this one is done now so gonna close this issue and merge with develop

seacode / gmacs

Size composition #53