pacific-hake / pacifichakemse

A Management Strategy Evaluation for Pacific Hake

Bias adjust scenarios SSB trajectories appear scaled when compared to Nis' version of the code #2

Open cgrandin opened 3 years ago

cgrandin commented 3 years ago

Which code is correct, Nis' or my code? I think both are.

The OM is highly sensitive to the random recruitment deviations. The two codebases have two different sets of random recruitments in the projection years. Both sets are drawn the same way, from a normal distribution, so they are equally probable. I looked carefully at both sets of code side by side, and the only difference is these recruitment values informing the numbers-at-age-0 from 2019 forward. The values are r_y here:

https://github.com/pacific-hake/pacifichakemse/blob/c4af6d0fddf720258ec18255fdc8589863bdefd8/R/run_year_loop_om.R#L43

and R_y here:

https://github.com/nissandjac/PacifichakeMSE/blob/97fe2ce10fccc0adcb0f3daf5206e1ae45e09d4e/R/run_agebased_model_true_Catch.R#L274

I could duplicate his string of random recruitments to produce exactly the same results, but is that not just chasing random numbers? The C.I. on the SSB is large, and covers both cases.
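
A minimal illustration, in generic R, of the point that two sets of projection-year rec devs (SD 1.4) are equally probable yet imply quite different recruitments (this is not the package's actual draw):

```r
set.seed(1)
r_y_a <- rnorm(10, mean = 0, sd = 1.4)   # one plausible set of projection rec devs
set.seed(2)
r_y_b <- rnorm(10, mean = 0, sd = 1.4)   # a different, equally probable set

exp(r_y_a)   # recruitment multipliers implied by the first set
exp(r_y_b)   # quite different multipliers, yet neither set is "wrong"
```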

kristinmarshall-NOAA commented 3 years ago

Thanks for looking into this! Your explanation seems plausible to me. Nis's plot used 1000 iterations, while your new plot uses 100, so another test might be to do 1000 iterations with the new code and see if the medians get closer. I just started running that because I'll be away from my computer for a couple of hours.

Another way I tried to think through this was to generate samples of rec devs with different numbers of iterations and look at how the SD of the median stabilizes as the number of replicates increases. Of course this doesn't mimic the complexity of the operating model, but I think it does show that with 100 iterations and without the same seed, we should expect the medians to be fairly different. The SD stabilizes at around 1000 iterations.

```r
n <- seq(100, 3000, by = 100)       # vector of numbers of iterations
nsamp <- 50                         # number of samples at each n
rdev_med_est <- rep(NA, nsamp)      # median of each sample
sd_rdev_med <- rep(NA, length(n))   # SD of the medians

for (j in 1:length(n)) {
  for (i in 1:nsamp) {
    Rdevs <- rnorm(n = n[j], mean = 0, sd = 1.4)
    rdev_med_est[i] <- median(Rdevs)
  }
  sd_rdev_med[j] <- sd(rdev_med_est)
}

plot(n, sd_rdev_med, ylim = c(0, 0.2))
abline(h = 0.05, col = "red", lwd = 3, lty = 2)
```

[image: SD of the median rec dev vs. number of iterations]

If nothing else, for me this drives home the point that if we present results using 100 iterations, we should focus less on comparing absolute values and more on comparisons among scenarios that use the same set of rec devs. And maybe we should continue to run 500-1000 iterations for "final" results.

aaronmberger-nwfsc commented 3 years ago

I agree that this is a "small sample size" issue, given the 1.4 SD on rec devs and the importance of recruitment for governing the underlying population dynamics. Thanks Kristin for putting together that helpful simulation. I also agree that for "exploratory" runs using, say, 100 iterations, as long as the rec devs are exactly the same between scenarios (because the random number seed is exactly the same; this is the case, right?), we should still be able to compare them for general insights. The idea being that 100-iteration runs could weed out less important runs, so that runs of particular interest are then identified for a "full" 500-1000 iteration run.

I guess the easiest way to fully confirm the two code sets are equivalent would be to use the same seed set as Nis and the same simulation length; you should then get identical results (or darn near identical, pending minor code differences/improvements), especially in the medians.

This is great diagnostic work Chris and Kristin, and it continues to provide comfort that the MSE is doing what we think it is doing.

iantaylor-NOAA commented 3 years ago

One other factor in the mix here is how we're summarizing the values. The Methot & Taylor (2011) paper was focused on being mean unbiased: "mean recruitment is a better representation of the long-term contribution to the population than median recruitment, because most of the population biomass comes from the numerous recruits in the upper tail of the lognormal distribution."
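
As a quick illustration of that mean-versus-median distinction in generic R, using the 1.4 rec dev SD from these runs (the exp(-0.5 * b * sigma^2) multiplier at the end is the Methot & Taylor form of the bias adjustment, shown here as an assumption about how b enters, not a quote of this code):

```r
set.seed(42)
sigma_r <- 1.4                        # rec dev SD used in these runs
devs <- rnorm(1e6, mean = 0, sd = sigma_r)

mean(exp(devs))                       # ~exp(sigma_r^2 / 2) = 2.66; the upper tail dominates
median(exp(devs))                     # ~1; the "typical" multiplier
exp(-0.5 * 0.87 * sigma_r^2)          # mean-correction multiplier at b = 0.87 (assumed form)
exp(-0.5 * 0.50 * sigma_r^2)          # same at b = 0.5, for comparison
```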

I'm assuming the time series figures for the bias adjustment scenarios are showing the median across iterations. We could look at the mean instead, but I'm not sure that would be right either. Most consistent with the theory might be the median of per-iteration averages: that is, take the mean relative spawning biomass over the last 20 or 30 years within each iteration, and then see what bias adjustment value causes that distribution to be centered around 1.0. We should add a "Bias 1.0" case to https://github.com/pacific-hake/runhakemse/blob/master/R/run_mse_biasadjust.R as well.
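
For concreteness, a sketch of that per-iteration summary, where rel_ssb is a hypothetical iterations-by-years matrix of SSB/SSB0 for one scenario (the object name is made up, not something from the package):

```r
# rel_ssb: hypothetical matrix, rows = iterations, columns = years of SSB/SSB0
last_yrs <- (ncol(rel_ssb) - 19):ncol(rel_ssb)   # final 20 years of the projection
iter_means <- rowMeans(rel_ssb[, last_yrs])      # mean relative SSB within each iteration
median(iter_means)   # look for the bias adjustment that centres this near 1.0
```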

This same question of how best to summarize the values would come up in the alternative approach to calculate the unfished biomass proposed in #4.

kristinmarshall-NOAA commented 3 years ago

Following up on the 1000 iteration bias adjustment runs. The patterns in the medians are consistent with what Chris was seeing with the 100 iteration runs; a bias adjustment of 0.87 gets you close to SSB0. I had expected the median to drift closer to Nis' previous bias adjustment run where a bias adjustment of 0.5 resulted in SSB0.

Here is the SSB/SSB0 plot:

[image: SSB/SSB0 trajectories for the 1000-iteration bias adjustment runs]

I think this means we should probably do the test using the same seed values Nis used. And/or, perhaps another test would be to turn off the rec devs during the projection period, just to check that the equilibrium is the same between the old code and the new code.

As an aside, plotting these took me a while because I was running into vector memory errors with the large output files and the plot .rds file generated by 1000 iterations (1.5 GB), and I had to figure out how to fix that on my machine.

cgrandin commented 3 years ago

Thanks for running this Kristin. So there are issues with memory when running so many iterations; the code basically stores everything, so I'm not that surprised. Could you start an issue with the details? That gives me somewhere to start when fixing it. If it's happening with this scenario, it will be worse with the scenarios that also store EM outputs.

I'll look at the recruitment deviations tonight and see if I can replicate what Nis had used.

iantaylor-NOAA commented 3 years ago

I thought the goal of the bias adjustment scenario was to make sure the projected population with no fishing stabilized around B0, so I think catch = 0 was appropriate.

Also, NA_real_ is new to me; when is it needed instead of NA?

On Wed, Dec 2, 2020 at 9:19 AM Chris Grandin notifications@github.com wrote:

I had catch set to zero for the OM runs for some reason. Once I fixed that (e330011 https://github.com/pacific-hake/pacifichakemse/commit/e33001159281389a0f56af7168425789149f2d88) this is what the 1,000 run biasadjust scenarios look like. It's now scaled under what Nis had.

[image: image] https://user-images.githubusercontent.com/4551051/100907487-3c2cec00-347f-11eb-9614-db6c30a7bb69.png


cgrandin commented 3 years ago

NA_real_ is used as a default for some arguments because the function verify_arguments() checks arguments to make sure they are not NULL and that they are of a certain class and length. The point of that is to avoid crashes at random points in the code which you then have to backtrace with browser().

In R, if you do:

class(NA)

It is always logical. If you do:

class(NA_real_)

It is numeric. There are also NA_character_, NA_complex_, and NA_integer_. So when verify_argument() checks the class of the input argument, it will be the correct type even if it was not set by the caller. Note that

is.na()

returns TRUE for any of the NA_* types.

One important place where this is used is the yr_lim argument of the plot_timeseries() function. The NA_real_ in that case tells ggplot that the first or last limit should default to the range of the data if it is not set. If you call plot_timeseries(ps, yr_lim = c(2010L, NA)) you will get an error, because the "L" casts the NA to NA_integer_. If you call plot_timeseries(ps, yr_lim = c(2010, NA)) it will work fine, because the 2010 casts the NA to NA_real_. If you call plot_timeseries(ps, yr_lim = c(NA, NA)) it will fail, because NA is inherently of logical type. If you call plot_timeseries(ps, yr_lim = c(NA, NA_real_)) it will work. But if you're going to do silly things like that, it's better to just leave out the yr_lim argument completely.
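
For reference, a quick base-R check of the typing and coercion behaviour described above (generic R, not package code):

```r
class(NA)                 # "logical"
class(NA_real_)           # "numeric"
class(NA_integer_)        # "integer"
class(NA_character_)      # "character"
is.na(NA_real_)           # TRUE -- is.na() is TRUE for all NA_* types

# c() coerces to the most general type among its elements:
class(c(2010L, NA))       # "integer"  -- the NA becomes NA_integer_
class(c(2010, NA))        # "numeric"  -- the NA becomes NA_real_
class(c(NA, NA))          # "logical"  -- stays logical
class(c(NA, NA_real_))    # "numeric"  -- the logical NA is promoted to NA_real_
```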

cgrandin commented 3 years ago

I think I have resolved this.

The following plot shows the 0.87 bias adjust with 100 runs for various seeds. Included is the exact set of recdevs used in the run from Nis' code. I had to import those because this code cannot generate the exact same series he had, despite using the same seed. The reason is that my code draws things in a different order: all recdevs are drawn at the beginning and placed into a vector, whereas Nis drew them once at each iteration. To complicate matters, the survey error is also drawn from the same sequence each year (so those draws are interspersed with the recdevs), so his series of random numbers is different from mine even with the same seed. To replicate his numbers, I ran his code, stored every set of recdevs for every run, then imported them into my code and overwrote the random sequence I had.
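
A minimal illustration of why draw order matters, in generic R (not either codebase):

```r
set.seed(1)
recdevs_a <- rnorm(5)                 # all rec devs drawn up front...
surv_a    <- rnorm(5)                 # ...then all survey errors

set.seed(1)
recdevs_b <- numeric(5)
surv_b    <- numeric(5)
for (y in 1:5) {
  recdevs_b[y] <- rnorm(1)            # rec dev and survey error interleaved,
  surv_b[y]    <- rnorm(1)            #   year by year, from the same stream
}

identical(recdevs_a, recdevs_b)       # FALSE: same seed, different rec dev series
```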

Choice of seed makes some difference, but the results are all within a reasonable range; keep in mind that this is for only 100 runs.

The second figure shows the three biasadjust scenarios now that I have fixed things. They all use a seed of 1. It matches what Nis' code produced more closely now. I could produce a plot showing all of them at different seeds as well if it is useful.

[image: SSB/SSB0 for the 0.87 bias adjust scenario, 100 runs at various seeds, including Nis' recdevs]

[image: SSB/SSB0 for the three bias adjust scenarios with seed 1]

Nis' figure from the document:

[image: Nis' bias adjustment figure from the document]

aaronmberger-nwfsc commented 3 years ago

Regarding the random number seeds: for a [hypothetical] example, if we compared a scenario where survey error was turned off versus one where it was turned on, are you saying that we might get a difference in output that is not solely due to the survey error being turned on and off, because the random numbers for drawing rec devs would be different as well (i.e., the interspersed part you mentioned)? If so, we need to be careful about this and either 1) fix it, or 2) increase our number of iterations such that the random seeds make no difference in terms of summary statistics. The goal should always be option 2, so it may not really matter. Good to know though.

cgrandin commented 3 years ago

Yes Aaron that’s correct. I fixed this in my version but that’s how Nis’ version worked.

andrew-edwards commented 3 years ago

Could always set the seed straight away and then save all the random numbers needed for rec devs, but that's not very elegant or satisfying. Yes, it shouldn't matter if we run enough iterations. Good detective work Chris.
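
For what it's worth, a minimal sketch of that pre-generate-and-save idea (hypothetical file name and dimensions, not the package's actual workflow):

```r
set.seed(1)
n_iter <- 1000
n_yrs  <- 30
recdevs <- matrix(rnorm(n_iter * n_yrs, mean = 0, sd = 1.4),
                  nrow = n_iter, ncol = n_yrs)
saveRDS(recdevs, "recdevs_seed1.rds")     # hypothetical file name

# Every scenario then reads the same matrix, so comparisons are not
# confounded by scenarios consuming the random stream differently.
recdevs <- readRDS("recdevs_seed1.rds")
```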

I had a related reproducibility problem one time. I narrowed it down to library(dplyr) actually using a random number (I must have set the seed before that, which is probably bad practice). So when running the code the first time, it uses the first random number in library(dplyr), but the second time it doesn't need that (as the package is already loaded), so the numbers come out different. Solved for me by Ben Bolker over a beer in a brew pub in Seattle! Good one to be aware of....

cgrandin commented 3 years ago

Interesting @andrew-edwards - I may have come across that issue before as well. I'm not sure how that would affect package code though, as there are no library() calls inside. The required packages must be loaded somehow though. I don't think it's an issue in this case but appreciate the reminder! Thanks

kristinmarshall-NOAA commented 3 years ago

These comparisons are much more similar than before. Nice sleuthing @cgrandin. I think we can now say that we understand why the new code wasn't matching Nis's code, and it can be forced to do so by using the identical recdevs. I'm less certain about the implications for moving forward. Chris showed above that there is not a huge sensitivity to the seed, which is good. But was there a bug that has now been fixed, and now we can move on to the discussion in the other thread about the general approach to bias adjustment? Just trying to clarify if we can close this issue yet...

iantaylor-NOAA commented 3 years ago

Here's a follow-up on my comment long ago on this thread speculating that our method of summarizing (median vs mean, etc.) could play a role in how the bias adjustment plays out: https://github.com/pacific-hake/pacifichakemse/issues/2#issuecomment-731471600.

I just ran the OM-only biasadj scenarios (which was super easy, thank you @cgrandin for the clear README instructions) and then looked at the mean relative spawning biomass over the last 10 years of each simulation. The distributions of these means did not reveal any notable difference from the distributions of individual-year values that have been in the plots for this scenario. Therefore, I no longer think that the method of summary could be a factor here.

I still hope to dig deeper into this issue in the future, but it will be at least a week from now.

iantaylor-NOAA commented 3 years ago

As discussed in our meeting, the specific comparison which led to opening this issue seems to be explained by the sample sizes as noted in the comments above, so we could close it and focus further discussion of this topic on issue #4.

kristinmarshall-NOAA commented 3 years ago

I'm adding one more piece to this discussion that I alluded to in our meeting yesterday, gleaned from plotting the seasonal SSB trajectories. The time of year we use for SSB influences our interpretation of what value of bias adjustment seems appropriate.

In Nis's original work on this, he plots mid-year SSB/SSB0, which Chris compares above with mid-year SSB from the new code, and we've discussed that we're satisfied with this. Nis concluded from the comparison of the bias adjustment values to SSB0 from the assessment that 0.5 was the closest match, and that's what we've been using to run all of the scenarios to date.

However, in the equilibrium run with no recruitment deviations, we saw that start-of-year SSB was the closest to SSB0. SSB drops throughout the year due to natural mortality. This suggests that comparing start-of-year SSB to SSB0 in the bias adjustment runs might be a more appropriate test. Here is the plot of the seasonal coastwide SSB, zoomed in so you can see the seasonal cycle and the SSB0 horizontal line:

[image: seasonal coastwide SSB with SSB0 reference line]
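
For intuition on the size of that within-year drop, a back-of-the-envelope check assuming spawning biomass declines only through natural mortality, with an illustrative M (not the value used in the OM):

```r
M <- 0.2                              # illustrative natural mortality rate (per year); not the OM's value
ssb_start <- 1.0                      # start-of-year SSB, scaled to 1
ssb_mid <- ssb_start * exp(-M / 2)    # SSB at mid-year under mortality alone
ssb_mid                               # ~0.90, i.e. roughly a 10% drop by mid-year
```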

Jumping back to the bias adjustment runs, if we instead plot start-of-year SSB and compare that with SSB0 from the assessment, we could conclude that the 0.87 value (the one used in the assessment) gets us closest to SSB0.

[image: start-of-year SSB/SSB0 for the bias adjustment runs]

I think this provides more context for a discussion around the role of bias adjustment in scaling the SSB, and resolves some of the mystery (at least for me) about why 0.87 is used in the assessment but 0.5 in the OM projections. The start-of-year SSB plot would provide a justification for changing to a value of 0.87, if we were to keep the same bias adjustment approach moving forward. However, the SRG raised the concern that we were using the bias adjustment parameter as a somewhat arbitrary tuning parameter, so an alternative approach may be preferable for a number of reasons, as Aaron outlines in issue #4.