evanbooher opened this issue 2 years ago
@kalebentley I ran 2017 Kalama steelhead data through the latest version of the script today. I added an additional top-level parameter so the user can choose between direct / indirect census expansion bias term ratios. Given your familiarity with these data, it could be useful to get your eyes on the output for an initial check of the estimates and associated variance. Nothing time sensitive though. Thanks!
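For illustration, a toggle like that might look something like the sketch below. This is Python for readability (the actual script is in R), and every name here is a hypothetical placeholder, not the script's real API:

```python
# Hypothetical sketch of a top-level "census expansion" toggle.
# All names are placeholders, not the PE script's actual functions.
CENSUS_EXPANSION = "direct"  # user-facing choice: "direct" or "indirect"

def bias_term_ratio(census_counts, index_counts, mode=CENSUS_EXPANSION):
    """Ratio of census (tie-in) counts to index counts, computed either from
    direct angler counts or from indirect counts (vehicles/trailers)."""
    key = "anglers" if mode == "direct" else "vehicles"
    census_total = sum(row[key] for row in census_counts)
    index_total = sum(row[key] for row in index_counts)
    return census_total / index_total if index_total else float("nan")

# Toy inputs: one paired census/index count.
census = [{"anglers": 6, "vehicles": 5}]
index = [{"anglers": 3, "vehicles": 2}]
```

The point of a single top-level parameter is that the rest of the expansion code stays identical; only the count type feeding the ratio changes.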
Hey @evanbooher, Reiterating what I said the other day via Teams, I really appreciate the effort you've put into this repository and taking the time to adapt the code to [hopefully] accommodate data sets where anglers were directly surveyed during index effort counts.
I took a quick look back at my files for old/incomplete analyses and was able to find a folder that contained "preliminary" estimates of catch for "Steelhead_Adult_UM_Released" in the Kalama R. in 2016-17. Unfortunately, it looks like I summarized the data slightly differently. Specifically, I think the plot below shows weekly catch from May 1st, 2016, through April 30th, 2017, by angler type (bank, boat) and survey section (1, 2, 3). I have a whole slew of summary files but if I add up the weekly mean catch across the entire year I got 979.
As I was writing the last sentence, I remembered that I have an Excel file that has been sitting on my desktop screen for almost 3 years that has summarized estimates of catch I derived for "Steelhead_Adult_UM_Released" in the Kalama R. (in two different years) using several different methods, including our state-space model, the point estimate model (aka "deterministic") using both weekly and monthly summarizations, and an "expanded CRC" catch estimate method. I've pasted the file below. As you can see, the estimates are in the same general "ballpark" but can be decently different on a relative basis.
Anyhow, this all has me chomping at the bit to dive back into these datasets and get estimates generated once and for all. Sadly, I don't know that this will happen in the very near future but your efforts on this repository are definitely getting me excited.
Let me know if you want to chat sometime soon.
@kalebentley I had forgotten about the YearGroup grouping variable that you've used to bin dates for summarized steelhead data! This is a good reminder to go back and look at the different "filter types" in SkagitAnalysis.Rmd. I'll give these some thought, but I'm thinking I can reuse filter patterns already in place to incorporate more of these options into the script.
One question on the data used to generate the figure you attached - given that those weeks are relative to the YearGroup, do those rows also contain date information (week / month) that associates them with a standard calendar?
Thanks for sharing the results comparison file - I've been meaning to set something similar up to compare output between the BSS and PE scripts for the Skagit fall salmon data. Once I incorporate some further changes to the PE script, I'll set it up to match the dates you specified above and we'll see how the output compares to previous estimates. After getting through that, I think we'll be in a good place to reconvene and discuss?
@evanbooher -- it looks like I did create a "crosswalk" table for dates (see below) but hadn't joined it to the summarized results. There's obviously several ways you could link the model output to the date crosswalk table.
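One way to do that link (Python sketch for illustration; the real analysis is in R, and the column names here are hypothetical) is to treat the crosswalk as a lookup keyed on the YearGroup-relative week number and merge it into the weekly output:

```python
# Weekly model output, keyed by YearGroup-relative week number.
# Column names (week_num, catch_est, week_start, month) are placeholders.
weekly_estimates = [
    {"week_num": 1, "catch_est": 12.4},
    {"week_num": 2, "catch_est": 8.1},
]

# Date "crosswalk": maps each relative week number to calendar dates.
crosswalk = {
    1: {"week_start": "2016-05-01", "month": "May"},
    2: {"week_start": "2016-05-08", "month": "May"},
}

# Left-join the crosswalk onto the estimates (dict merge per row).
joined = [{**row, **crosswalk[row["week_num"]]} for row in weekly_estimates]
```

In R this would be the same idea as a `left_join()` on the week-number column.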
Let me know when/if you want to chat about this some more.
@kalebentley Hey Kale, I ran the PE script with those start and stop dates and am coming up a bit short on the total catch estimate at 761, but the effort estimate is very close at 89,876.
So it may be helpful to first make sure that the data inputs are identical - does it sound right that there are 129 raw total encounters of "Steelhead_Adult_UM_Released" in this YearGroup?
Then after confirming this, a next step could be running the scripts side by side to see where divergence in the methods / calculations may be occurring. This may be easier for you to run on your machine given familiarity with your older code and database connections, but if you don't have time then at least a chat to get myself oriented to your older scripts would be good.
A rough eyeball comparison of the weekly estimates above to this figure pasted below does show that the estimates from the PE script are underestimating catch (relative to other methods).
Though I wish that the final values just aligned and all was hunky dory, this is serving as a good test of the script to see where there may be mistakes in the code.
Hey @evanbooher, Thanks for the call. So that we have a record of what we just talked about I figured I would quickly respond to your last post.
I went back to my analysis files and was able to track down a summary table where I summed the number of interviews and catch of "Steelhead_Adult_UM_Released" by weeknum (1-54), angler type (B, S), and section (1-3). For this specific YearGroup (2016-17), my summary had 131 "encounters". While it would be good to know why we are off by two fish, I would say that's pretty good.
As you stated, given that our total effort estimates are nearly identical, the difference in our total catch estimate has to be due to some difference in how we are summarizing and/or expanding CPUE. Once you are done proofing your CPUE script section, I'd be more than happy to go through our separate scripts line-by-line to figure out exactly where our estimates diverge.
Lastly, as promised, I've attached two zip folders that contain various data summaries and model outputs for us to reference, as needed.
@kalebentley Thanks Kale, I think that summary table you described will be helpful for exploring differences between the summarized data I'm working with in the script and the existing Kalama estimates.
After our call I combed back through the daily catch summaries and there were four instances of catch without quantified effort (from index counts)! So that's an important issue to correct in the PE script. Will follow up once I get to another good benchmark on this.
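Those instances are easy to flag programmatically once the daily summaries exist. A minimal sketch (Python for illustration; the real script is in R, and the field names are hypothetical):

```python
# Toy daily summaries: interview catch alongside index-count effort.
daily = [
    {"date": "2016-06-01", "interview_catch": 0, "index_effort": 5.0},
    {"date": "2016-06-02", "interview_catch": 2, "index_effort": 0.0},
    {"date": "2016-06-03", "interview_catch": 1, "index_effort": 3.5},
]

# Flag days with reported catch but no quantified index effort.
flagged = [
    d["date"] for d in daily
    if d["interview_catch"] > 0 and d["index_effort"] == 0
]
```

A check like this could run as a validation step before the expansion, so the script surfaces these exception days instead of silently propagating them.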
Hey @kalebentley - I found one point at which the scripts diverged and led to different final catch estimates. I was using the fishing_end_time column from the interview table, which appears to contain some erroneous values, relative to the interview_time (many fishing_end_time's are hours later than the interview_time).
When I used the interview_time as the fishing end time for incomplete trips, the group angler hours, and thus the CPUE calculations, come into general agreement with previous analyses, with a total catch estimate of 1,074 for Steelhead_Adult_Unmarked_Released. Woo! While there's further scrutiny to work through on the script, this was a good issue to uncover. I'll check it against newer datasets where I've been using fishing_end_time to define angler hours without first verifying that the hours made sense relative to the interview_time.
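In code terms, the fix amounts to falling back to the interview time whenever the trip is incomplete. A minimal sketch (Python for illustration; the actual script is in R, and the function names are placeholders):

```python
from datetime import datetime

def trip_end(interview_time, fishing_end_time, trip_complete):
    """For incomplete trips the recorded fishing_end_time may be the angler's
    *projected* end time, so fall back to the interview time instead."""
    return fishing_end_time if trip_complete else interview_time

def angler_hours(trip_start, end_time, n_anglers):
    """Group angler hours for one interview."""
    return n_anglers * (end_time - trip_start).total_seconds() / 3600.0

# Incomplete trip: 2 anglers started at 08:00, interviewed at 11:00,
# and guessed they'd quit at 15:00 (the value recorded in the end field).
start = datetime(2016, 6, 1, 8, 0)
interview = datetime(2016, 6, 1, 11, 0)
projected = datetime(2016, 6, 1, 15, 0)

hours = angler_hours(start, trip_end(interview, projected, trip_complete=False), 2)
```

Using the projected end time instead would inflate this trip's effort (and deflate its CPUE), which matches the direction of the underestimate seen in the PE script's catch totals.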
@dannyjwarren - Not sure if those weird fishing_end_time's reflect normal behavior of the database, or are an issue to address?
@evanbooher and @kalebentley Not sure about the exact circumstance here with those times, but some projects would occasionally ask anglers, when a trip was still in progress (incomplete trip), when they thought they would finish fishing. The sampler would put that projected value in the end time field. Thus, for an incomplete trip you need to use the interview time. Sounds like you did that and roughly got what was expected. Let me know if it's still worth investigating.
@dannyjwarren, no need for further investigation on your end of things, thanks!
@kalebentley - in regards to the "obtained interview and recorded non-zero catch, but observed no corresponding index effort" scenario for a given survey day: in your past estimates, did you assume that expanded catch was not estimable at a weekly time step (when weekly mean index effort values are also zero), but could be estimated at a coarser time stratum (monthly)? Seems like another scenario where there are a host of reasonable ways to approach this issue. Maybe easier to talk through this one on a call at some point? Thanks!
Hey @evanbooher, Without going back through my original code to verify and/or spending a little more time thinking this through, my initial thought would be to keep the equation the same with no additional "interpolation". For example, if you see 0 anglers during all of the index counts for a day (and assuming there isn't non-zero effort estimated at the y-intercept from the tie-in count expansion bias term), then the estimate of catch for that day would be zero even if you interviewed an angler AND that angler caught a fish. This is just the reality of estimates: some days you'll overestimate and others you'll underestimate, but on average you'll get an unbiased estimate (assuming the assumptions of your model are valid). This is really no different than when you overestimate effort/catch by conducting the index counts during the busiest part of a day (which undoubtedly happens on some days due to the randomized schedule). However, if this were happening a lot, with substantial observed catch in interviews but no anglers being observed (directly or indirectly) during index effort counts, that would indicate that the study design needs to be adjusted.
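Concretely, the "no interpolation" rule above falls out of the estimator's structure: expanded catch is expanded effort times CPUE, so zero index effort forces a zero catch estimate regardless of what the interviews reported. A simplified sketch (Python for illustration; names are placeholders, and the real PE script stratifies by week, section, and angler type):

```python
def daily_catch_estimate(index_angler_count, expansion_ratio, day_length_hours, cpue):
    """Simplified daily estimate: expanded effort (angler hours) times CPUE.
    expansion_ratio is the census/index bias term; no interpolation is applied."""
    effort_hours = index_angler_count * expansion_ratio * day_length_hours
    return effort_hours * cpue

# Zero anglers seen during the index counts -> zero estimated catch for the
# day, even though an interviewed angler reported a fish (CPUE > 0).
est = daily_catch_estimate(0, expansion_ratio=1.2, day_length_hours=10.0, cpue=0.05)
```

On such a day the estimator underestimates; on a day when the index count happens to land at the busiest hour it overestimates, and under valid model assumptions these errors average out.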
All that said, I am actually curious how the Bayesian model deals with these instances, given that @tbuehrens updated the model so that the estimates of effort are for what isn't "observed" from the interviews. That last sentence did a terrible job of explaining Thomas's approach, but I think he'll understand what I was trying to say and could comment if he wanted.
I'm in an all-day training today and tomorrow but could chat on Friday if you wanted to discuss further.
Side note - I'm loving how we are using this Issues thread. Super helpful! Thanks, Evan!!
Kale
Thanks @kalebentley for the insight and for bringing this issue back into the context of the estimation method. Also super helpful! Overall, these instances of reported non-zero catch with zero corresponding effort should be pretty rare, but exceptions to our rules are bound to occur every so often!
@evanbooher @kalebentley should this close given the progress made, or is it worth keeping open due to still-pending work to align the protocol, database fields, and analysis script(s)?
The Point Estimate script was originally written around index counts of vehicles and trailers. This thread documents work toward incorporating direct counts of anglers as an index count type in the script, using Kalama River steelhead creel data from 5/31/2016 - 4/30/2017 as a test dataset.