noaa-afsc-mace / winter_2024_bogoslof_report

Bogoslof reporting for 2024

Umnak/Samalga OR just one big Bogoslof? #1

Open mike-levine opened 5 months ago

mike-levine commented 5 months ago

Hey @nlauffenburger and @mckelveyd, I'm getting into the basic report for 2024 and have one question so far:

How do you plan to deal with Umnak and Samalga in the analysis? This comes up in a few places, such as in the length-frequency plots image

and the maturity figures. image

In 2020, the abundance-weighting was done for each region independently. Do you plan to do this again, or just combine into one region? I guess it is a bit of a moot point with two trawls: if you treat each region separately, each trawl gets a weight of 1; if you don't, the weights will be based on the abundance in each. But it is still good to know, as it changes the comparison points (figure c above). And it is good to get it right for future reports too.
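To make the weighting point concrete, here is a minimal sketch, assuming each haul carries a region label, a local abundance, and a proportion mature. The haul values and the function name `weighted_maturity` are made up for illustration; this is not the report code.

```python
from collections import defaultdict

# Hypothetical hauls: (region, local abundance, proportion mature).
hauls = [
    ("Umnak", 3.0e6, 0.90),
    ("Samalga", 1.0e6, 0.50),
]

def weighted_maturity(hauls, by_region=True):
    """Abundance-weighted mean proportion mature, per region or pooled."""
    sums = defaultdict(lambda: [0.0, 0.0])  # key -> [sum(w * p), sum(w)]
    for region, abundance, p_mature in hauls:
        # Pooling puts every haul under a single 'Bogoslof' key.
        key = region if by_region else "Bogoslof"
        sums[key][0] += abundance * p_mature
        sums[key][1] += abundance
    return {key: wp / w for key, (wp, w) in sums.items()}

separate = weighted_maturity(hauls, by_region=True)
pooled = weighted_maturity(hauls, by_region=False)
print(separate)
print(pooled)
```

With one haul per region, each regional value is just that haul's own proportion (the weight cancels out), while the pooled value is pulled toward the haul with the larger abundance.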

If you want to keep the Umnak/Samalga divide- it is easy to keep. Where exactly does it get made? Is there an official latitude?

Thanks!

nlauffenburger commented 5 months ago

Hi Mike, yes, I think we would like to keep the regions separated for these purposes: current-year biomass/numbers at length and maturity plots. For the historic biomass/numbers by length/age, there's no separation into the different regions.

For the acoustic data, this year, the cutoff is transects 1-8 (Umnak) and 9-19 (Samalga), but these transect numbers have been different in the past, though you probably don't need it for historic proportion at length. For the biological data, it looks like Denise used:

Samalga: west of 168° 30' W
Umnak: between 168° 30' W and 167° W

Let me know if you have any more questions

mike-levine commented 5 months ago

Sounds great, how about I just use:

Samalga: west of 168° 30' W
Umnak: between 168° 30' W and 167° W

And are there any others, like the 'CBS specific area' from deep in the past? Or is it good enough for this to just have everything be either Umnak or Samalga?

nlauffenburger commented 5 months ago

There was also the category Unalaska: between 167° W and 166° W.
Perhaps you can use anything < 167 as Unalaska. This is its own column in one of the tables: image
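A minimal sketch of these longitude rules, assuming longitudes are stored as signed decimal degrees (west negative) and reading "anything < 167" as everything east of 167° W. The function name `bogoslof_region` is hypothetical, and assigning a point exactly on a boundary to the eastern region is an assumption, since the thread does not say which side owns the line.

```python
def bogoslof_region(lon_deg):
    """Assign a sub-area from longitude in signed decimal degrees (west is negative)."""
    samalga_umnak = -(168 + 30 / 60)  # 168° 30' W
    umnak_unalaska = -167.0           # 167° W
    if lon_deg < samalga_umnak:
        return "Samalga"   # west of 168° 30' W
    if lon_deg < umnak_unalaska:
        return "Umnak"     # between 168° 30' W and 167° W
    return "Unalaska"      # assumption: everything east of 167° W

for lon in (-169.0, -168.0, -166.5):
    print(lon, bogoslof_region(lon))
```

This only looks at longitude; it applies no latitude cutoff and does not handle positions that cross the antimeridian.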

mike-levine commented 5 months ago

aah helpful footnote. Thanks! I'll do that.

mike-levine commented 5 months ago

Ok cool- this is what those rules do in the 1994-current stuff (i.e., what's in the database) image

Look reasonable? What do you want to call the 'other' stuff? Is there a better name?

nlauffenburger commented 5 months ago

I'm not sure exactly, some of that might be best classified as CBS, but maybe @mckelveyd can help guide us on this?

nlauffenburger commented 5 months ago

I think maybe since the guidelines in that table were south of 55 N, you can put most of the 'other' in either Umnak or Unalaska. Not sure what the cutoff is that you are using for the purple. But then I'm not sure what north of 55 N should be.

mike-levine commented 5 months ago

Oh ok- here's a better go at that. For now, the north of 55 is 'Northern Bogoslof' but that's just a working name... this looking a bit more correct? image

mckelveyd commented 5 months ago

Taina and I think that there is some confusion going on with these areas...

1) Although northern extension tracklines were completed in 2012 to check for fish in the historical Bogoslof area, there is no "northern Bogoslof" area. What you are seeing in 2000, 2001, 2002 are transects completed across part of the EBS shelf rather than what was considered Bogoslof. @tainahon can elaborate further.

2) The Central Bering Sea (CBS) area was defined as the area between 170 and 167 deg longitude; historically, the Bogoslof survey included this area as well as areas east and west of the CBS area.

3) The Umnak, Samalga, and Unalaska areas as described in the cruise report table 13 (2020-02) to describe maturity were my definitions and not necessarily analyzed for biomass using those lats/lons.

mike-levine commented 5 months ago

Thanks for that, yeah, I am clearly confused! I just tossed that "northern Bogoslof" name on there to understand where things were at, but that answer makes sense.

Really, I just want to add an index with the 'region' within the Bogoslof so the analysts can tell what to include/not include in some places.

For example, if you are comparing, say, the biomass in 2024 to the biomass in 2002, what is the fair comparison? Like to make a table such as: image

what are the regions that would be included in the totals? Similarly- if you wanted maturity weighted by local abundance, and wanted to compare this to the past, what regions would you include?

tainahon commented 5 months ago

@mike-levine @nlauffenburger @mckelveyd -- This survey has a lot of complexity over time. The answer for 2024 seems straightforward, as everything surveyed is considered 'Bogoslof' with sub-areas Umnak and Samalga. But generally speaking, the further you query into the past, the more "custom" it may get depending on the year. Would it be helpful if you, Nate, Denise and I meet to discuss the historical nuances of this survey with regard to the current goals for querying these data? (I find it a bit hard to discuss inside GitHub.)

mike-levine commented 5 months ago

Good points, @tainahon and @mckelveyd- I was a bit naïve to think this time series could easily be shoehorned geographically, but it probably isn't too important to do that anyway. Thinking backwards from the end:

I just want to provide geographic indices that can be used to help create these figures: image

image

I don't know if you actually want geographic regions (i.e., would 'all of Bogoslof' be fine?) on many of the other historic comparison figures we present in the current GOA cruise reports, such as:

image

image

image

image

If 'all of Bogoslof' is a fine scope for those ones, we're already good. We don't really need to worry about Umnak vs Samalga then.

So, for helping prep the cruise report, I think it's enough if you let me know where exactly you'd want the Umnak/Samalga divide compared to previous surveys in figures. It looks like the 'current' footprint goes back to the 2012 survey, so it would be easy to compare Umnak vs. Samalga back to that point for the purposes of, say, weighted maturity.

As for the tables, these may just need to be a 'take the 2020 table and append a 2024 row' kind of operation, as anything else would require a lot of custom work on the older surveys, and the point here is to make folks' lives easier!

nlauffenburger commented 5 months ago

Mike and I chatted, and here is the approach we are going to use: for the maturity and length-frequency plots by region, we will separate Umnak and Samalga using a geographic definition, to keep these consistent when looking at historical data. For the rest of the figures and tables, everything gets grouped together into all of Bogoslof, so that should be straightforward.

tainahon commented 5 months ago

Thanks Mike and Nate.

1) That approach sounds like it could work, especially if you restrict the historical length-frequency plots to a subset of the longer time series. I can take a look at when in the past the analysis results for numbers and biomass become harder to obtain if one uses a straight-up query of what's in mb2 (i.e., if those queries break down), just for our information.

2) I am reminded from your little color plots of the time series' regions that in 1994 there is trackline assignment mischief on transect 6 where 'Unalaska' transitions to 'Umnak'. It is real and is not a new problem: trackline overlap problems were there when it was migrated from MBhistoric into mb2; Kresimir and I have not completely sorted them out. But the historical biomass and numbers in the tables should be correct for that survey.

3) I think if you use them at all, the MWD and HAB violin plots (e.g., Fig 22) should be customized for Bogoslof. HAB doesn't make sense for that region, since the bottom is routinely deeper than our recorded data range and not usable. Also, pollock do not generally recruit to Bogoslof before age 4, so only a minuscule number will be below 30 cm unless an extraordinarily large year class shows up there at age 3 or 4.

nlauffenburger commented 5 months ago

Hello again @mckelveyd & @tainahon

I'm following up on this discussion to make the best or most effective choice in populating the 1) numbers/biomass by length/age tables, 2) time series of total biomass (bar or point plot by year, Figure 10, 2020) and 3) the waterfall plots of numbers at length/age.

Data queried from MB2 for surveys prior to 2003 don't match what is in the tables from the 2020 report. I looked into 1998 and found that the results still didn't match even when I restricted the longitude to greater than -167 degrees, which was designated in Table 13 for maturity (I know you, Denise, noted above that this wasn't necessarily what was used for biomass).

So my question is whether it is easy/possible to figure out the correct log bounds, latitude/longitude, or other ways to identify the correct intervals from MB2 to generate the correct data by length & age for Bogoslof proper matching the historic tables? If these old surveys haven't been tuned in for this purpose, the other very quick option is just to pull in the data from a spreadsheet using the tables from 2020. Then the next time we do a Bogoslof report, maybe we can update the code to just query from the database.

Thanks for your help.

mckelveyd commented 5 months ago

Let me get back to you on this, Nate. I am working with a new computer today after my computer crashed hard yesterday. While my work files are getting resynced, I am working on 1996 Bogoslof num/biomass for the Tags project, so hopefully as I dig into this I can better answer your question.

Denise

(In reply to Nate Lauffenburger's comment of Thu, Apr 4, 2024, 3:12 PM.)

tainahon commented 5 months ago

Hi Nate, Denise, Kresimir and I pulled what we could from mb_historic into mb2 and tried our best to recreate what was done back when and make it into a sensible mb2 analysis. In some cases we couldn't recreate the original. And 199502 wasn't even in mb_historic, and thus is not in mb2 either. So I would say the tabular route is likely to be best for now, especially if you want to get something done sooner rather than later. I can also take a look at 1998 (maybe we can look at other mb_historic era Bogoslof surveys together, Denise, since I have looked at some of them once already) and see what should be in mb2 and whether it's query-able.

tainahon commented 5 months ago

Also Nate, I am curious what you are defining as 'different' in this case. Is it 'any' difference from Denise's 2020 table? For, e.g., 199802 it looks like the total numbers and biomass in mb2 are both within 1% (~0.05%) of the original published numbers and biomass at age, so given all the changes in the database, we were tending to call that good enough. Are you querying out higher diffs than that?

nlauffenburger commented 5 months ago

@tainahon, I think I'm getting a difference of 3.9% (511 thousand tons versus the published 492 thousand tons) for all the data, and a difference of 9% when restricting longitude to be < -167 W (446 compared to 492). It sounds like I'm doing something wrong in the query, so we should touch base about how you are getting within 1%. Thanks!
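For reference, the percentages quoted above can be reproduced with a small helper. This is only a sketch of the arithmetic; the function name `percent_difference` is made up, and the difference is taken relative to the published value.

```python
def percent_difference(queried, published):
    """Signed difference of a queried total relative to the published total, in percent."""
    return 100.0 * (queried - published) / published

published_kt = 492.0   # published 1998 biomass, thousand tons
all_data_kt = 511.0    # queried total with no restriction
restricted_kt = 446.0  # queried total with the longitude restriction

print(round(percent_difference(all_data_kt, published_kt), 1))
print(round(percent_difference(restricted_kt, published_kt), 1))
```

The sign shows the direction of the mismatch: the unrestricted query overshoots the published value, and the longitude-restricted query undershoots it.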

tainahon commented 5 months ago

Sure, let's compare notes @nlauffenburger. Here's my query:

SELECT sum(numbers), sum(biomass)
FROM macebase2.analysis_results_by_length
WHERE ship = 21
  AND survey = 199802
  AND data_set_id = 1
  AND analysis_id = 1
  AND zone = 0
  AND report_number = 1
  AND transect IS NOT NULL

producing the following numbers and biomass (kg):

434381982.2641910440279 492138982.2301981515076

tainahon commented 5 months ago

For MF199702, starting from what is in mb2 we need to subtract out the fish from Transect 1 to get close to the original Bogoslof estimates. Transect 1, haul 1, pollock were categorized as EBS shelf pollock rather than Bogoslof (basin) pollock. Numbers and biomass were estimated for transect 1 along with transects 2+, but not included. This should be do-able in mb2.

nlauffenburger commented 4 months ago

Awesome. Thanks for the specifics here, @tainahon. This is really helpful. I looked in the report definitions table and only saw 1 report number for 199802, so I (incorrectly) assumed that all the data were labeled with that report number. It looks like there are a bunch of data in analysis_results_by_length without a report number or transect number and those need to be removed.

I'll assume this same behavior for the other older surveys and I'll remove transect 1 from 199702. I may follow up if I get caught up again, but this is a great start. Thanks!
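The row filtering described here can be sketched in plain Python, assuming the results have already been pulled into a list of dicts with the same column names as the query above. The function name `report_totals`, the `excluded_transects` parameter, and the row values are all hypothetical; this is not the actual report code.

```python
def report_totals(rows, report_number=1, excluded_transects=()):
    """Sum numbers and biomass over rows that belong to the report.

    A row is kept only if it carries the requested report number, has a
    transect assigned, and that transect is not explicitly excluded.
    """
    kept = [
        r for r in rows
        if r["report_number"] == report_number
        and r["transect"] is not None
        and r["transect"] not in excluded_transects
    ]
    return sum(r["numbers"] for r in kept), sum(r["biomass"] for r in kept)

# Made-up rows standing in for analysis_results_by_length records.
rows = [
    {"report_number": 1, "transect": 1, "numbers": 4.0, "biomass": 3.0},
    {"report_number": 1, "transect": 2, "numbers": 10.0, "biomass": 12.0},
    {"report_number": 1, "transect": 3, "numbers": 20.0, "biomass": 25.0},
    {"report_number": None, "transect": None, "numbers": 5.0, "biomass": 6.0},
]

all_transects = report_totals(rows)                      # drops the unlabeled row
without_t1 = report_totals(rows, excluded_transects={1})  # also drops transect 1
print(all_transects, without_t1)
```

Keeping the transect exclusion as a parameter means a survey-specific rule, such as dropping transect 1 for 199702, stays out of the general filter.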