rBatt / trawl

Analysis of scientific trawl surveys of bottom-dwelling marine organisms

NEUS biomass not fully corrected for effort? #53

Closed: rBatt closed this issue 8 years ago

rBatt commented 9 years ago

This is a peculiar data set because it has already been processed by someone else by the time I touch it. It seemed very clean and organized.

Here is the script that performs a variety of "conversions" to the biomass/abundance data before writing to the RData file that I read in: https://github.com/rBatt/trawl/blob/master/Data/raw_data/NEFSC/2014-03-23/SeanLucycode/Survdat_calibrate.r#L121-L181

It seems like the data are given species-specific corrections for changes in gear. However, I don't know much about how the duration/speed of the hauls varies over time, or how the distance/area trawled varies.
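Roughly, the kind of correction I mean (a sketch only; the `gear_conv` lookup and column names are my guesses, not what Survdat_calibrate.r actually does):

```r
# Sketch only: apply species-specific gear conversion factors.
# Toy stand-ins for the real survdat table and calibration lookup:
survdat   <- data.frame(SVSPP = c(73, 74, 105), BIOMASS = c(10, 5, 2))
gear_conv <- data.frame(SVSPP = c(73, 74), conv_factor = c(1.2, 0.8))

survdat <- merge(survdat, gear_conv, by = "SVSPP", all.x = TRUE)
survdat$conv_factor[is.na(survdat$conv_factor)] <- 1   # no known correction
survdat$BIOMASS <- survdat$BIOMASS * survdat$conv_factor
```

In the real script, factors like these would presumably apply only to tows made with the old gear, not to every row.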

The 9th item on page 7 of SVDBSvariabledefinitions.pdf seems to indicate that the survey station data should contain a "TOWDISTANCE" item giving the distance the gear traveled. I'm assuming the correction script accounts for the size of the net, so that also dividing by the tow distance would put "area trawled" in the denominator of BIOMASS, effectively making it a biomass per unit effort, where effort is area sampled.

However, the TOWDISTANCE column is not present in any of the data files.
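If TOWDISTANCE were present, the correction I have in mind would look roughly like this (a sketch; the net width and distance units are assumptions, not from the documentation):

```r
# Hypothetical: TOWDISTANCE assumed to be in nautical miles, net width fixed.
survdat <- data.frame(TOWDISTANCE = c(1.0, 0.9), BIOMASS = c(25, 40))
net_width_m <- 13        # assumed wing spread (m)
nm_to_m     <- 1852      # meters per nautical mile
survdat$area_swept_ha <- survdat$TOWDISTANCE * nm_to_m * net_width_m / 1e4
survdat$wtcpue        <- survdat$BIOMASS / survdat$area_swept_ha   # kg per ha
```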

@mpinsky Ideas here? This relates to your second point in the first comment here: https://github.com/mpinsky/OceanAdapt/issues/17, and to this issue in general: https://github.com/mpinsky/OceanAdapt/issues/19

rBatt commented 9 years ago

@JWMorley do you have any idea whether that distance towed is likely to vary a lot, or whether it's an important part of "effort"? My guess is that it's important, but I might be wrong if it's a really small source of variation.

JWMorley commented 9 years ago

The SEAMAP survey calculates effort based on the distance towed (so GPS before and after) and a fixed value for net width. Some other surveys (e.g. NEAMAP out of VIMS, see link below) have fancy sensors on the trawl that calculate area (or volume?) swept for each tow. http://www.vims.edu/research/departments/fisheries/programs/multispecies_fisheries_research/field_methods/neamap/index.php

I did some quick plots with the SEAMAP data; they are attached. You can see that most effort falls within a 0.5-hectare window. All these tows were entered at a 20-minute duration. Also, here is a figure showing how total catch weight varies with effort; there doesn't seem to be a relationship. I also attached a density plot, since it's hard to see whether any relationship is occurring in the points plot. Surprisingly, not much relationship between catch and effort. Hope this helps, Jim
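In code, the effort calculation Jim describes would look roughly like this (a sketch; the column names and the fixed net width are placeholders, not SEAMAP's actual values):

```r
# Distance towed from start/end GPS fixes, times a fixed net width.
library(geosphere)
# Toy tow records with start/end GPS fixes:
tows <- data.frame(lon_start = c(-79.1, -79.3), lat_start = c(32.5, 32.7),
                   lon_end   = c(-79.1, -79.3), lat_end   = c(32.505, 32.705),
                   catch_kg  = c(18, 22))
dist_m <- distHaversine(cbind(tows$lon_start, tows$lat_start),
                        cbind(tows$lon_end,   tows$lat_end))   # meters towed
net_width_m    <- 10                        # placeholder fixed net width (m)
tows$effort_ha <- dist_m * net_width_m / 1e4

# quick look at catch vs. effort, as in the attached figures
plot(tows$effort_ha, tows$catch_kg,
     xlab = "Effort (ha swept)", ylab = "Total catch (kg)")
```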


rBatt commented 9 years ago

@JWMorley I don't think your attachments made it to GitHub. Click the "View it on GitHub" link at the bottom of the email.

JWMorley commented 9 years ago

[Attached figures: distribution of trawl effort; effort vs. catch; contour plot of effort vs. catch]

mpinsky commented 9 years ago

I just sent an email to Sean Lucey (sean.lucey@noaa.gov) to ask this question. It is not clear to me from the documentation.

mpinsky commented 9 years ago

The email from Sean (sean.lucey@noaa.gov):

Malin,

The biomass or expcatchwt is the actual catch for the station. The units are kg. They have not been corrected for swept area. I'm currently working on an R package for internal use that calculates stratified means and swept area biomass estimates. I'd be happy to pass that along when it's finished.

Sean

Sean M. Lucey Fisheries Biologist U. S. Dept. of Commerce/NOAA/NMFS Northeast Fisheries Science Center 166 Water Street, Woods Hole, MA 02543 508-495-2011 (voice) 508-495-2232 (fax)
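For reference, a design-based stratified mean of the sort Sean mentions looks roughly like this (a sketch, not his package; the cpue and strata_area tables here are toy data):

```r
library(data.table)
# Toy per-tow CPUE (kg/ha) and stratum areas (ha), both hypothetical:
cpue        <- data.table(STRATUM = c(1010, 1010, 1020), wtcpue = c(2, 4, 1))
strata_area <- data.table(STRATUM = c(1010, 1020), A_h = c(500, 300))

sm <- cpue[, .(ybar_h = mean(wtcpue)), by = STRATUM]   # per-stratum mean
sm <- merge(sm, strata_area, by = "STRATUM")
sm[, W_h := A_h / sum(A_h)]                            # area weights
Y_strat <- sm[, sum(W_h * ybar_h)]                     # stratified mean (kg/ha)
B_total <- Y_strat * sm[, sum(A_h)]                    # crude swept-area biomass (kg)
```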

mpinsky commented 9 years ago

In our next update to the trawl data, we should ask for towed area (or tow length & net width, or tow duration & tow speed & net width, whichever are available), so that we can do this correction ourselves.
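Whichever combination we get, the swept area works out the same way (sketch; values and units assumed):

```r
# Swept area from tow duration, speed, and net width:
tow_speed_kn <- 3.8; tow_dur_min <- 20; net_width_m <- 13   # assumed values
dist_m  <- tow_speed_kn * (1852 / 60) * tow_dur_min   # knots -> m/min -> m
area_ha <- dist_m * net_width_m / 1e4
```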

rBatt commented 8 years ago

I think we've determined that the data we get from Sean Lucey are pretty clean and standard. As I understand it, we don't have an actual effort measure, but it's been asserted that the sampling has been kept standard. In more recent years they've been using a metric for effort (of sorts), but comparing that metric to similar metrics for older time periods would likely introduce more uncertainty than simply assuming the sampling is the same.

Gmail thread, "Re: Verifying the units on EXPCATCHWT"

From: Malin Pinsky malin.pinsky@rutgers.edu
Date: Wed, May 13, 2015 at 9:45 AM
To: Sean Lucey - NOAA Federal sean.lucey@noaa.gov, Ryan Batt battrd@gmail.com

Sean, many thanks. In our next data update in the next month or so, we'll ask for the towed area, or tow length, or boat speed & tow duration (whichever are available) so that we can do those corrections ourselves. Unless there is a good reason not to correct for towed area with these data?

Regards, Malin


On Wed, May 13, 2015 at 12:42 AM, Malin Pinsky malin.pinsky@rutgers.edu wrote: Hi Sean,

I just wanted to verify the units of the EXPCATCHWT column in the trawl database (it becomes the BIOMASS column in the dataset you sent us a year ago; see email below).

Is it in kg/ha, kg/m2, or something else? Has it been corrected for area trawled?

Many thanks, Malin

On Thu, Mar 20, 2014 at 12:37 PM, Sean Lucey - NOAA Federal sean.lucey@noaa.gov wrote: Jon,

The more I think about this, is your proposed format really the best option? It would generate a lot of zeros. Right now, I have one data set that has station, catch, and length data merged together. Here are simple R scripts to generate just station data or just catch data.

Sean

Sean M. Lucey Fisheries Biologist U. S. Dept. of Commerce/NOAA/NMFS Northeast Fisheries Science Center 166 Water Street, Woods Hole, MA 02543 508-495-2011 (voice) 508-495-2232 (fax)
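Presumably those scripts do something along these lines (a guess; column names follow SVDBS conventions but are not confirmed):

```r
# Split the merged table back into station and catch tables.
# Toy merged table standing in for the real survdat object:
survdat <- data.frame(CRUISE6 = 201402, STATION = 1:2, STRATUM = 1010,
                      LAT = c(40.1, 40.3), LON = c(-71.2, -71.4),
                      SVSPP = c(73, 74), BIOMASS = c(10, 5), ABUNDANCE = c(4, 2))
station <- unique(survdat[, c("CRUISE6", "STATION", "STRATUM", "LAT", "LON")])
catch   <- survdat[, c("CRUISE6", "STATION", "SVSPP", "BIOMASS", "ABUNDANCE")]
```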

On Thu, Mar 20, 2014 at 10:16 AM, Sean Lucey - NOAA Federal sean.lucey@noaa.gov wrote: Jon,

The standard is to use stations with an SHG < 136. The code I provided earlier includes that distinction. Although it's not in the exact format you suggest, I have all the data with the conversions in an RData file if that would be helpful. I'm in a meeting today but could try to reformat it during the breaks.

Sean
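In code, that filter is a one-liner (sketch; note the email says < 136, though the criterion is often quoted as SHG <= 136):

```r
# Keep only "good" tows via the station/haul/gear (SHG) condition code.
survdat <- data.frame(STATION = 1:3, SHG = c(111, 136, 142), BIOMASS = c(10, 5, 7))
survdat_good <- survdat[survdat$SHG <= 136, ]   # drops the SHG = 142 tow
```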

On Thursday, March 20, 2014, Jon Hare - NOAA Federal jon.hare@noaa.gov wrote: I think it would be better to do the join between station and catch before posting the data. To make sure the zeros get in there. The file would be a little larger but the value of doing the join correctly would be worth it. So it would be a rectangular matrix tows/stations as rows and tow info/species as columns.
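One way to get those zeros in, re-using the same hypothetical station and catch shapes sketched above: cross stations with species, then left-join the catch table.

```r
# Toy station and catch tables:
station <- data.frame(CRUISE6 = 201402, STATION = 1:2)
catch   <- data.frame(CRUISE6 = 201402, STATION = c(1, 2), SVSPP = c(73, 74),
                      BIOMASS = c(10, 5), ABUNDANCE = c(4, 2))
grid <- merge(station, data.frame(SVSPP = unique(catch$SVSPP)))  # cross join
full <- merge(grid, catch,
              by = c("CRUISE6", "STATION", "SVSPP"), all.x = TRUE)
full$BIOMASS[is.na(full$BIOMASS)]     <- 0    # absent = zero catch
full$ABUNDANCE[is.na(full$ABUNDANCE)] <- 0
```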

There are also some winnowing factors in pulling the data: purpose code, tow quality, etc.

Sean, do you have standard "good" data criteria?

Cheers

Jon

On Thursday, March 20, 2014, James Manning - NOAA Federal james.manning@noaa.gov wrote: Hi Massimo,

So Malin does not have to wait for us to write Python code, a temporary solution would be to dump the entire dataset as comma-delimited ASCII files, one for each table. I could do that today if Malin would like.

-JiM.

On Wed, Mar 19, 2014 at 11:55 AM, Massimo Di Stefano distem@rpi.edu wrote: Me neither; I don't know the db structure. As for data storage, storing the data as netCDF or, better, HDF5 would let us keep the data in a single container and run queries against it without holding the full dataset in memory.

I usually use netcdf4-python for netCDF [1] and h5py [2] / PyTables [3] for HDF5.

[1] https://github.com/Unidata/netcdf4-python

[2] http://www.h5py.org/

[3] http://www.pytables.org/moin
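(The tools above are Python-side; in R, the same read-a-subset-without-loading-everything pattern might use the ncdf4 package. The file and variable names below are made up for illustration.)

```r
# Read a slab of one variable, not the whole file.
library(ncdf4)
nc <- nc_open("trawl_catch.nc")                 # hypothetical file
biomass <- ncvar_get(nc, "BIOMASS",             # hypothetical variable
                     start = 1, count = 1000)   # first 1000 values only
nc_close(nc)
```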

On Mar 19, 2014, at 12:28 PM, Malin Pinsky malin.pinsky@rutgers.edu wrote:

Hi all,

It sounds like a solution is appearing! Thank you all for the hard work on your end. It sounds like it has taken quite some figuring out, but this will be a great resource down the road for many of us.

I don't know your exact database structure, but the fields you serve up on IOOS are a good start. Given the size of the data, it probably makes sense to break the data into separate tables that can be linked together (tow data, species catch data, and if possible, length data as well). Tow data would include lat, lon, depth, year, month, day, time, stratum, cruise, tow, station, surface and bottom temperature. Catch data would include tow ID, species, biomass, abundance. Length would include length frequency data by species and tow ID.
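As a concrete sketch of that structure (example values and the haulid key are illustrative, not a spec):

```r
# Three linked tables keyed by a tow identifier ('haulid' is assumed);
# the length table would follow the same pattern.
tow   <- data.frame(haulid = "HB201404-001", lat = 40.1, lon = -71.2,
                    depth = 55, year = 2014, month = 4, stratum = 1010,
                    surftemp = 8.5, bottemp = 6.1)
catch <- data.frame(haulid = "HB201404-001", spp = "Gadus morhua",
                    biomass = 12.4, abundance = 7)
merge(tow, catch, by = "haulid")   # re-assemble when needed
```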

This only needs to be for normally functioning tows. If the data have not been corrected for swept area, then I'll need that information, too. If there's a way to pre-apply correction factors for changes in gear or boat, that would also be wonderful, but I can do that after the fact if needed.

NetCDF or HDF5 is fine.

Many thanks, Malin

On Wed, Mar 19, 2014 at 11:57 AM, Massimo Di Stefano distem@rpi.edu wrote:

On Mar 19, 2014, at 11:37 AM, Jon Hare - NOAA Federal jon.hare@noaa.gov wrote:

Awesome.

In one sense you are serving the data to the public through the IOOS server. But you are correct caution is needed.

Malin, what do you think you need for your work?

We could write an IPython script to write out the data that Malin needs. This script could then be run every six months or so?

Possible?

Correct.

The script to generate the data can be run manually or as a “cron task” (you set the frequency).

The output of the script can easily be saved as CSV, but I guess netCDF or HDF5 is more useful for our purpose.

Storing the results in a web-accessible directory (we can easily protect the page with a password) will let Malin and others access the data. We can all use the same link as the input dataset in the processing, so we can share notebooks that perform the processing without worrying about "path" and "file name" issues.

Massimo.

@Jim, I'm very glad that the python-oracle interface is working. @Jon, we need to do the same on the IPython server.


Jon Hare Narragansett Laboratory Director Oceanography Branch Chief NOAA Fisheries Service 28 Tarzwell Drive Narragansett, RI 02882 cell (401) 871-4705 http://www.nefsc.noaa.gov/epd/ocean/MainPage/


Malin Pinsky Assistant Professor Department of Ecology, Evolution, and Natural Resources and Institute of Earth, Ocean, and Atmospheric Sciences 14 College Farm Rd. Rutgers University (848) 932-8242 malin.pinsky@rutgers.edu http://pinsky.marine.rutgers.edu
