ices-eg / wg_WGFAST

Working Group on Fisheries Acoustics, Science and Technology

Expand to store echosounder data, including split-beam #3

Closed gavinmacaulay closed 4 years ago

gavinmacaulay commented 6 years ago

Expand SONAR-netCDF for the storage of conventional echosounder data:

erinann commented 5 years ago

@gavinmacaulay,

This sounds like a great idea.

leewujung commented 5 years ago

@gavinmacaulay It's awesome that we can discuss here! 😀

Here is a link to echopype, a package I've been developing with a few others to unpack echosounder/ADCP data based on the SONAR-netCDF4 format. I have the EK60 .raw conversion working (installable via pip!), and am working with a student now to convert the AZFP .01A file based on what @marianpena and @bnwkeys did in Oceanhackweek last year. @SvenGastauer helped put in the reading part for EK80 in a fork, but we still have to work on the format to see how to accommodate some different attributes for broadband data.

For metadata in EK60 and AZFP, I've taken the approach of simply adding netCDF variables to the appropriate groups when there is no corresponding field already defined in the convention.

I think a more substantial discussion is needed for multi-frequency data. Here are my thoughts in the echopype document. It basically revolves around using a multi-dimensional array to store multi-frequency data, and filling in NaNs when the number of range bins differs between channels. As of now I store multi-frequency EK60 backscatter data all under a single Beam group with 3 coordinate variables: frequency, range_bin, ping_time. There are corresponding changes under other groups too, such as the absorption coefficient under the Environment group.
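A minimal sketch of the padded layout described above, assuming NumPy. The helper name `pad_channels`, the channel frequencies, the sample counts, and the dimension order (frequency, ping_time, range_bin) are all made up for illustration; this is not echopype's actual code.

```python
import numpy as np

def pad_channels(channels):
    """Stack per-channel (ping_time, range_bin) arrays into a single
    (frequency, ping_time, range_bin) array, padding short channels with NaN."""
    n_ping = channels[0].shape[0]
    n_range = max(ch.shape[1] for ch in channels)
    out = np.full((len(channels), n_ping, n_range), np.nan)
    for i, ch in enumerate(channels):
        # Copy the real samples; everything beyond them stays NaN.
        out[i, :, : ch.shape[1]] = ch
    return out

# Hypothetical coordinate values and backscatter: 4 pings per channel,
# with a different number of range bins on each channel.
frequency = np.array([38e3, 120e3, 200e3])
channels = [np.zeros((4, 10)), np.zeros((4, 6)), np.zeros((4, 3))]
backscatter = pad_channels(channels)
```

Here `backscatter` has one slab per frequency, and the 200 kHz slab is NaN beyond its third range bin.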

I'd appreciate more discussion on this as we progress on this development!!

gavinmacaulay commented 5 years ago

A challenge with making all channels of a broadband dataset the same length and filling with NaNs is the extra storage required. This wasn't a big problem with multi-frequency EK60 data, but for EK80 broadband data it becomes significant. For example, recording data to 1000 m on a 200 kHz FM channel when the useful range is about 300 m generates 3.6 MiB of extra data per ping (151 GiB per day at a 0.5 Hz ping rate). The problem is worse at 333 kHz. This is what motivated the data storage method in the current SONAR-netCDF4 convention.
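The daily figure follows directly from the per-ping overhead and the ping rate. A quick check, using only the two numbers quoted above (the variable names are just for illustration):

```python
extra_mib_per_ping = 3.6          # extra padded data per ping, from the example above
ping_rate_hz = 0.5                # one ping every 2 s
seconds_per_day = 24 * 60 * 60

pings_per_day = ping_rate_hz * seconds_per_day
# 1 GiB = 1024 MiB
extra_gib_per_day = extra_mib_per_ping * pings_per_day / 1024
```

That is 43,200 pings per day and a little under 152 GiB of padding per day, in line with the figure quoted.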

To complicate the picture, the EK80 can now store different channels to different ranges.

I wonder if there is a convenient procedure for storing the data in a space-efficient manner, but transparently converting it into a more convenient structure when the processing software loads it into memory?
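One way this could look, as a sketch only: keep the samples on disk as a flat array plus a per-ping sample count, and expand to a NaN-padded 2-D array at load time. The flat-plus-counts layout and the name `ragged_to_padded` are assumptions for illustration, not something defined by the convention.

```python
import numpy as np

def ragged_to_padded(flat, counts):
    """Expand a flat sample array into a (ping, sample) array,
    padding each ping with NaN up to the longest ping."""
    counts = np.asarray(counts)
    n_max = counts.max()
    padded = np.full((counts.size, n_max), np.nan)
    # True where a ping actually has a sample at that index.
    valid = np.arange(n_max) < counts[:, None]
    # Boolean assignment fills the valid cells in row-major order,
    # which matches the ping-by-ping order of the flat array.
    padded[valid] = flat
    return padded

# Hypothetical stored data: three pings with 3, 1 and 2 samples.
flat = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
counts = [3, 1, 2]
padded = ragged_to_padded(flat, counts)
```

The stored size is then proportional to the number of real samples, and only the in-memory view pays for the padding.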

I note that the absorption in the /Environment group can already cope with storing values for different frequencies (via a frequency coordinate variable, not a channel id, so is a little less convenient than maybe desired).

leewujung commented 5 years ago

Great! I did encode the absorption variable along a frequency coordinate variable. 😀 I think it is more convenient because the access is then more explicit and error-proof than using channel ID.

I will do the NaN padding for EK60 and AZFP data for now, and think some more about how to deal with this problem for EK80. I am thinking this may be where the difference between raw and processed data (e.g., Sv or MVBS) comes into play, since for operations like frequency-differencing, the computations are done on identical grids of processed data. The multi-dimensional filled array approach is likely the most useful there, and the raw data can be stored in a space-efficient manner as you said.
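To illustrate why a shared grid is convenient for processed data, here is a small frequency-differencing sketch, assuming NumPy. The Sv values and the 5 dB threshold are invented for the example.

```python
import numpy as np

# Hypothetical Sv (dB re 1 m^-1) on the same (ping_time, range_bin) grid.
# The last range bin of the 120 kHz channel is NaN padding.
sv_38 = np.array([[-70.0, -65.0, -80.0],
                  [-72.0, -66.0, -79.0]])
sv_120 = np.array([[-60.0, -66.0, np.nan],
                   [-61.0, -64.0, np.nan]])

# On identical grids the difference is a plain element-wise subtraction;
# NaN padding propagates into the result.
delta_sv = sv_120 - sv_38

threshold_db = 5.0  # assumed threshold, for illustration only
# Comparisons with NaN are False, so padded cells never pass the threshold.
mask = delta_sv > threshold_db
```

With the padded layout, no per-channel indexing or interpolation is needed before differencing, and the padded cells drop out of the mask on their own.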

gavinmacaulay commented 4 years ago

Partial implementation for storing split-beam data was merged as per PR #29.

A method to store gridded data across many channels (frequencies) will follow.

gavinmacaulay commented 4 years ago

Kamino cloned this issue to ices-publications/SONAR-netCDF4

ghost commented 4 years ago

This issue was moved by gavinmacaulay to ices-publications/SONAR-netCDF4#11.