ices-publications / SONAR-netCDF4

The SONAR-netCDF4 convention for sonar data
10 stars 11 forks source link

It is not possible to query the ping sample count, or range limits, without reading the entire ping #28

Closed geoffmatt closed 2 years ago

geoffmatt commented 4 years ago

Hi, it seems that one of the main benefits of using NetCDF has been lost here. Echoview works very hard to only load sample data when absolutely necessary, as this is by far the most expensive part of reading a file. Therefore we typically read/calculate the necessary attributes for each ping, and only read the actual samples when a calculation or visualization actually needs them. However this doesn't seem to be possible with the current format. For example I cannot determine the sample count, nor the minimum and maximum range for a ping unless I read all the samples for that ping. This seems to be because the sample count is allowed to change between beams and pings, which means it cannot be a constant dimension. This is very problematic for us, as it makes it impossible for us to utilise the efficiencies of the NetCDF structure. We have to read every sample into memory, just to determine the range limits of the data. This changes the load time for these files by an order of magnitude. I would much rather one of the following solutions were implemented:

  1. Change the sample count to be a constant dimension. If the sample count actually changes between beams/pings, then pad it with NaN values.
  2. Write the sample counts to the file as a variable which can be queried directly.
gavinmacaulay commented 4 years ago

LSSS has implemented reading of the sonar netCDF4 format without having to read in all of the data - I'll ask them how they got the data size without reading it all in. Aside from that, your option 2 would be an ok solution in my view.

geoffmatt commented 4 years ago

Thanks Gavin, that'd be great

gavinmacaulay commented 3 years ago

To follow up: LSSS uses a Java library to access the netcdf files and it appears that this library takes care of providing the sample count. However, if this isn't readily available in other library bindings, I suggest we add my option 2 above to the convention.

cyrilleponcelet commented 3 years ago

Hi I'm not very fan of having a second variable (prone to inconsistency). Furthermore I'm very surprised : the netcdf java library is mainly a jna binding to netcdf-c api, there should not have any function allowing to do more things. Do you know which library LSSS use ? Maybe there is a special use of library, vlen type is a size_t and a pointer, does somebody tried to read them as integer in order to get only size ?

geoffmatt commented 3 years ago

I've sent a support request to the guys who produce the NetCDF libraries. It would seem to be very surprising if there isn't a way to query the length of a vlen variable without reading it. The file structure clearly has the information, I just can't find a way to ask for it. Although it's worth adding that when I try to email their support, or their support lists, the email bounces back as underliverable, which is not a great sign.

cyrilleponcelet commented 3 years ago

I didn't tested it but it seems that the only way is to use the H5Dvlen_get_buf_size function (https://confluence.hdfgroup.org/display/HDF5/H5D_VLEN_GET_BUF_SIZE). But it implies to use the HDF5 api which is less easy to use. I'm always lost between memory space, and dataspace with this API

Have you made a try to use it ?

geoffmatt commented 3 years ago

No I haven't tried the HDF5 api yet. I've started a conversation on the NetCDF GitHub here: https://github.com/Unidata/netcdf-c/issues/1893

But at the moment it looks like NetCDF itself doesn't provide an interface to this information. A plugin called NCO might have a solution

geoffmatt commented 2 years ago

We've asked Furuno and Simrad to add a new variable to the Sonar Group called "sample_count". This is an integer value which describes the sample counts for each ping. This isn't an essential variable, the file will load without it, but it improves Echoview's ability to scan the file quickly by an order of magnitude.

cyrilleponcelet commented 2 years ago

In my opinion, as long as it is not mandatory and explicitely commented as a variable allowing to improve performance, since it is already added by some constructor it should be added to the variable list in sonar-netcdf format to ensure that anyone will use the same name and meaning.

geoffmatt commented 2 years ago

I agree, that would be the most sensible action. Simrad also agree with @cyrilleponcelet, they would like this in the standard so that they are not deviating from it

gavinmacaulay commented 2 years ago

ok, I'll create a PR to implement this.

gavinmacaulay commented 2 years ago

PR #61 which resolves this is accepted.