FIX: add VMDAS search step...

jklymak commented 2 years ago

This adds the ability for the RDI reader to search for the VMDAS navigation header manually from the end of the ensemble. For whatever reason RDI's byte offsets are incorrect near the end of the ensemble. However, the Navigation block is still there, as expected.

Note I added arbitrary pass-through kwargs for io.api.read so that the instrument readers can have their own kwargs.

This probably needs tests and docs.

lkilcher commented 2 years ago

Thanks @jklymak ! Do you have a data file that requires this fix that you can share? We typically only use the first 100 pings (a few mb or so) of a file in our tests. We use git-lfs to store/track these files.

jklymak commented 2 years ago

Yeah sorry I should have put this in draft

In addition to the navigation block being misplaced for some reason, the problem is for vmdas files there are sometimes two sets of interleaved pings (nb and bb). These can be completely different bins sizes, blanking differences etc. That means there is an API issue as to how to return the two somewhat disjoint datasets. Worse, if you ping with just NB or BB but don't set the preferred output type, you get both output types where one of them is empty. It's all pretty mysterious. Anyway. I'll try and clean up now that I am back on shore and try to get a reasonable version with test files to you soon.

lkilcher commented 2 years ago

Yea, we have a similar problem with Nortek Signature data files where they have a bunch of different interleaved data types. It’s been a headache trying to decide how to organize the data. Originally the idea behind DOLfYN was that it would provide a universal ADCP data-type, but the reality has long ago disintegrated that dream.

The primary guiding principal has been ‘provide all the data in a single structure’. Maybe later we will figure out some ‘standardize’ functions—that a user could choose to use—to translate the ‘everything included data types’ to a common/simplified format, but that might also be a pipe dream.

What all of that means for you: assuming bb and nb have there own timestamps and other coords, I recommend including those as separate ‘coords’ in the xarray dataset (e.g. ‘time_nb’ and ‘time_bb’). You could load the dolfyn/tests/data/VelEchoBT01.nc file for an example of how we’re managing this issue for Signature data.

Hopefully that helps.

jklymak commented 2 years ago

I think we could have one xarray object with two sets of coordinates. OTOH I would suggest that it is just a a easy, if not easier, to make two separate xarray objects. There are two issues with this approach, however. Shared info (say bottom track or gps) would need to be still shared. Also we would be changing the api to return a list of dataSets rather than just one. But given that you also have this issue with norteks I wonder if that is not an ok approach. You could maybe optionally squeeze the list to a singleton if there is only one element.

lkilcher commented 2 years ago

I'm open to the multiple objects is better than 1 idea, but it would require some refactoring on the Nortek side... but maybe not so much because we do read the interleaved data into separate structures, then combine them near the end (prior to returning the data).

Ultimately this is really an API design philosophy question. For the Nortek data bottom-track pings are one of the 'interleaved data types', and generally I think the user wants this data alongside the conventional ping data.

Obviously there's also the possibility of adding a split_data_types (e.g.) option to the read function, but I'd prefer to avoid that if possible due to the added complexity that comes with doing so.

I'm Interested to know what @jmcvey3 thinks.

jmcvey3 commented 2 years ago

To be clear, in dolfyn we're currently differentiating between Nortek datatypes (water track, 5th beam, bottom track, echosounder, AST & altimeter) with "tags" attached to each set ( '', '_b5', '_bt', '_echo', '_ast' & '_alt'), respectively.

Nortek has the option for their "AD2CP" instruments to run 2 different profiling configurations (an average config + a burst config), and, correct me if I'm wrong @lkilcher, I don't believe we've tested dolfyn with a dual-configuration datafile. It sounds to me (@jklymak) and also correct me if I'm wrong, that this VMDAS nb vs bb is similar to the dual configuration setup.

I know TRDI also integrated the dual profiling configurations into the Sentinel V series as well, though I've never played with one of those instruments.

jklymak commented 2 years ago

That sounds the same. Basically the rdi had a 8000 tag for one ping type velocity and an 8001 tab for the other ping type. However I am not sure they have different time stamps, despite pinging at different times. I think they are considered the same ensemble. I guess you could also add a 4th dimension to the variables.

jmcvey3 commented 1 year ago

Now that I'm finally getting back to this, I've started to look into the PDDecoder that TRDI put up on github because of a dataset that has dual-config set up. I think it would be easiest to return 2 datasets, one with each configuration, and just keep shared data in the first. Let the user then copy shared data over to the second if they'd like.

jmcvey3 commented 1 year ago

I've been working off for a fix to read data from a Riverpro, and attempting to incorporate the info from TRDI's PDDecoder SDK into dolfyn. I ended up using and merging the changes in this pull request into this branch: https://github.com/jmcvey3/dolfyn/tree/trdi_5beam

jklymak commented 1 year ago

yes, totally OK to close this, or subsume it in a different PR. I haven't come back to clean it up ;-)

jmcvey3 commented 1 year ago

Thanks! I'll link and close this to a new PR when its ready

lkilcher / dolfyn

FIX: add VMDAS search step... #92