modscripps / velosearaptor

python ADCP routines
https://modscripps.github.io/velosearaptor
GNU General Public License v3.0

Typical workflow? #15

Open · jessecusack opened this issue 2 years ago

jessecusack commented 2 years ago

I'm trying to understand the current workflow with this package to see where bin mapping might fit.

My understanding from the notebook is that it goes something like this:

  1. Initialise a ProcessADCP object.
  2. Check the editing parameters with convenient plotting functions.
  3. Do the editing.
  4. Save to netCDF via xarray (?)
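
In code, my mental model of those four steps looks roughly like the sketch below. Only ProcessADCP, average_ensembles and the _edit/_to_enu helpers are actually named in this thread; the import path, constructor arguments, the plotting call and the output attribute are guesses on my part.

```python
from velosearaptor.madcp import ProcessADCP  # assumed import path

# 1. Initialise a ProcessADCP object with the raw file
#    (constructor arguments are a guess).
proc = ProcessADCP("raw/adcp_data.000")

# 2. Check the editing parameters with the plotting helpers
#    (hypothetical method name).
proc.plot_echo_stats()

# 3. The editing happens implicitly here, via _edit, _to_enu, etc.
proc.average_ensembles()

# 4. Save to netCDF via xarray (assuming the averaged result is
#    exposed as an xarray.Dataset attribute).
proc.ds.to_netcdf("adcp_processed.nc")
```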

The part that confuses me right now is step 3. It seems like this happens when average_ensembles or burst_average_ensembles is run, via calls to _edit etc.

To fit into the current workflow, I could create a _binmap method and then call it in one of the averaging methods. Since I'm working with single ping data, not burst/ensemble (concepts I also don't fully understand), a lot of the averaging steps seem unnecessary. Is there some way of splitting the editing out into smaller chunks that could be mixed/matched as needed?
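
For concreteness, something like the sketch below is what I have in mind for _binmap. The ensemble iterator, attribute names and the interpolation detail are placeholders, not the package's actual internals; only average_ensembles, _edit and _to_enu are real method names mentioned here.

```python
import numpy as np


class ProcessADCP:
    # ... existing reading / editing / averaging machinery ...

    def _binmap(self, ens):
        """Map each beam's cells onto common vertical levels using the
        instrument tilt, before rotating to earth coordinates."""
        for b, beam_vel in enumerate(ens.beam_velocity):   # placeholder names
            cell_depths = ens.cell_depths_per_beam[b]       # tilt-corrected depths
            # np.interp needs increasing x; fill outside the beam range with NaN
            ens.beam_velocity[b] = np.interp(
                ens.nominal_depths, cell_depths, beam_vel,
                left=np.nan, right=np.nan,
            )

    def average_ensembles(self):
        for ens in self._ensemble_iterator():   # placeholder
            self._edit(ens)     # existing editing step
            self._binmap(ens)   # new step: bin mapping before rotation
            self._to_enu(ens)   # existing beam-to-earth rotation
```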

gunnarvoet commented 2 years ago

Yes, that describes the workflow pretty well. I wouldn't claim that this is a very intuitive or generally efficient way to step through the processing; it just matched what I was doing with the dataset. Happy to hear suggestions for how to change this. In general, it may make sense to differentiate a bit more between ping, ensemble, and averaged/depth-gridded datasets rather than having everything inside one bloated class.
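
Purely illustratively, such a split could look something like this; none of these class names exist in the package, they just mark the three kinds of data:

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class PingData:
    """Raw single-ping, along-beam data."""
    time: np.ndarray
    beam_velocity: np.ndarray   # (ping, beam, cell)


@dataclass
class EnsembleData:
    """Edited, earth-coordinate velocities grouped into ensembles or bursts."""
    time: np.ndarray
    u: np.ndarray
    v: np.ndarray
    w: np.ndarray


@dataclass
class GriddedData:
    """Time-averaged, depth-gridded product, ready for xarray/netCDF."""
    time: np.ndarray
    depth: np.ndarray
    u: np.ndarray
    v: np.ndarray
    w: np.ndarray
```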

The burst sampling is really also just working with single pings, but on a non-regular time grid, i.e. ping a few times and then wait for a while. I have not come to a conclusion on whether this generally makes more or less sense than pinging at a regular interval, but I seem to gravitate towards it.

jessecusack commented 2 years ago

A few thoughts.

A (hopefully) simple thing to do would be to create a method called edit_ensembles that reads the raw data and performs _edit, _to_enu etc. without averaging anything. For consistency, it would still have to create an ave Bunch variable, which should then work with _ave2nc.
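
Roughly like the sketch below, assuming the same internal helpers and an ave Bunch as in the existing averaging methods. The raw-reading call, the Bunch import path and the field names are guesses on my part.

```python
from pycurrents.system import Bunch  # assumed import; the package builds on pycurrents


class ProcessADCP:
    # ... existing machinery ...

    def edit_ensembles(self):
        """Apply editing and the beam-to-earth rotation to every ping,
        without any time averaging or depth gridding."""
        raw = self._read_raw()   # placeholder for the raw-data read
        self._edit(raw)          # existing editing step
        self._to_enu(raw)        # existing rotation step

        # Populate an ave Bunch holding the full-resolution result so that
        # the existing _ave2nc machinery can write it out unchanged
        # (field names are guesses).
        self.ave = Bunch(time=raw.time, u=raw.u, v=raw.v, w=raw.w)
        self._ave2nc()
```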

In the long run it might be worth thinking about a hierarchy of processing for ADCP data (e.g. https://science.nasa.gov/earth-science/earth-science-data/data-processing-levels-for-eosdis-data-products).

Roughly following the NASA idea, L0 data would be the raw along-beam data with no edits or rotations. L1 could involve basic processing and cleaning, e.g. removing times out of the water, removing low-correlation data, removing bad beam-velocity data, bin mapping, and removing data above the surface. L2 could be data rotated into earth coordinates, with additional cleaning based on the error velocity. L3 might involve time averaging and depth gridding, calculation of 'percentage good', etc. I don't think L4 would make much sense for an individual instrument, but it might be some derived quantity like volume transport for a whole mooring array.

Right now the package jumps right from L0 to L3, which is definitely convenient because for the most part we're not interested in the intermediate output. However, I think it would make the package more versatile to explicitly go through some of those steps. I guess we'd also have to think about what makes sense in terms of efficiency, speed, and storage space.
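
As a purely illustrative sketch, stepping explicitly through the levels could look like this. None of these function names exist in the package; they just mirror the L0 to L3 description above.

```python
import xarray as xr


def read_l0(raw_file) -> xr.Dataset:
    """L0: raw along-beam data, no edits or rotations."""
    ...


def l0_to_l1(l0: xr.Dataset) -> xr.Dataset:
    """L1: basic cleaning (out-of-water times, low correlation, bad beam
    velocities), bin mapping, removal of data above the surface."""
    ...


def l1_to_l2(l1: xr.Dataset) -> xr.Dataset:
    """L2: rotation into earth coordinates, editing on the error velocity."""
    ...


def l2_to_l3(l2: xr.Dataset) -> xr.Dataset:
    """L3: time averaging, depth gridding, percent-good calculation."""
    ...


# A user could run the whole chain, or stop and save at any level:
# l3 = l2_to_l3(l1_to_l2(l0_to_l1(read_l0("raw/adcp_data.000"))))
```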

gunnarvoet commented 2 years ago

Yes, in general I agree with this. There are some subtleties: for example, when working with pings from a burst sampling scheme it is advantageous to do some of the data editing and the percent-good calculation before depth gridding, while I think the order is reversed for ensembles formed from pings that are further apart in time. I am sure there are ways to deal with this, and I agree that defining processing levels should be really helpful, both for writing cleaner code and for making it easier for the user to understand what is happening.
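
Schematically, and with placeholder names only (none of these helpers exist in the package), the two orderings would be:

```python
def edit(data): ...           # per-ping / per-cell editing (placeholder)
def percent_good(data): ...   # percent-good calculation (placeholder)
def depth_grid(data): ...     # mapping onto a regular depth grid (placeholder)


def process_burst(pings):
    """Burst scheme: edit and compute percent good on the pings, then grid."""
    edited = edit(pings)
    pg = percent_good(edited)
    return depth_grid(edited), pg


def process_sparse_ensembles(pings):
    """Pings far apart in time: grid first, then edit and compute percent good."""
    gridded = depth_grid(pings)
    return edit(gridded), percent_good(gridded)
```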

jessecusack commented 2 years ago

Hmm. Maybe after splitting a few of the processing steps off into separate methods it will become clear how to handle the various options. I'll probably start by tackling bin mapping.