smerckel / dbdreader

A reader for binary data files created by Slocum ocean gliders (AUVs)
GNU General Public License v3.0
17 stars 14 forks source link

Fix dont crash missing #16

Closed jklymak closed 1 year ago

jklymak commented 1 year ago

Currently if a file doesn't have a sensor, it gets dropped entirely. This change just returns an empty array rather than dropping the whole file.

jklymak commented 1 year ago

BTW, this is a change of behaviour, obviously - we could try and thread a kwarg through to control it if that is preferred. But I don't see the disadvantage of this versus erroring.

For context, this occurred because we had a data set where the sensor list was changed part way through and we stopped getting m_water_vx.

smerckel commented 1 year ago

Not sure how to accommodate this best. It was by design that DBD's get-method returns an error if one or more parameters are requested that are not available in the data file. If such a situation occurs, it seems natural to point out to the user that the request was made in error, rather than returning an empty array. On the other hand, I can see the point of this request. The proposed implementation would also result in the expected behaviour for MultiDBD's get method: if one parameter is missing in one or more files, but present in at least one other, then current behaviour would be that all parameters were dropped for those files with at least one missing parameter. The new implementation would return these parameters. The downside is that if a parameter is not found, because it is misspelled for example, then it may be not clear to the user. To ensure that some form of feedback is given, feedback should be given through logger.warning, which can be suppressed if needed, but provides normally feedback. I think I prefer not to use a keyword to the get method, to control the behaviour of missing parameters. Is that acceptable?

jklymak commented 1 year ago

@smerckel Thanks for looking at it.

I agree this is a bit tricky to make robust for everyone. I think there are three cases (beyond every file having every requested variable):

  1. The user has a typo and meant m_water_vx but typed m_water_ux or something and probably wants to know that.
  2. The variable is correct, but only in some of the files and not others
  3. The variable doesn't exist at all for the found files, but the user wants to not care about that and get a NaN filled array anyway.

We want 2 and 3, dbdreader currently does 1.

I think 1 and 2 are easily tested by looking at self.parameterNames. By default we could error out if a variable is requested that is not in parameterNames, but not error out if an individual file doesn't contain one of them. That would put the checking a level above ._get, and ._get would be modified as proposed here.

If you don't want to support 3, I think we could easily handle that ourselves outside of dbdreader. If you wanted to support it in dbdreader, then I think strict kwarg that defaults to True would be a reasonable way to proceed, but it would not need to be threaded through - it would just be at the top-level get_* methods.

smerckel commented 1 year ago

@jklymak

What about this:

1) an error would be raised if a parameter is asked for that does not exist at all. This could be checked against the information in the cache file, which is processed anyway. This would trigger an error when a typo is made.

2) if a parameter is requested that is a known glider parameter, but does not appear in some or all files, it will be treated as missing data. All Nans are returned if return_nans is set to True, and otherwise no data are returned if the parmeter is not present, or otherwise only those data that are present (in case of MultiDBD).

I am not sure if this covers your case 3. I suspect, it does not, because of 1) above. In any case the check for raising an error or not in 1), can be done before any data are being read, so it should not be so expensive. You could handle the case then outside dbdreader.

If this is OK for you I can have this implemented on next Monday.

jklymak commented 1 year ago

I think that works. I also think we have the variable in the cache file so that works perfectly for our use case.

I'm happy to modify the PR but also happy for you to take over

Thanks

smerckel commented 1 year ago

I took you PR already, and modified it a bit (basically making sure other parts still work too). Making the final changes should not be too problematic. I am happy to do that. Will be Monday though.

jklymak commented 1 year ago

No rush at all. Thanks!

smerckel commented 1 year ago

Modifications are implemented, which essentially does this:

I created a new branch "handle-missing-parameters" which incorporates most of the changes suggested in this PR and adds some modifications.

Please give this a try to see if it does what you envisage. If all is ok, then I can merge this into the master branch.

smerckel commented 1 year ago

Merged handle-missing-parameters in master branch ( with version 0.4.13).