Closed jklymak closed 1 year ago
BTW, this is a change of behaviour, obviously - we could try and thread a kwarg through to control it if that is preferred. But I don't see the disadvantage of this versus erroring.
For context, this occurred because we had a data set where the sensor list was changed part way through and we stopped getting m_water_vx
.
Not sure how to accommodate this best. It was by design that DBD's get-method returns an error if one or more parameters are requested that are not available in the data file. If such a situation occurs, it seems natural to point out to the user that the request was made in error, rather than returning an empty array. On the other hand, I can see the point of this request. The proposed implementation would also result in the expected behaviour for MultiDBD's get method: if one parameter is missing in one or more files, but present in at least one other, then current behaviour would be that all parameters were dropped for those files with at least one missing parameter. The new implementation would return these parameters. The downside is that if a parameter is not found, because it is misspelled for example, then it may be not clear to the user. To ensure that some form of feedback is given, feedback should be given through logger.warning, which can be suppressed if needed, but provides normally feedback. I think I prefer not to use a keyword to the get method, to control the behaviour of missing parameters. Is that acceptable?
@smerckel Thanks for looking at it.
I agree this is a bit tricky to make robust for everyone. I think there are three cases (beyond every file having every requested variable):
m_water_vx
but typed m_water_ux
or something and probably wants to know that. We want 2 and 3, dbdreader currently does 1.
I think 1 and 2 are easily tested by looking at self.parameterNames
. By default we could error out if a variable is requested that is not in parameterNames
, but not error out if an individual file doesn't contain one of them. That would put the checking a level above ._get
, and ._get
would be modified as proposed here.
If you don't want to support 3, I think we could easily handle that ourselves outside of dbdreader
. If you wanted to support it in dbdreader
, then I think strict
kwarg that defaults to True would be a reasonable way to proceed, but it would not need to be threaded through - it would just be at the top-level get_*
methods.
@jklymak
What about this:
1) an error would be raised if a parameter is asked for that does not exist at all. This could be checked against the information in the cache file, which is processed anyway. This would trigger an error when a typo is made.
2) if a parameter is requested that is a known glider parameter, but does not appear in some or all files, it will be treated as missing data. All Nans are returned if return_nans is set to True, and otherwise no data are returned if the parmeter is not present, or otherwise only those data that are present (in case of MultiDBD).
I am not sure if this covers your case 3. I suspect, it does not, because of 1) above. In any case the check for raising an error or not in 1), can be done before any data are being read, so it should not be so expensive. You could handle the case then outside dbdreader.
If this is OK for you I can have this implemented on next Monday.
I think that works. I also think we have the variable in the cache file so that works perfectly for our use case.
I'm happy to modify the PR but also happy for you to take over
Thanks
I took you PR already, and modified it a bit (basically making sure other parts still work too). Making the final changes should not be too problematic. I am happy to do that. Will be Monday though.
No rush at all. Thanks!
Modifications are implemented, which essentially does this:
I created a new branch "handle-missing-parameters" which incorporates most of the changes suggested in this PR and adds some modifications.
Please give this a try to see if it does what you envisage. If all is ok, then I can merge this into the master branch.
Merged handle-missing-parameters in master branch ( with version 0.4.13).
Currently if a file doesn't have a sensor, it gets dropped entirely. This change just returns an empty array rather than dropping the whole file.