Open mvdh7 opened 4 years ago
I might be able to help out here.
I've done something similar for SeaFlux, but it only raises error's at the moment. Could rather raise a warning at the end with a quality flag column?
I also noticed that the original version didn't pick up when there are negative pCO2 values (might seem ridiculous, but it was machine learning output gone wrong). Would be easy to add these simple checks.
Thanks! I was imagining something that would not prevent the calculations from taking place, but would just return a logical array or flag like you suggest for each equilibrium constant indicating whether the conditions used fall within its valid range, with a warning at the end.
I'd prefer for this to be built as a separate function to the main CO2SYS program, but with a similar set of inputs/a switch to turn it on from the main program. Instinctively I also don't want to add a second output to all the equilibrium constant functions, but rather to make a second set of functions that just do validity checks. But I'm not certain and I'm open to an alternative argument.
I just eliminated some negative pCO2s today that arise from having the pH - alkalinity input pair with pH so high that the non-carbonate alkalinity that you calculate from it is greater than input alkalinity, which leads to negative bicarb + 2*carb and hence negative DIC and pCO2. These input values are physically impossible so I have NaN'd the subsequent results out if they are given. If you find other cases that give negative pCO2s then please do let me know. I'd rather figure out why each one happens individually and cut it off in the appropriate place rather than just blindly remove them.
Aside - did you see the commits I tagged you in at the end of last week?
Aside - did you see the commits I tagged you in at the end of last week?
Nope, not sure I've got the notifications turned on. Went looking, couldn't find them. In what repo?
I'd prefer for this to be built as a separate function to the main CO2SYS program, but with a similar set of inputs/a switch to turn it on from the main program. Instinctively I also don't want to add a second output to all the equilibrium constant functions, but rather to make a second set of functions that just do validity checks. But I'm not certain and I'm open to an alternative argument.
In a separate module (still a part of PyCO2SYS) makes most sense I would say. Wouldn't be difficult to achieve. Might be able to add a flag
and comment
column, where comment
gives information about the processing.
Aside - did you see the commits I tagged you in at the end of last week?
Nope, not sure I've got the notifications turned on. Went looking, couldn't find them. In what repo?
https://github.com/mvdh7/PyCO2SYS/commit/532fb0e1124afd884e75d21ecd82f2e991318b7e https://github.com/mvdh7/PyCO2SYS/commit/b6be9ba3d3793d26d56ffb0c14c32844d3cda364
I'd prefer for this to be built as a separate function to the main CO2SYS program, but with a similar set of inputs/a switch to turn it on from the main program. Instinctively I also don't want to add a second output to all the equilibrium constant functions, but rather to make a second set of functions that just do validity checks. But I'm not certain and I'm open to an alternative argument.
In a separate module (still a part of PyCO2SYS) makes most sense I would say. Wouldn't be difficult to achieve. Might be able to add a
flag
andcomment
column, wherecomment
gives information about the processing.
Sounds sensible - very happy for you to do this!
Thanks! I was imagining something that would not prevent the calculations from taking place, but would just return a logical array or flag like you suggest for each equilibrium constant indicating whether the conditions used fall within its valid range, with a warning at the end.
What about using numpy.ma.masked_array
for the final output. A lot neater than using bool masks. Users would still be able to get the output with arr.data
, but the default would be masked.
What about using
numpy.ma.masked_array
. A lot neater than using bool masks.
Nice idea, not very familiar with it myself, but that's no reason not to use it. Could it track which constants were (in)valid, or does it just return a single 1D mask for each output variable?
Hey, I've had a bit of time recently to think about this again. Using decorators, or a similar approach might be quite nice. Not sure if you've used these, but there would be two options with a decorator:
My suggestion
func = kH2CO3_SWS_HM_DM87 # dissociation constant funciton
# create a unit_checker function that will check the data but not run the function
# names of kwargs have to match the function inputs (much more informative for metadata output)
output = unit_checker(func, TempK=(-2, 50), Sal=[5, 50])(temperature_data, salinity_data)
# returns bool mask array and a string array that tells the user which input is outside the limit
mask_for_TempK_and_Sal, string_info_about_temp_or_sal_limits = output
# mask is True where outside limits - can easily be changed
>>> output
(array([False, True, True, True], dtype=object),
array([
'',
'kH2CO3_SWS_HM_DM87: Sal > 2; ',
'kH2CO3_SWS_HM_DM87: Sal < -2; ',
'kH2CO3_SWS_HM_DM87: Sal > 2; '
], dtype=object))
This could be taken further, that you don't have to supply TempK and Sal explicitly, as this can be stored in a dictionary associated with the name of the function.
I've written most of the code to do this, so let me know what you think about this way of doing it. I will implement this in SeaFlux in the next while so you can what I mean.
So, after some tinkering I've implemented the method for SeaFlux. It's a bit buggy at the moment, but I think it's a really useful tool for most scientific software.
Here's a link to a notebook showing how I've implemented it. I think a bit more testing will have to be done before implementing this in PyCO2SYS. https://github.com/luke-gregor/SeaFlux/blob/develop/notebooks/limit_check_testing.ipynb
A much simpler version (as shown in my previous comment) would be really easy to implement.
This looks like a really neat way to do it, great idea. Will try to have a more detailed look when I get a chance, let me know if there's anything specific I can do to help.
Return logical arrays indicating whether calculated results fall in valid input ranges (e.g. of temperature, salinity and pressure) given the sets of constants used.
One way to do this:
CO2SYS
outputAnother way would be to build a separate function that just checks validity, rather than adding extra outputs to the equilibrium constant functions and so on. This is less robust but less breaking?