Open huddlej opened 1 year ago
Me too! `NaN` is not valid JSON (technically), so we should avoid it for that reason too. It looks like if we use `json.dumps(..., allow_nan=False)`, then this example would throw a `ValueError` (which we can catch...).
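A minimal sketch of the strict-serialization idea from the comment above, using only the standard library `json` module, where `allow_nan=False` makes `json.dumps` raise a `ValueError` for `NaN` and infinite floats. The record contents and the `dump_strict` helper are hypothetical and are not part of Augur.

```python
import json
import math

# Hypothetical measurement record whose missing value is a NaN float.
record = {"strain": "sample_3", "value": math.nan}

# Default behavior: NaN is written as the bare token NaN, which is not valid JSON.
lenient = json.dumps(record)


def dump_strict(data):
    """Serialize data to JSON, refusing NaN and infinite floats."""
    try:
        return json.dumps(data, allow_nan=False)
    except ValueError as error:
        # Re-raise with a message that points at the data problem.
        raise ValueError(
            f"Measurements contain a value that is not valid JSON: {error}"
        ) from error
```

Calling `dump_strict(record)` raises instead of writing output that Auspice cannot parse, so the caller can catch the error and report the missing value.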
Ah, maybe we should standardize how we handle JSONs with the new augur.io.json module added in the `augur curate` work.
Current Behavior
It is possible to create a measurements JSON where one or more data values are `NaN`, causing Auspice to fail to load the measurements JSON with the following errors:
However, the corresponding measurements JSON validates with Augur without any errors:
Augur's validation passes because the Python JSON parser accepts and parses the `NaN` values.
Expected behavior
Ideally, `augur validate measurements` would throw an error when it encounters syntax that is not supported by the JS JSON parser in Auspice.
How to reproduce
Create a measurements table like the following (from `tests/functional/measurements_export/collection.tsv`) where the third row has a missing value in the `value` column:
Run a minimal measurements export as with this functional test, using the TSV above:
This command should produce an error when it encounters the missing value that gets rendered as `NaN` in the exported JSON.
Possible solution
There are a couple of possible solutions to consider:
1. Prevent the export from producing `NaN`. Since we use the pandas DataFrame.to_dict method when we export the measurements themselves, we could fill missing values with an empty string or something more appropriate prior to calling that method.
2. Make validation reject `NaN` values. This logic could maybe be implemented at the JSON schema level or at the `load_json` step of the validation by calling `json.load` with more restrictive settings.
As a user, I would prefer to have validation throw an error when it finds `NaN` values instead of having measurements quietly replace the `NaN`s with empty strings, since this validation step would let me catch a data issue earlier.
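One way the "more restrictive settings" could look is sketched below, using the standard library's `parse_constant` hook, which the `json` module calls for the tokens `NaN`, `Infinity`, and `-Infinity`. The names `reject_constant` and `loads_strict` are hypothetical, and this is not Augur's actual `load_json` implementation.

```python
import json


def reject_constant(name):
    """Hook invoked by the json module for NaN, Infinity, and -Infinity."""
    raise ValueError(f"Invalid JSON constant in measurements: {name}")


def loads_strict(text):
    """Parse JSON text, failing on constants that JavaScript's JSON.parse rejects."""
    return json.loads(text, parse_constant=reject_constant)


# Hypothetical fragment of an exported measurements JSON.
exported = '{"value": NaN}'

# The default parser accepts the document and returns a float NaN.
lenient = json.loads(exported)
```

With this hook, `loads_strict(exported)` raises a `ValueError`, while documents without these constants parse as usual. The same `parse_constant` argument is accepted by `json.load` for file objects.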