jllodra / ncdump-json

Modified ncdump to output data in json format
85 stars 28 forks source link

Avoid using '#' in float format specifier for float attributes for valid JSON #21

Closed dougvj closed 2 years ago

dougvj commented 2 years ago

Recently we encountered a data set that produced output values on variable attribute floats with trailing decimals, for example:

"resolution":1000000.,

This not valid JSON and broke parsing.

I was able to recreate the issue with the test data set in the repo, under tests:

../ncdump-json -j -h -p 1 ./sresa1b_ncar_ccsm3-example.nc

Produces this in the output:

"missing_value":1.e+20

~This PR fixes the problem by detecting the case where a . is followed by a non digit and adds a single 0.~

Now fixed by using the standard variable format specifier which does not have '#'

jllodra commented 2 years ago

Thank you for your contribution, nice executed.

Indeed, that is not valid in JSON (just for the sake of curiosity, let me say that it would be valid as a JavaScript object).

Let me ask you something. I see that you are adding a trailing 0, for example 1.e+20 => 1.0e+20, if I have undestood correctly.

  1. => 10000.0

Wouldn't be an option to remove the . when it is not followed by a number?

jllodra commented 2 years ago

After reading your code comment about the # flag being the culprit. (https://en.wikipedia.org/wiki/Printf_format_string)

In https://github.com/jllodra/ncdump-json/blob/master/src/dumplib.c#L109 you could define also the variables "json_float_att_fmt", "json_float_attx_fmt", and "json_double_att_fmt", with the same format specification but removing the #.

Then, in https://github.com/jllodra/ncdump-json/blob/master/src/ncdump.c#L447, you could write if (is_json) then do the snprintf using the json-specific format, else do what it was doing before.

What do you think about this approach?

dougvj commented 2 years ago

I thought it might be desirable to maintain the float type rather than produce a JSON int type in the case that scientific notation is not produced. However, for our needs this doesn't matter and I would be happy to switch this to dropping the decimal instead if you prefer. 

dougvj commented 2 years ago

Yes this approach would be fine too. 

jllodra commented 2 years ago

If you are OK with it (and also if it solves your problem), I think the best approach would be the latter (dropping the # for json). For sake of code simplicity, and would avoid manipulation of text

dougvj commented 2 years ago

OK thanks for the feedback. I pushed this change and edited the title to reflect.

jllodra commented 2 years ago

Excellent, thank you for your contribution

https://github.com/jllodra/ncdump-json/releases/tag/r17