Datasets try to encode NaN values, which is not supported by JSON

JobLeonard commented 8 years ago

So I was trying to test both datasets today, but the second one gave me:

SyntaxError: Unexpected token N in JSON at position 2880736

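After downloading and investigating both JSON files, it turns out the broken file contains NaN fields, which are not supported by the JSON format (see http://stackoverflow.com/questions/18071379/syntaxerror-unexpected-token-n-in-chrome-console-from-angularjs). I find this bug a bit strange to begin with, since we're using a built-in method to turn the data into JSON (https://github.com/linnarsson-lab/Loom/blob/7bc71cc5ba9213a88f995698d9a40a0bcbae6fb0/python/loom_server.py#L92-L111). You'd think json.dumps(fileinfo) would throw an error, or at least give a warning, when asked to convert invalid data to JSON.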
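
For what it's worth, this looks like documented default behaviour of Python's json module rather than a bug in our code: json.dumps has allow_nan=True unless told otherwise, and then writes NaN as a bare token. A minimal sketch, where the fileinfo dict is a made-up stand-in and not the real server data:

```python
import json

# Hypothetical stand-in for the server's fileinfo structure.
fileinfo = {"attr": [float("nan"), 1.0]}

# Default (allow_nan=True): NaN is written as a bare token,
# which JSON.parse in the browser rejects.
lenient = json.dumps(fileinfo)

# With allow_nan=False the encoder raises ValueError instead.
try:
    json.dumps(fileinfo, allow_nan=False)
    strict_error = None
except ValueError as err:
    strict_error = err
```

So passing allow_nan=False would at least make the server fail loudly instead of sending output the browser cannot parse.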

Anyway, I don't know in what way the data uses NaN, but assuming that isn't a bug to begin with, we need to figure out some workaround. The solution depends on whether it's important to distinguish NaN from other numerical values.

- If distinguishing NaN from 0 does not matter, convert everything to 0. This has my preference: lists with multiple types require boxed values, which slows things down, so keeping everything as a numerical value has speed and memory advantages.
- If distinguishing NaN from numerical values is important, replace it with either:
  - null (can be tested for with ===)
  - a string that reads "NaN" (not recommended)

Also, there are multiple places where we could fix this, on either the server or the client side. On the server we could replace the NaN values before encoding (or perhaps even in the dataset itself, if NaN does not mean anything).
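
A rough sketch of the server-side option, assuming the data reaches json.dumps as plain Python dicts, lists and floats (NumPy arrays would need converting first). The name replace_nan is hypothetical, not something that exists in the code base:

```python
import json
import math

def replace_nan(value, replacement=0):
    """Return a copy of value with every float NaN swapped for replacement."""
    if isinstance(value, float) and math.isnan(value):
        return replacement
    if isinstance(value, dict):
        return {key: replace_nan(item, replacement) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_nan(item, replacement) for item in value]
    return value

# Made-up example data; the string "NaN" is left untouched.
fileinfo = {"attr": [1.5, float("nan")], "name": "NaN"}
as_zero = json.dumps(replace_nan(fileinfo), allow_nan=False)
as_null = json.dumps(replace_nan(fileinfo, None), allow_nan=False)
```

Passing None as the replacement gives null in the JSON output, which covers the case where NaN has to stay distinguishable from 0.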

Otherwise we can simply apply a regex to the JSON string before sending it, or after receiving it. For example, on the client-side we could do something like:

let result = JSON.parse(dataSetString.replace(/\bNaN\b/g, "0"));
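
The same idea applied on the server before sending could look roughly like this in Python (patch_nan_tokens is a hypothetical helper name). One caveat the sketch makes visible: the regex cannot tell a bare NaN token from the text NaN inside a string value, so both get replaced.

```python
import json
import re

NAN_TOKEN = re.compile(r"\bNaN\b")

def patch_nan_tokens(json_text, replacement="0"):
    """Replace every standalone NaN in the serialized text."""
    return NAN_TOKEN.sub(replacement, json_text)

# Made-up example: one real NaN value and one string that reads "NaN".
raw = json.dumps({"values": [float("nan"), 2.0], "label": "NaN"})
patched = patch_nan_tokens(raw)
parsed = json.loads(patched)
# parsed["label"] is now "0" as well, because the string content matched too.
```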

How shall we handle this?

slinnarsson commented 8 years ago

I think we should ban NaNs and let it be up to the creator of the loom file to decide if they should be zero, -1 or something else.

I'll add a check for nan in loompy and throw an exception to force them out.

Where exactly was the NaN? Which file, which attribute?

Sten

JobLeonard commented 8 years ago

Obviously, I'm perfectly fine with not having to worry about NaN altogether ;).

The error was in the "Oligodendrocytes_Science_2016" data set, in colAtts, sub-field (Oligodendrocytes_Science_2016)_TranscriptID. The whole array consists of NaNs.

slinnarsson commented 8 years ago

Revised loompy to reject numerical attributes that contain NaNs.
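
For illustration only, and not the actual loompy code: a check of this kind could look roughly like the sketch below, assuming the attribute arrives as a plain Python list of numbers. The name check_no_nan is hypothetical.

```python
import math

def check_no_nan(name, values):
    """Raise ValueError if a numerical attribute contains NaN."""
    for index, value in enumerate(values):
        if isinstance(value, float) and math.isnan(value):
            raise ValueError(f"attribute {name!r} contains NaN at index {index}")
    return values
```

It would be called before an attribute is written to the file, so the creator of the loom file has to decide what NaN should become.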

linnarsson-lab / loom-viewer

Datasets try to encode NaN values, which is not supported by JSON #39