Closed — domoritz closed this issue 8 years ago
`dl.type.infer` assumes an array as input. If you give it a string, it will test each character separately, hence the behavior you're seeing. If given an array, the inference works as expected:

```js
> dl.type.infer(['100', '3.14', '1e5'])
"number"
```
However, this also suggests that non-array inputs should either be auto-boxed into arrays or result in an error. I'm sure others will run into this problem!
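A minimal sketch of the auto-boxing idea, using a stand-in `infer` function so the snippet is self-contained (the wrapper name and the stand-in are hypothetical, not part of datalib):

```js
// Stand-in for dl.type.infer: reports "number" only if every element
// of the array coerces cleanly to a number via the + cast.
function infer(values) {
  return values.every(v => v !== '' && !isNaN(+v)) ? 'number' : 'string';
}

// Hypothetical wrapper: auto-box non-array input so a lone string is
// treated as one value rather than a list of its characters.
function inferBoxed(data) {
  return infer(Array.isArray(data) ? data : [data]);
}

console.log(inferBoxed('100'));           // same result as infer(['100'])
console.log(inferBoxed(['100', '3.14']));
```

The alternative suggested above — throwing on non-array input — would be a one-line `Array.isArray` check at the top of `infer` instead.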
As for commas, the JS number cast does not support them (`+'1,000'` → `NaN`). We currently use that cast as part of our number checking and parsing routines. Note that `parseFloat` is not much better (`Number.parseFloat('1,213')` → `1`), so it does not seem worthwhile as a replacement.
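Both behaviors can be checked in plain JavaScript, no datalib involved; stripping commas before the cast is the obvious (if naive) workaround, shown here only as an illustration, not as what datalib does:

```js
// The unary + cast rejects thousands separators outright...
console.log(+'1,000');                   // NaN
// ...while parseFloat stops at the first comma and silently truncates.
console.log(Number.parseFloat('1,213')); // 1

// Naive workaround (an assumption, not datalib's behavior): strip commas
// before casting. Note this happily accepts malformed input like '1,0,0'.
const parseLoose = s => +s.replace(/,/g, '');
console.log(parseLoose('1,000'));        // 1000
```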
I noticed that I made a mistake in the CSV parse example. `dl.csv({url: 'https://dl.dropboxusercontent.com/u/12770094/simple.csv'}, {parse: 'auto'})` does in fact do the right thing and parses numbers. I accidentally left a `,` in the data.
This sounds like a great compromise: parse numbers with a decimal point such as `100.32` correctly, but fail to parse numbers like `100,000.42` and require the tool that uses datalib to remove the commas first. For Polestar/Voyager this means that we ask users to clean up their data first for now.
Datalib can parse some values as numbers, but infer does something different with them. So datalib can parse something as a number although infer does not infer it to be a number (this is understandable).
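As a generic illustration of how this kind of mismatch arises (plain JavaScript with a made-up inference rule, not datalib's actual regexes): a permissive cast and a stricter inference test can legitimately disagree on the same input:

```js
// Permissive parse: anything the + cast accepts counts as a number.
const parsesAsNumber = s => s.trim() !== '' && !Number.isNaN(+s);

// Stricter inference: only plain decimal notation counts. This rule is
// hypothetical, for illustration only; it is not what datalib uses.
const infersAsNumber = s => /^-?\d+(\.\d+)?$/.test(s);

// Hex literals and padded strings parse fine but fail the strict check.
console.log(parsesAsNumber('0x1F'), infersAsNumber('0x1F')); // true false
console.log(parsesAsNumber(' 12 '), infersAsNumber(' 12 ')); // true false
```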
I'm aware that it is a bit problematic to parse both `.` and `,`, since `100,000` could be either 100000 or 100, but maybe `inferAll` can make a smarter prediction. Messytables makes those predictions very aggressively and I'm not sure that we want to go that far. However, we should at least be able to parse CSV with `.` correctly.
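One way such a smarter prediction could look — a sketch assuming English-style digit grouping, not how messytables or datalib actually implement it: only accept a comma as a thousands separator when it sits in a valid group position, and refuse anything ambiguous:

```js
// Accept commas only as well-formed thousands separators: a first group
// of 1-3 digits, then groups of exactly 3, with an optional decimal part.
const GROUPED = /^-?\d{1,3}(,\d{3})+(\.\d+)?$/;

function parseGrouped(s) {
  if (GROUPED.test(s)) return +s.replace(/,/g, '');
  return NaN; // ambiguous or malformed, e.g. '100,00' or '1,0,0'
}

console.log(parseGrouped('100,000'));    // 100000
console.log(parseGrouped('100,000.42')); // 100000.42
console.log(parseGrouped('100,00'));     // NaN
```

A real implementation would also have to decide between English (`100,000.42`) and European (`100.000,42`) conventions, which is exactly the ambiguity raised above.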
Datalib parses these values as strings unless you force it.
I'm mostly asking for advice on whether datalib should handle this case at all, or whether I have to change my code to replace `,` or `.` by hand, or even write a smart type prediction.