vega / datalib

JavaScript data utility library.
http://vega.github.io/datalib/
BSD 3-Clause "New" or "Revised" License

Datalib is inferring my string as a number #95

Open ferndot opened 6 years ago

ferndot commented 6 years ago

Given the following TSV file, datalib is inferring the name column to be a number.

Example TSV:

owner_slug  slug    aggregate   name    square_footage
company_demo    1bf8caed-89d0-4547-b1f9-feac7d72e91b    TRUE    Restaurant 11057    3000

Datalib call:

datalib.tsv(
  {
    url: 'example.tsv'
  },
  function (error, data) {
    if (error) {
      console.log(error)
    } else {
      console.log(data)
    }
  }
)

jheer commented 6 years ago

Thanks for the bug report. When I try to reproduce this, I find that the type inference methods infer the name column to be a date, and the value is then converted to a numeric timestamp.

Strangely enough, the browser's built-in Date.parse method (at least on Chrome and in Node.js) successfully parses the example string value to a date:

new Date(Date.parse('Restaurant 11057'))
// Thu Jan 01 11057 00:00:00 GMT-0800 (PST)

Fixing this will likely require significant changes to how Date inference is performed (as we currently leverage the results from Date.parse). In the meantime, I recommend explicitly providing the desired column types to datalib rather than relying on type inference.
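For reference, the workaround recommended above might look like the following sketch. It assumes that datalib's loaders accept an optional format argument between the options and the callback, and that its `parse` property maps column names to type names such as 'string', 'boolean', and 'integer'; check the datalib API documentation before relying on these details. The names `columnTypes` and `loadExample` are hypothetical and exist only for illustration.

```javascript
// Explicit column types for the example TSV, so that no type inference runs.
// Assumption: these type names match the ones datalib recognizes.
const columnTypes = {
  owner_slug: 'string',
  slug: 'string',
  aggregate: 'boolean',
  name: 'string', // keeps 'Restaurant 11057' as text instead of a date
  square_footage: 'integer'
};

// Hypothetical helper: loads example.tsv with the explicit types above.
// datalib is required lazily, so the type map can be reused without it.
function loadExample(callback) {
  const datalib = require('datalib'); // assumes datalib is installed
  datalib.tsv({ url: 'example.tsv' }, { parse: columnTypes }, callback);
}
```

Calling `loadExample(function (error, data) { ... })` would then take the place of the original `datalib.tsv` call. Columns left out of the `parse` map are probably not converted at all, so it seems safest to list every column.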

ferndot commented 6 years ago

@jheer: we could easily fix this by using Moment.js. Here is a very simple example: http://jsfiddle.net/zcvxsbo2/2/. We could also see whether a smaller, more modern library such as date-fns or d3-time-format (which is already included) would work.

This would also make the date parser more robust, consistent, and able to support more formats.

I can provide a patch if you'd like 😄
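To illustrate the general idea of stricter date inference without any extra library, here is a minimal sketch. It is not the Moment.js approach from the fiddle, and it is not datalib's actual inference code: it simply guards `Date.parse` with an ISO-8601-like pattern, so free text such as 'Restaurant 11057' is never treated as a date. The names `ISO_DATE`, `isStrictDate`, and `inferColumnType` are hypothetical, and the pattern is an assumption about which formats should count as dates.

```javascript
// Accept only ISO-8601-like strings: YYYY-MM-DD, optionally followed by a time.
const ISO_DATE =
  /^\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?$/;

// A value is a date only if it matches the pattern AND Date.parse accepts it.
function isStrictDate(value) {
  if (typeof value !== 'string') return false;
  const text = value.trim();
  return ISO_DATE.test(text) && !Number.isNaN(Date.parse(text));
}

// Infer one type for a column of raw string values; empty values are ignored.
function inferColumnType(values) {
  const present = values
    .filter(v => v != null && v !== '')
    .map(v => String(v).trim());
  if (present.length === 0) return 'string';
  if (present.every(v => v !== '' && Number.isFinite(Number(v)))) return 'number';
  if (present.every(isStrictDate)) return 'date';
  return 'string';
}

console.log(inferColumnType(['Restaurant 11057'])); // → 'string'
console.log(inferColumnType(['3000']));             // → 'number'
console.log(inferColumnType(['2016-01-15']));       // → 'date'
```

The trade-off is that a fixed pattern rejects other legitimate date formats, which is where a format-aware parser such as d3-time-format could help.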

jonathanzong commented 3 years ago

Hi! I just wanted to see if there had been any changes to Date inference since this discussion. In Lyra, we've been loading datasets from vega-datasets through datalib and noticing a few incorrect type inferences involving dates. If there are currently no plans to revisit this issue, I can potentially look into it at some point, but I want to make sure I'm not duplicating the work of someone more familiar with this library first.