timwis / csv-schema

Analyzes a CSV file and generates database table schema, all within the browser
https://csv-schema.surge.sh
315 stars 30 forks

Similar Frictionless Data stuff #4

Open rufuspollock opened 8 years ago

rufuspollock commented 8 years ago

This is great, and I'm wondering if you have seen the Frictionless Data tooling for this. I think we could join forces.

To start with there is JSON Table Schema:

http://dataprotocols.org/json-table-schema/

Then there is associated tooling e.g. this JS lib infers a JSON Table Schema from CSV:

https://github.com/okfn/json-table-schema-infer

Here's a user interface that does schema inference along with data package generation in the browser:

http://datapackagist.okfnlabs.org/ https://github.com/frictionlessdata/datapackagist

Then there are tools that generate the relevant SQL etc. from JSON Table Schema, e.g.

https://github.com/frictionlessdata/jsontableschema-sql-py

/cc @pwalsh

timwis commented 8 years ago

Thanks @rgrp! @harrisj also suggested supporting JSON Table Schema in #2. I'm familiar with the schema (and encouraged the team that built ckanext-dictionary to use it), but I hadn't seen Frictionless Data or all of those tools. Sounds like a neat project, and I'd love to join forces.

It looks like json-table-schema-infer is along the lines of this project's detectType and determineWinner functions in util.js. I can definitely see the value in that being a standalone library that can be plugged into a web interface (and also perhaps used as a CLI). That could then pipe to another module that converts JSON Table Schema to various types of SQL, as you've done in jsontableschema-sql-py (but in JavaScript to support the browser).
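To make the split described above concrete, here is a minimal sketch of what a standalone inference step could look like. It is not the code from util.js or from json-table-schema-infer: the function bodies, the regular expressions, the majority-vote rule, and the inferSchema helper are all assumptions for illustration. Only the names detectType and determineWinner come from the comment above.

```javascript
// Hypothetical sketch: the names detectType and determineWinner are borrowed
// from the discussion, but the logic below is assumed, not the project's code.

// Guess a JSON Table Schema type name for a single CSV cell.
function detectType(value) {
  const text = String(value).trim();
  if (text === "") return null; // empty cells carry no type information
  if (/^-?\d+$/.test(text)) return "integer";
  if (/^-?\d*\.\d+$/.test(text)) return "number";
  if (/^(true|false)$/i.test(text)) return "boolean";
  if (/^\d{4}-\d{2}-\d{2}$/.test(text)) return "date";
  return "string";
}

// Pick one type for a column: the most frequent non-null guess wins
// (assumed rule). A column with no usable values falls back to "string".
function determineWinner(types) {
  const counts = new Map();
  for (const type of types) {
    if (type !== null) counts.set(type, (counts.get(type) || 0) + 1);
  }
  let winner = "string";
  let best = 0;
  for (const [type, count] of counts) {
    if (count > best) {
      winner = type;
      best = count;
    }
  }
  return winner;
}

// Build a JSON Table Schema-style descriptor from header names and parsed rows.
// inferSchema is a hypothetical helper name.
function inferSchema(headers, rows) {
  return {
    fields: headers.map((name, index) => ({
      name,
      type: determineWinner(rows.map((row) => detectType(row[index] ?? ""))),
    })),
  };
}

const exampleSchema = inferSchema(
  ["id", "price", "name"],
  [
    ["1", "2.50", "a"],
    ["2", "3.00", "b"],
    ["3", "", "c"],
  ]
);
console.log(JSON.stringify(exampleSchema));
```

Because the functions only take and return plain values, the same module could sit behind a web form or a CLI, and its output could be handed to a separate schema-to-SQL converter.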

I'd have to confirm that JSON Table Schema supports things like whether a field is nullable, and custom types like ST_Geometry, but at first glance it sounds like a great idea.

rufuspollock commented 8 years ago

@timwis nullable is supported. I am not sure about ST_Geometry, but if it isn't supported we can think about adding it. The current list of supported types is here:

http://dataprotocols.org/json-table-schema/#field-types-and-formats

Note that JTS is extensible in that you could add your own custom JSON properties, especially for SQL-specific types.
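As a rough illustration of that extensibility, a field descriptor could carry an extra property alongside the standard name and type. The property name sqlType, the default type mapping, and the sqlTypeFor helper below are assumptions made up for this sketch; they are not part of the JSON Table Schema spec or of any existing library.

```javascript
// Hypothetical sketch: "sqlType" is an invented custom property, not a
// standard JSON Table Schema key.
const geometryField = {
  name: "geom",
  type: "any",
  sqlType: "ST_Geometry", // custom, SQL-specific hint
};

// Assumed default mapping from JSON Table Schema types to generic SQL types.
const DEFAULT_SQL_TYPES = new Map([
  ["string", "TEXT"],
  ["integer", "INTEGER"],
  ["number", "NUMERIC"],
  ["boolean", "BOOLEAN"],
  ["date", "DATE"],
]);

// Prefer the custom property when present, otherwise fall back to the
// default mapping, and finally to TEXT for unknown types.
function sqlTypeFor(field) {
  if (typeof field.sqlType === "string" && field.sqlType !== "") {
    return field.sqlType;
  }
  return DEFAULT_SQL_TYPES.get(field.type) || "TEXT";
}

console.log(sqlTypeFor(geometryField));
console.log(sqlTypeFor({ name: "id", type: "integer" }));
```

Tools that do not know about the custom property can ignore it and still read the descriptor as a regular field.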

timwis commented 8 years ago

@rgrp Yeah I was thinking the interface would allow you to select "Other" as a field type and key in your own custom value. I'll look into how that would be stored in JTS.

Regarding null, maybe I'm misreading but it seemed like with JTS you'd have to say that the "field type" is "null" rather than "it's a string that can be null" like you would in SQL?

timwis commented 8 years ago

Oh, I see: "nullable" would be expressed through the "required" constraint

rufuspollock commented 8 years ago

@timwis that's exactly right.
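A minimal sketch of how that mapping could look when generating SQL, assuming a field descriptor with an optional constraints.required flag. The helper names isNullable and columnDefinition are hypothetical, and identifier quoting is left out for brevity.

```javascript
// Hypothetical helpers: a field is treated as nullable unless its
// descriptor sets constraints.required to true.
function isNullable(field) {
  return !(field.constraints && field.constraints.required === true);
}

// Render one column of a CREATE TABLE statement. The SQL type is passed in
// by the caller; no identifier quoting or escaping is attempted here.
function columnDefinition(field, sqlType) {
  const nullClause = isNullable(field) ? "" : " NOT NULL";
  return `${field.name} ${sqlType}${nullClause}`;
}

const requiredField = {
  name: "id",
  type: "integer",
  constraints: { required: true },
};
const optionalField = { name: "note", type: "string" };

console.log(columnDefinition(requiredField, "INTEGER"));
console.log(columnDefinition(optionalField, "TEXT"));
```

This keeps the field type as "string" or "integer" and carries nullability separately, which matches the SQL way of thinking about it.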