jbenet / transformer

transformer - multiformat data conversion
transform.datadex.io

Why not "as a service"? #1

blambeau closed this issue 10 years ago

blambeau commented 10 years ago

@jbenet I was thinking about all this data stuff today. I think that it might be great, in addition to the pandat conversion system itself, to build an online platform that would allow people to

1) register transformation scripts (using the pandat system itself);
2) register third-party web services (not necessarily using pandat);
3) invoke specific conversions.

A very simple use case: say you want to convert a .csv file to .json (following the dataprotocols conventions, for example). You could do something like:

curl -X POST -H "Accept: application/json" -H "Content-Type: text/csv" --data-binary @mydatafile.csv http://pandat.org/converter
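To make the content-negotiation idea concrete: the request above names its input format in `Content-Type` and the desired output in `Accept`, so a converter endpoint can simply dispatch on that pair. Below is a minimal sketch in Python using only the standard library's WSGI support. The `CONVERTERS` table, `csv_to_json`, and the `post` helper are hypothetical names, not part of pandat or any real service, and the JSON emitted is a plain array of row objects rather than a dataprotocols layout.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Tuple
from wsgiref.util import setup_testing_defaults


def csv_to_json(body: bytes) -> bytes:
    """Hypothetical converter: CSV with a header row -> JSON array of row objects."""
    rows = list(csv.DictReader(io.StringIO(body.decode("utf-8"))))
    return json.dumps(rows).encode("utf-8")


# Hypothetical dispatch table keyed by (Content-Type, Accept).
CONVERTERS: Dict[Tuple[str, str], Callable[[bytes], bytes]] = {
    ("text/csv", "application/json"): csv_to_json,
}


def converter_app(environ, start_response) -> Iterable[bytes]:
    """WSGI app: pick a converter from the request's Content-Type and Accept."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    # Drop parameters such as "; charset=utf-8". Only the first Accept entry
    # is honoured and q-values are ignored (a simplification of this sketch).
    src = environ.get("CONTENT_TYPE", "").split(";")[0].strip()
    dst = environ.get("HTTP_ACCEPT", "").split(",")[0].split(";")[0].strip()
    fn = CONVERTERS.get((src, dst))
    if fn is None:
        # 406 if the input type is known but the output is not; otherwise 415.
        known_src = any(s == src for s, _ in CONVERTERS)
        status = "406 Not Acceptable" if known_src else "415 Unsupported Media Type"
        start_response(status, [("Content-Type", "text/plain")])
        return [f"no converter for {src} -> {dst}\n".encode("utf-8")]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    result = fn(environ["wsgi.input"].read(length))
    start_response("200 OK", [("Content-Type", dst)])
    return [result]


def post(body: bytes, content_type: str, accept: str) -> Tuple[str, bytes]:
    """Call the app in-process, roughly as `curl -X POST` would over HTTP."""
    environ: dict = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": content_type,
        "HTTP_ACCEPT": accept,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    statuses: list = []
    chunks = converter_app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks)
```

Returning 415 for an unknown input type and 406 for a known input with an unsupported output follows the standard HTTP meanings of those two status codes. To try it with curl, serve it locally with `wsgiref.simple_server.make_server("", 8000, converter_app).serve_forever()` and post to `http://localhost:8000/`, using `--data-binary` so curl keeps the CSV line breaks.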

I'm not sure I understand your aim correctly. Also, I'm not sure how media-types + file formats + dataprotocols + possibly Finitio + possibly JSON-LD would all fit together. But anyway, I would be very interested in such an online conversion hub.

Btw. my brother asked me whether it would be possible to parse a PostgreSQL DDL file and convert it to a Finitio schema:

curl -X POST -H "Accept: application/finitio" -H "Content-Type: application/sql" --data-binary @myschemadump.sql http://pandat.org/converter

EDIT: completed curl invocations

jbenet commented 10 years ago

@blambeau good ideas.

1) register transformation scripts (using the pandat system itself)

Yeah, pandat will have a way to publish Types (jsonld schemas), Codecs (encode/decode functions), and Conversions (Type -> Type conversion functions, plus the inverse conversion where possible).
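One way to read the Types / Codecs / Conversions split is as a graph: Types are nodes, Conversions are directed edges, and registering an inverse adds the reverse edge, so chained conversions fall out of a shortest-path search. The Python sketch below assumes that model; `ConversionRegistry`, the string type names, and the breadth-first chaining are illustrative choices, not pandat's published API.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Optional

Convert = Callable[[Any], Any]


class ConversionRegistry:
    """Hypothetical registry of Type -> Type conversions (not pandat's API)."""

    def __init__(self) -> None:
        # edges[src][dst] is a function turning a src-typed value into dst.
        self.edges: Dict[str, Dict[str, Convert]] = {}

    def register(self, src: str, dst: str, fn: Convert,
                 inverse: Optional[Convert] = None) -> None:
        self.edges.setdefault(src, {})[dst] = fn
        if inverse is not None:
            # Registering the inversion makes dst -> src reachable as well.
            self.edges.setdefault(dst, {})[src] = inverse

    def path(self, src: str, dst: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest chain of conversions."""
        parents: Dict[str, Optional[str]] = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                chain: List[str] = []
                cur: Optional[str] = node
                while cur is not None:
                    chain.append(cur)
                    cur = parents[cur]
                return chain[::-1]
            for nxt in self.edges.get(node, {}):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None

    def convert(self, value: Any, src: str, dst: str) -> Any:
        """Apply each conversion along the shortest chain from src to dst."""
        chain = self.path(src, dst)
        if chain is None:
            raise LookupError(f"no conversion from {src} to {dst}")
        for a, b in zip(chain, chain[1:]):
            value = self.edges[a][b](value)
        return value


reg = ConversionRegistry()
# int <-> decimal string and int <-> hex string, each registered with its inverse.
reg.register("int", "decimal-str", str, inverse=int)
reg.register("int", "hex-str", hex, inverse=lambda h: int(h, 16))
```

Here `"decimal-str"` to `"hex-str"` has no direct conversion, but the registered inverse of `int -> decimal-str` lets the registry route through `int`, so `reg.convert("255", "decimal-str", "hex-str")` yields `"0xff"`.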

2) register third-party web services (not necessarily using pandat)

I could totally see web services doing some complex conversions, or running pandat on large datasets for you. See https://github.com/maxogden/gut -- pandat will probably do something similar, or plug into gut. My goal is to make as much as possible runnable locally (download code + schemas), since tools are better than services: tools are more general and independent, and services can be built on top of tools. That said, being able to list pandat conversions as services will plug in very nicely with our plans for dat (including web hooks between dat instances, etc.).

3) invoke specific conversions.

By "specific conversions" here, do you mean existing third-party tools? If so, that will be doable from within the Codec and Conversion modules (they'll just be npm JS modules :) ).
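Wrapping an existing tool as a Codec can be as thin as pairing a media type with that tool's decode and encode functions. The sketch below assumes a Codec is exactly that triple (a hypothetical shape, not pandat's module format) and uses Python's built-in `csv` and `json` modules to stand in for the third-party tools; `transcode` is likewise an illustrative name.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Codec:
    """Hypothetical Codec shape: a media type plus decode/encode functions."""
    media_type: str
    decode: Callable[[str], Any]
    encode: Callable[[Any], str]


def _csv_decode(text: str) -> List[Dict[str, str]]:
    # Delegate parsing to an existing library (here Python's csv module).
    return list(csv.DictReader(io.StringIO(text)))


def _csv_encode(rows: List[Dict[str, str]]) -> str:
    # Use the first row's keys as the header; "\n" keeps output predictable.
    buf = io.StringIO()
    fields = list(rows[0]) if rows else []
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


CSV = Codec("text/csv", _csv_decode, _csv_encode)
JSON = Codec("application/json", json.loads, json.dumps)


def transcode(text: str, src: Codec, dst: Codec) -> str:
    """Decode with one codec, then re-encode the in-memory value with another."""
    return dst.encode(src.decode(text))
```

Because each Codec only knows its own format, transcoding between any two registered formats goes through the decoded in-memory value instead of needing a dedicated CSV-to-JSON routine.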

curl -X POST -H "Accept: application/json" -H "Content-Type: text/csv" --data-binary @mydatafile.csv http://pandat.org/converter

Yep! See https://github.com/maxogden/gut -- will think more about this once pandat works for the common things.

Also, I'm not sure how media-types + file formats + dataprotocols + possibly Finitio + possibly JSON-LD would all fit together.

I've narrowed quite a bit of this down. See the readme, particularly https://github.com/jbenet/pandat/#pandat-formats----an-example. Things are not fully specified there yet, but the idea is to have Types (specced with a JSON-LD schema), Codecs (modules with code + a JSON-LD spec), and Conversions (modules with code + a JSON-LD spec). Finitio could fit really well into the pipeline by providing validation for Types. I'll think about this more, but what I have in mind is a 1-to-1 mapping between pandat JSON-LD schemas and corresponding Finitio schemas, used for validation.
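As a rough illustration of "validation in the Types", the sketch below derives a validator from a Type description. The description's keys (`@context`, `@id`, `properties`), the `example:Person` identifier, and the `xsd:` datatype checks are assumptions chosen for the example, not pandat's schema format or Finitio's semantics. It treats the attribute list as closed, so extra attributes count as errors.

```python
from typing import Any, Callable, Dict, List

# Hypothetical Type description in a JSON-LD-flavoured shape; the keys and
# the "example:Person" id are assumptions, not pandat's actual format.
PERSON_TYPE: Dict[str, Any] = {
    "@context": {"xsd": "http://www.w3.org/2001/XMLSchema#"},
    "@id": "example:Person",
    "properties": {"name": "xsd:string", "age": "xsd:integer"},
}


def _is_integer(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


# Checks for the few XML Schema datatypes this sketch understands.
CHECKS: Dict[str, Callable[[Any], bool]] = {
    "xsd:string": lambda v: isinstance(v, str),
    "xsd:integer": _is_integer,
    "xsd:boolean": lambda v: isinstance(v, bool),
}


def validate(record: Dict[str, Any], type_desc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    props: Dict[str, str] = type_desc["properties"]
    errors = [f"missing attribute {k!r}" for k in props if k not in record]
    errors += [f"unexpected attribute {k!r}" for k in record if k not in props]
    for key, datatype in props.items():
        if key in record and not CHECKS[datatype](record[key]):
            errors.append(f"{key!r} is not a valid {datatype}")
    return errors
```

Returning a list of messages rather than raising on the first failure makes it easy to report every problem in a record at once, which suits checking a whole decoded dataset.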

Btw. my brother asked me whether it would be possible to parse a PostgreSQL DDL file and convert it to a Finitio schema:

Yeah, we should be able to do that, as long as we can define a clear mapping between the languages. It doesn't have to be 1-to-1 (it could be a one-way conversion), but that would be nice.
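A one-way DDL-to-Finitio mapping could start as small as the sketch below, which handles only a single, simple `CREATE TABLE` statement. Everything here is an assumption for illustration: the type-name table, the fallback to `Any` for unmapped SQL types, and the ` :? ` marker for nullable columns are modelled loosely on Finitio's tuple syntax and should be checked against the Finitio documentation. The regex-based parsing is not a real SQL parser.

```python
import re
from typing import Dict, List

# Assumed mapping from common PostgreSQL column types to Finitio-style type
# names (hypothetical; verify against the Finitio documentation).
SQL_TO_FINITIO: Dict[str, str] = {
    "integer": "Integer", "int": "Integer", "bigint": "Integer",
    "smallint": "Integer", "serial": "Integer",
    "text": "String", "varchar": "String", "character varying": "String",
    "boolean": "Boolean", "date": "Date",
}

# Table-level clauses that do not describe a column; this sketch skips them.
TABLE_CLAUSES = {"primary", "unique", "constraint", "foreign", "check"}

CREATE_RE = re.compile(r"create\s+table\s+(\w+)\s*\((.*)\)\s*;",
                       re.IGNORECASE | re.DOTALL)


def ddl_to_finitio(ddl: str) -> str:
    """One-way, lossy translation of a simple CREATE TABLE to a Finitio-like tuple type."""
    match = CREATE_RE.search(ddl)
    if match is None:
        raise ValueError("no CREATE TABLE statement found")
    table, body = match.group(1), match.group(2)
    attrs: List[str] = []
    # Split on commas that are not inside parentheses, e.g. numeric(10,2).
    for column in re.split(r",(?![^()]*\))", body):
        column = column.strip()
        if not column or column.split()[0].lower() in TABLE_CLAUSES:
            continue
        name, rest = column.split(None, 1)
        rest = rest.lower()
        sql_type = re.match(r"character varying|\w+", rest).group(0)
        finitio_type = SQL_TO_FINITIO.get(sql_type, "Any")
        # NOT NULL and PRIMARY KEY columns are required; others become optional.
        required = "not null" in rest or "primary key" in rest
        attrs.append(f"  {name}{': ' if required else ' :? '}{finitio_type}")
    return f"{table.capitalize()} = {{\n" + ",\n".join(attrs) + "\n}"
```

The mapping is deliberately lossy (column lengths, defaults, and table-level constraints are dropped), which is the kind of one-way conversion described above.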

blambeau commented 10 years ago

Great! Thanks for the explanation, and for the link to gut, which is worth knowing about too.

I'm closing the issue; otherwise it will probably remain open for months, with more noise than signal with respect to pandat's core business.