jbenet / transformer

transformer - multiformat data conversion
transform.datadex.io

simple example of conversion functions #2

Open jbenet opened 10 years ago

jbenet commented 10 years ago

not final, food for thought.

Want output

{
  'name': 'Juan Batiz-Benet',
  'city': 'San Francisco, CA'
}

Input Type FOO

{
  'name': {
    '@type': 'pandat/name',
    'label': 'NAME',
    'codec': 'pandat/name-last-name-first'
  },
  'addr': {
    '@type': 'pandat/us-street-address',
    'label': 'ADDR',
  }
}

Output Type BAR

{
  'name': 'pandat/name',
  'city': 'pandat/us-city',
}

You can write a conversion function, use it and/or publish it to pandat:

(excuse this interface, might be simplified some)

var Foo2Bar = pandat.Conversion({'invertible': false}, [Foo], [Bar]);

Foo2Bar.convert = function(foo) {
  return {
    'name': pandat(Foo.name['@type'], Bar.name['@type'], foo.name),
    'city': pandat(Foo.addr['@type'], Bar.city['@type'], foo.addr)
  }
}

Or, pandat might be able to generate the function, with some hints about how the names map to each other. (not quite sure what the right interface is here, but will think about it.)
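
To make the hand-written conversion above concrete, here is a minimal, self-contained sketch. None of it is pandat's real API (nothing is implemented yet): `convertValue` and the `registry` are hypothetical stand-ins for the `pandat(fromType, toType, value)` calls, the sketch treats the `name-last-name-first` codec as the source format of the name field, and the sample address is made up for illustration.

```javascript
// Hypothetical registry of field-level conversions, keyed by "from -> to".
// These stand in for the pandat(fromType, toType, value) calls above.
var registry = {
  'pandat/name-last-name-first -> pandat/name': function (value) {
    // "Last, First" -> "First Last"
    var parts = value.split(',');
    return parts[1].trim() + ' ' + parts[0].trim();
  },
  'pandat/us-street-address -> pandat/us-city': function (value) {
    // Naive assumption: the address looks like "street, city, state zip".
    var parts = value.split(',');
    var state = parts[2].trim().split(' ')[0];
    return parts[1].trim() + ', ' + state;
  }
};

// Look up and apply a single field conversion.
function convertValue(fromType, toType, value) {
  var fn = registry[fromType + ' -> ' + toType];
  if (!fn) throw new Error('no conversion from ' + fromType + ' to ' + toType);
  return fn(value);
}

// The Foo -> Bar conversion, written by hand.
function foo2bar(foo) {
  return {
    name: convertValue('pandat/name-last-name-first', 'pandat/name', foo.name),
    city: convertValue('pandat/us-street-address', 'pandat/us-city', foo.addr)
  };
}

var result = foo2bar({
  name: 'Batiz-Benet, Juan',
  addr: '123 Example St, San Francisco, CA 94110'
});
console.log(result);
```

A real implementation would presumably resolve the field converters from published modules rather than from a local table.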

yoshuawuyts commented 10 years ago

I'd like to see something more along the lines of this:

Source:

{ 
  "name": {
    "type": "pandat/name-last-name-first",
    "label": "NAME"
  },
  "city": {
    "type": "pandat/us-street-address",
    "label": "ADDR"
  }
}

Output:

{ 
  "name": {
    "type": "pandat/name",
    "label": "NAME",
    "source": "NAME"
  },
  "addr": {
    "type": "pandat/us-city",
    "label": "CITY",
    "source": "ADDR"
  }
}

Converter:

/**
 * Module dependencies
 */

var object1 = require('./object1.json')
var fooSchema = require('./foo.json');
var barSchema = require('./bar.json');
var pandat = require('pandat');

/**
 * Initialize converter.
 *
 * @param {Object} sourceSchema
 * @param {Object} targetSchema
 * @return {Function}
 */

var converter = pandat.Conversion(fooSchema, barSchema, {invertible: false});

/**
 * Execute conversion
 */

var resultObject = converter(object1);

I think that if you design your relations beforehand, there'll be no need for further declarations. Such an implementation would allow for more flexibility, and result in a cleaner API.

Also: I didn't quite catch the difference between a codec and conversion, could you explain what you mean by that?

jbenet commented 10 years ago

Hello!

> I think that if you design your relations beforehand, there'll be no need for further declarations. Such an implementation would allow for more flexibility, and result in a cleaner API.

Yeah! That's what I meant above by "pandat might be able to generate the function, with some hints about how the names map to each other".

"source": "NAME"

I like this relational mapping, though it won't quite happen on the output type, as the output type may be an input type elsewhere. Relevant to mention here is that users will be reusing types published by others. Totally possible to just have to specify:

var converter = pandat.Conversion('jbenet/foo', 'jbenet/bar')

Given I published foo and bar schemas :)

> I didn't quite catch the difference between a codec and conversion, could you explain what you mean by that?

Yeah. A Codec is a named pair of functions to encode and decode between raw data and typed objects. For example, see https://github.com/jbenet/pandat/blob/master/stdlib/json_codec.js and https://github.com/jbenet/pandat/blob/master/stdlib/xml_codec.js (these are just examples, nothing works yet). Codecs don't have to be as general as json or xml. They can be type-specific. See https://github.com/jbenet/pandat/blob/master/stdlib/date_type.js#L23-L35 (again nothing works yet, there's errors there :] ). Codecs can be published and installed (npm modules).
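
As a rough illustration of "a named pair of functions to encode and decode", the sketch below shows one general codec and one type-specific codec. The object shape (`name`, `encode`, `decode`) and the `{first, last}` representation of a name are assumptions made for this example, not the layout used in the linked stdlib files.

```javascript
// Hypothetical codec shape: a name plus an encode/decode pair.
var jsonCodec = {
  name: 'json-codec',
  // typed object -> raw data
  encode: function (obj) { return JSON.stringify(obj); },
  // raw data -> typed object
  decode: function (raw) { return JSON.parse(raw); }
};

// A type-specific codec: names stored as "Last, First".
var nameLastFirstCodec = {
  name: 'name-last-name-first',
  encode: function (name) {
    return name.last + ', ' + name.first;
  },
  decode: function (raw) {
    var parts = raw.split(',');
    return { first: parts[1].trim(), last: parts[0].trim() };
  }
};

// Decoding and re-encoding should give back the raw value.
var decoded = nameLastFirstCodec.decode('Batiz-Benet, Juan');
var roundTrip = nameLastFirstCodec.encode(decoded);
console.log(decoded, roundTrip);
```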

A Conversion is a function converting between two types. The example above shows converting between Foo and Bar. While it's certainly possible to generate conversion functions from relations (inferred based on the types, or specified with source/target keys), many conversion functions will be complex and require programming. These would be publishable/installable modules as well.

Lmk if that makes sense? Will put this all on the Readme.

yoshuawuyts commented 10 years ago

Your explanation of Codec makes sense. But before I start suggesting any changes, let me check if I understood it correctly:

A conversion has a:

An input schema has:

An output schema has:

Or outputSchema == inputSchema? Let me know if this sounds about right.

jbenet commented 10 years ago

Output schema == input schema. They're the same thing. They define Types. Types can be used as inputs or outputs in a conversion.

Other than that, right on!

yoshuawuyts commented 10 years ago

I really dislike the @something syntax. I don't think keys should be namespaced if they're not used outside pandat/transform.

And couldn't this:

/**
 * Module dependencies
 */

var outputSchema = require('./bar');
var inputSchema = require('./foo');
var linkSchema = require('baz');
var pandat = require('pandat');

/**
 * Initialize converter.
 */

var Foo2Bar = pandat.Conversion({'invertible': false}, [inputSchema], [outputSchema]);

Foo2Bar.convert = function(linkSchema) {
  return {
    'name': pandat(inputSchema.name['@type'], outputSchema.name['@type'], linkSchema.name),
    'city': pandat(inputSchema.addr['@type'], outputSchema.city['@type'], linkSchema.addr)
  }
}

be rewritten to this:

/**
 * Module dependencies
 */

var outputSchema = require('./bar');
var inputSchema = require('./foo');
var linkSchema = require('baz');
var pandat = require('pandat');

/**
 * Export converter.
 */

var converter = module.exports = pandat({'invertible': false});

converter.schema = {
  'name': [inputSchema.name, outputSchema.name, linkSchema.name],
  'city': [inputSchema.addr, outputSchema.city, linkSchema.city],
}

You could use an internal function to execute converter.schema. Not sure if closures are passed around correctly though.
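
One way such an internal function could work is sketched below: it takes a declarative field mapping and returns a convert function. The rule shape (`from`, `to`, `source`) and the names `compile` and `traceConvert` are hypothetical and exist only for this example; the field-level converter is passed in so the sketch stays self-contained.

```javascript
// Hypothetical: build a convert function from a declarative field mapping.
// Each rule maps an output key to { from: input type, to: output type, source: input key }.
function compile(mapping, convertValue) {
  return function convert(input) {
    var output = {};
    Object.keys(mapping).forEach(function (key) {
      var rule = mapping[key];
      output[key] = convertValue(rule.from, rule.to, input[rule.source]);
    });
    return output;
  };
}

// Stand-in field converter that only records what it was asked to do.
function traceConvert(from, to, value) {
  return from + ' => ' + to + ': ' + value;
}

var convert = compile({
  name: { from: 'pandat/name', to: 'pandat/name', source: 'name' },
  city: { from: 'pandat/us-street-address', to: 'pandat/us-city', source: 'addr' }
}, traceConvert);

var out = convert({ name: 'x', addr: 'y' });
console.log(out);
```

Because the mapping is plain data, it could be published and reused without shipping any code.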

The less friction the API causes, the more developers will love using it. Imo things like @type should be avoided. What do you think?

jbenet commented 10 years ago

> I really dislike the @something syntax.

Take that up with json-ld.org :)

> I don't think keys should be namespaced if they're not used outside pandat/transform.

They are: the goal is for all transformer objects to have a definition in JSON-LD. (Sorry, I haven't made that clear in the README.) They'll have their own @context, etc. The trick is that the library can fill in a lot of the standard stuff, so:

(s/pandat/transformer/ in your mind here)

t = pandat.Type({
  'name': {
    '@type': 'pandat/name',
    'label': 'NAME',
    'codec': 'pandat/name-last-name-first'
  },
  'addr': {
    '@type': 'pandat/us-street-address',
    'label': 'ADDR',
  }
})

fills in:

> t.src
{
  '@context': 'http://pandat.io/context/pandat.jsonld',
  '@type': 'Type',
  'codec': 'pandat/identity-codec',
  'schema': {
    'name': {
      '@type': 'pandat/name',
      'label': 'NAME',
      'codec': 'pandat/name-last-name-first'
    },
    'addr': {
      '@type': 'pandat/us-street-address',
      'label': 'ADDR',
    }
  }
}

See https://github.com/jbenet/pandat/blob/master/js/type.js

Though none of this is final. Will try to have working code by end of this weekend.

max-mapper commented 10 years ago

just for the sake of argument, how about this for a minimum viable JSON type:

t = pandat.Type({
  'name': {
    'type': 'name',
    'label': 'NAME',
    'codec': 'name-last-name-first'
  },
  'addr': {
    'type': 'us-street-address',
    'label': 'ADDR',
  }
})

e.g. default type to @type if @type doesn't exist (agreed that @ symbols in keys are weird) and default all types to pandat/ if no other 'namespace' is specified
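
The two defaults proposed here could be applied in a small normalization pass, sketched below. The helper names (`qualify`, `normalizeField`, `normalizeSchema`) are hypothetical, and the rule "a name containing a `/` is already namespaced" is an assumption made for the example.

```javascript
// Hypothetical normalizer for the shorthand above.
var DEFAULT_NAMESPACE = 'pandat/';

// Prefix names that carry no namespace of their own.
function qualify(name) {
  return name.indexOf('/') === -1 ? DEFAULT_NAMESPACE + name : name;
}

function normalizeField(field) {
  var out = {};
  // Copy everything except the 'type' shorthand.
  Object.keys(field).forEach(function (key) {
    if (key !== 'type') out[key] = field[key];
  });
  // Fall back to 'type' only when '@type' is absent.
  var type = field['@type'] !== undefined ? field['@type'] : field.type;
  if (type !== undefined) out['@type'] = qualify(type);
  if (typeof out.codec === 'string') out.codec = qualify(out.codec);
  return out;
}

function normalizeSchema(schema) {
  var out = {};
  Object.keys(schema).forEach(function (key) {
    out[key] = normalizeField(schema[key]);
  });
  return out;
}

var normalized = normalizeSchema({
  name: { type: 'name', label: 'NAME', codec: 'name-last-name-first' },
  addr: { type: 'us-street-address', label: 'ADDR' }
});
console.log(normalized);
```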

jbenet commented 10 years ago

As for the example, the goal is that most users won't have to write their own conversion functions at all; they'll simply use published ones. Some people will, and in those cases both options should work: doing it in code directly, or with a relational schema (expressing the mapping of one type to the other) that allows transformer to generate the code. Precisely like you suggest! :)

> You could use an internal function to execute converter.schema

:+1:

jbenet commented 10 years ago

> just for the sake of argument, how about this for a minimum viable JSON type:

Yeah! lgtm! both filling in the @ and default namespace. If we run into problems, figure it out then.

yoshuawuyts commented 10 years ago

:+1:

jbenet commented 10 years ago

Turns out the @context can symlink type -> @type :+1:

> @id is not required for a valid JSON-LD document. Also note that you can alias "@id" to something less strange looking, like "id" or "url", for instance.

From https://github.com/dataprotocols/dataprotocols/issues/110#issuecomment-41442675
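
For reference, JSON-LD keyword aliasing is declared in the @context by mapping a plain term to a keyword. The sketch below shows such a context and a small helper that rewrites aliased keys back to their keywords. The helper `expandAliases` is hypothetical and only illustrates the idea; it is not a JSON-LD processor and handles nothing beyond simple string aliases.

```javascript
// A context that aliases two JSON-LD keywords to friendlier names.
var context = { 'type': '@type', 'id': '@id' };

// Hypothetical helper: rewrite aliased keys back to their keywords, recursively.
function expandAliases(node, ctx) {
  if (Array.isArray(node)) {
    return node.map(function (item) { return expandAliases(item, ctx); });
  }
  if (node === null || typeof node !== 'object') return node;
  var out = {};
  Object.keys(node).forEach(function (key) {
    var target = ctx[key];
    // Only terms that map to a keyword (a string starting with "@") are aliases.
    var isAlias = typeof target === 'string' && target.charAt(0) === '@';
    out[isAlias ? target : key] = expandAliases(node[key], ctx);
  });
  return out;
}

var expanded = expandAliases({
  name: { type: 'pandat/name', label: 'NAME' }
}, context);
console.log(expanded);
```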