sunlightlabs / parmenides

The Poor Man's Record Linkage Service
http://en.wikipedia.org/wiki/Record_linkage
Eclipse Public License 1.0

Prismatic/Schema validation/transformations/descriptions #4

Open zmaril opened 9 years ago

zmaril commented 9 years ago

This is a one-person project right now, so I need to lean on tools to automate the work. Integrating data sources into databases, validating sources, and transforming outside data into the correct format are all simple tasks, but they are error-prone. A simple typo is easy to miss, especially when transforming property names. So I need a way to automate these worries away.

One way is to adapt prismatic/schema to this project's needs. It should be easy to write code that validates incoming data, and it should be possible to coerce outside data into the accepted formats. The biggest issue would be finding a way to annotate those schemas with information that allows for easy creation of Datomic schemas.
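A rough sketch of what that could look like, assuming prismatic/schema is on the classpath. The `:person/...` attributes, the `person-attrs` map, and the `datomic-attributes` helper are all hypothetical names made up for illustration, not anything that exists in parmenides. The idea is to keep one annotated description per attribute and derive both the Schema used for validation and the Datomic attribute definitions from it:

```clojure
(ns example.validation
  (:require [schema.core :as s]
            [schema.coerce :as coerce]))

;; Hypothetical single source of truth: each attribute carries the
;; Schema type used for validation and the Datomic value type.
(def person-attrs
  {:person/name {:schema s/Str :db/valueType :db.type/string}
   :person/age  {:schema s/Int :db/valueType :db.type/long}})

;; Derive a plain map schema (attribute -> Schema type) for validation.
(def Person
  (into {} (for [[ident spec] person-attrs]
             [ident (:schema spec)])))

;; Derive Datomic attribute definitions as plain data. Nothing is
;; transacted here, and cardinality is assumed to be "one" throughout.
;; Depending on the Datomic version, extra keys may be needed before
;; these maps can be installed.
(defn datomic-attributes [attrs]
  (for [[ident spec] attrs]
    {:db/ident       ident
     :db/valueType   (:db/valueType spec)
     :db/cardinality :db.cardinality/one}))

;; s/check returns nil when the data conforms to the schema.
(s/check Person {:person/name "Ada" :person/age 36})

;; A typo in a property name yields a non-nil error description
;; instead of slipping through silently.
(s/check Person {:person/nmae "Ada" :person/age 36})

;; Coercion: build a function that tries to fit outside data to the
;; schema, here asking Schema to coerce string input such as "36"
;; so that it matches s/Int.
(def coerce-person
  (coerce/coercer Person coerce/string-coercion-matcher))

(coerce-person {:person/name "Ada" :person/age "36"})
```

Keeping the annotations in an ordinary map means the validation schema and the database schema cannot drift apart, at the cost of one extra layer of indirection. Other designs, such as attaching metadata to the schemas themselves, would work too.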

boblannon commented 9 years ago

why isn't validictory suitable for this?

boblannon commented 9 years ago

ah, this is in clojure, i see. carry on

zmaril commented 9 years ago

@paultag @jamesturk Is there a finite number of options for the classifications property? I looked at the open civic data docs and couldn't seem to find an answer.

paultag commented 9 years ago

@zmaril yeah, but they vary by type (and we're pretty liberal about adding to the set) - I'll have to dig up our defs, they should at least be on the OCDEPs

zmaril commented 9 years ago

Validation is mostly done at this point. What remains is incremental and simple: expanding the enums for classifications and refining the general string types into things like URL types. I've learned how to integrate new data schemas quickly and am not too worried about the new ocd types.
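For reference, the two kinds of refinement mentioned above could be sketched roughly like this with prismatic/schema. The enum values are placeholders for illustration only (the real set varies by type and lives in the Open Civic Data definitions), and `url-string?`, `Url`, and `Organization` are hypothetical names, not the ones used in this repo:

```clojure
(ns example.types
  (:require [schema.core :as s])
  (:import (java.net URI URISyntaxException)))

;; Placeholder classification values; the authoritative list is not
;; reproduced here and is expected to grow over time.
(def Classification
  (s/enum "committee" "party"))

;; Hypothetical predicate: a string that parses as a URI and has both
;; a scheme and a host. This is a loose check, not full URL validation.
(defn url-string? [x]
  (and (string? x)
       (try
         (let [uri (URI. ^String x)]
           (boolean (and (.getScheme uri) (.getHost uri))))
         (catch URISyntaxException _ false))))

;; s/pred wraps an arbitrary predicate as a schema; the symbol is only
;; used to name the predicate in error output.
(def Url (s/pred url-string? 'url-string?))

(s/defschema Organization
  {:name                 s/Str
   :classification       Classification
   (s/optional-key :url) Url})

;; Returns nil when the record conforms.
(s/check Organization
         {:name "Example Committee"
          :classification "committee"
          :url "http://example.com/committee"})

;; Returns a non-nil error description: the classification is outside
;; the enum and the URL has no scheme or host.
(s/check Organization
         {:name "Example Committee"
          :classification "comittee"
          :url "not a url"})
```

Expanding the enum is then a one-line change, and swapping `s/Str` for `Url` on a field tightens validation without touching the rest of the schema.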

zmaril commented 9 years ago

Putting this aside for now. Validators are about as good as they get (with the data I have). Will update as needed but not actively working to better them.