Open yusefnapora opened 8 years ago
Some other JSON translators and related work to potentially look at:
http://goessner.net/articles/jsont/
http://ajaxian.com/archives/transforming-json
https://github.com/bazaarvoice/jolt
https://www.p6r.com/articles/2008/05/06/xslt-and-xpath-for-json/
https://www.w3.org/TR/xslt-30/#json
http://jsoniq.org/
and, of course, jq
Another option is to use https://newville.github.io/asteval/, which tg already uses in a couple of places.
After considering the full range of work facing us, I'm going to deprioritize this for the near future, because it's a pretty complex undertaking with high expressiveness and security requirements. Let's consider translators "use at own risk" for the moment.
Unless:
Starting this issue for discussion about a declarative translator DSL design. The dynamically-loaded python modules we've got now get the job done, but are vulnerable to malicious or accidentally-damaging code execution.
Ideally we want a DSL for extracting and transforming the data we care about from its native format into a collection of mediachain records. To avoid RCE issues, an "external DSL" is preferred to something embedded inside a general-purpose host language.
Features I'd like to see:
`thing.role == artist`, etc)

Ideally, we want something that doesn't "feel like programming", although that may be unavoidable to some extent.
Implementation thoughts...
A while ago I looked at Xtext, a framework for creating DSLs and generating an object model from them. It's an extremely Java-centric solution, but there is a Python clone called textX that could be interesting. It has a very similar grammar and generates a graph of Python model objects from the DSL input. It doesn't have some of the fancier features (like generating an IntelliJ plugin or web-based editor for your language), but it seems nicer than rolling our own parser, etc.
An interesting JavaScript project I ran across during earlier research: http://defiantjs.com/ - converts JSON to/from XML and uses XPath for query and filtering. XPath is very flexible, and could be a decent choice for field selection. We could use the same idea and convert from JSON to XML for query / extraction using Python's `xml.etree.ElementTree` classes, which support a limited subset of XPath.

Here's what a getty translator might look like with XPath style selectors:
One thing that stands out is that the XPath selectors can potentially match multiple fields in the input, so we'd either need to consider cardinality per field (e.g. `title` is a single string, but `keywords` is a list), or else just say everything is a list and can have multiple values.

Anyway, these are just some thoughts that have been rattling around my head for a while. I figure it's worth considering what our ideal DSL would look like before we start diving into anything :)
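
To make the JSON-to-XML idea and the per-field cardinality question above a bit more concrete, here's a rough sketch using only the standard library. Everything named here is an assumption for illustration: the sample record and its field names are made up (not the real getty schema), and `json_to_element`, `translate`, and the `SPEC` layout are hypothetical helpers, not existing code. It also assumes JSON keys are valid XML tag names.

```python
import json
import xml.etree.ElementTree as ET


def json_to_element(tag, value):
    """Recursively convert a parsed JSON value into an ElementTree element.

    Hypothetical helper: assumes every JSON key is a valid XML tag name.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, list):
                # Repeat the element once per list item so a selector can
                # match each item individually.
                for item in child:
                    elem.append(json_to_element(key, item))
            else:
                elem.append(json_to_element(key, child))
    elif isinstance(value, list):
        # A bare list (e.g. at the top level) gets generic <item> children.
        for item in value:
            elem.append(json_to_element("item", item))
    elif value is not None:
        elem.text = str(value)
    return elem


# Hypothetical translator spec: output field -> (selector, cardinality).
# The selectors stay within the XPath subset that ElementTree supports.
SPEC = {
    "title": ("./title", "one"),
    "artist": ("./people[role='artist']/name", "one"),
    "keywords": ("./keywords", "many"),
}

# Made-up input record, NOT the actual getty format.
RECORD = json.loads("""
{
  "title": "Sunset over the bay",
  "people": [
    {"name": "A. Photographer", "role": "artist"},
    {"name": "B. Editor", "role": "editor"}
  ],
  "keywords": ["sunset", "bay"]
}
""")


def translate(record, spec):
    """Apply each selector to the record and honor the declared cardinality."""
    root = json_to_element("record", record)
    out = {}
    for field, (selector, cardinality) in spec.items():
        values = [e.text for e in root.findall(selector) if e.text is not None]
        if cardinality == "many":
            out[field] = values
        else:
            # "one": take the first match, or None when nothing matched.
            out[field] = values[0] if values else None
    return out


if __name__ == "__main__":
    print(translate(RECORD, SPEC))
```

Declaring cardinality in the spec keeps single-valued fields as plain strings; dropping the second tuple element and always returning `values` would be the "everything is a list" alternative.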