tetherless-world / materialsmine

MIT License

Chart Annotator #19

Open mdeagen opened 3 years ago

mdeagen commented 3 years ago

Develop a curation tool for annotating charts that operates independently and interoperably at the visual, data, and semantic levels.

Structure the annotated metadata in a portable JSON format that can be loaded alongside a chart image, so that curation can be "resumed" at any point given the original image and the JSON metadata file. (This involves careful thought on how we can maintain persistent pixel-calibration references within annotations without publishing the original chart image.)
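One possible shape for the pixel-calibration references is sketched below. This is an assumption for discussion, not a settled schema: every field name (`image`, `pixels`, `values`, `scale`, `points`) and the helper functions `make_axis` and `pixel_to_value` are hypothetical. Each axis stores two reference pixels with their known data values, so a marked pixel can be converted back to data coordinates from the JSON alone once the same image is reloaded.

```python
import json
import math


def make_axis(pixel_a, value_a, pixel_b, value_b, scale="linear"):
    """Store two calibration points: pixel positions and their data values."""
    return {"pixels": [pixel_a, pixel_b], "values": [value_a, value_b], "scale": scale}


def pixel_to_value(axis, pixel):
    """Interpolate a pixel position to a data value on a linear or log10 axis."""
    p0, p1 = axis["pixels"]
    v0, v1 = axis["values"]
    t = (pixel - p0) / (p1 - p0)  # fractional position between the two references
    if axis["scale"] == "log":
        log_v0, log_v1 = math.log10(v0), math.log10(v1)
        return 10 ** (log_v0 + t * (log_v1 - log_v0))
    return v0 + t * (v1 - v0)


# Hypothetical annotation file: only image dimensions are stored, not the image.
annotation = {
    "image": {"width": 800, "height": 600},
    "x": make_axis(100, 0.0, 700, 10.0),
    # Pixel rows grow downward, so the larger value sits at the smaller pixel.
    "y": make_axis(500, 1.0, 100, 1000.0, scale="log"),
    "points": [{"px": 400, "py": 300}],
}

# Round-trip through JSON to mimic saving the file and resuming curation later.
restored = json.loads(json.dumps(annotation))
point = restored["points"][0]
x_value = pixel_to_value(restored["x"], point["px"])
y_value = pixel_to_value(restored["y"], point["py"])
```

Keeping raw pixel coordinates alongside the calibration, rather than only the decoded values, means a curator could correct the calibration later without re-marking every point.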

Visual annotations (x-axis, y-axis, color), once decoded, can populate a Vega-Lite spec for re-encoding the chart. While some of these Vega-Lite templates can be populated automatically, in the end the curator should still have the ability to edit the chart spec. Templates should also be able to be saved and shared (ideally also as a portable JSON file).
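As a rough illustration of how decoded annotations could populate a Vega-Lite template, here is a minimal sketch. The annotation layout and the `build_spec` helper are hypothetical; the output uses only standard Vega-Lite properties (`$schema`, `data.values`, `mark`, `encoding`). The result is a plain dictionary, so the curator can still edit it and save it as JSON.

```python
import json


def build_spec(rows, x, y, color=None, mark="point"):
    """Build a Vega-Lite spec (as a dict) from decoded axis/color annotations."""

    def channel(annotation):
        # Default to a quantitative field unless the annotation says otherwise.
        encoding = {
            "field": annotation["field"],
            "type": annotation.get("type", "quantitative"),
        }
        if "title" in annotation:
            encoding["title"] = annotation["title"]
        if annotation.get("scale") == "log":
            encoding["scale"] = {"type": "log"}
        return encoding

    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": mark,
        "encoding": {"x": channel(x), "y": channel(y)},
    }
    if color is not None:
        spec["encoding"]["color"] = channel(color)
    return spec


# Hypothetical decoded data and annotations, for illustration only.
rows = [
    {"strain": 0.01, "stress": 12.0, "sample": "A"},
    {"strain": 0.02, "stress": 25.0, "sample": "B"},
]
spec = build_spec(
    rows,
    x={"field": "strain", "title": "Strain"},
    y={"field": "stress", "title": "Stress (MPa)", "scale": "log"},
    color={"field": "sample", "type": "nominal"},
)
spec_json = json.dumps(spec, indent=2)  # portable form that can be saved and shared
```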

The tool would greatly improve curation throughput and allow curation to be sub-divided into well-scoped tasks. The product of the tool would be not only FAIR data in the knowledge graph, but also a FAIR chart that preserves at least some of the original representation (view) of the data.

Once these charts are saved to our knowledge graph, authors could "fork" a chart to add new elements (such as tooltips or interactivity) that were not part of the original, static chart: an incentive for people to curate data into our system!

Various ML methods will eventually improve upon human ability for some of these tasks (visual/data/semantic). Annotations should be structured with this in mind, because if done well, the collection of curated charts+metadata could be a useful labeled training set for future ML models that automate some of these sub-tasks, while keeping a human in the loop (akin to a weak supervision model).

Edited with task decomposition: