petermr / openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents
Apache License 2.0

openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents.

Overview

In many documents the diagrams are a key component of the information. Data are created in semantic form and output as machine-readable files and then, in one of the great barbarisms of this century, are trashed into bitmaps and further degraded by JPEG technology. This lost data leads to irreproducible science and, in the worst cases, people die. (Clinical trials are often published as PDF, and data extraction is hard or near impossible.)

This project tackles the impossible - reconstituting semantic data for the world - "turning hamburgers into cows".

Among the subjects I have successfully extracted semantic data from:

Many of these have common semantic diagrammatic abstractions and AMI builds these up using heuristics.

preprocessing with ami

see PREPROCESS.md

creation of project

