kuhumcst / glossematics

The life of Louis Hjelmslev.
https://glossematics.dk

Generate triples from TEI XML documents #18

Closed simongray closed 2 years ago

simongray commented 3 years ago

Regardless of which database is eventually going to be used (see: #4), it is 99% likely that I will be using a triplestore of some kind. There is functionality available in Cuphic (see: https://github.com/kuhumcst/cuphic/issues/1) to facilitate this, although it may have to be tweaked further.

I now have access to the university's "N drive" where it should be possible to find sample data. The task is now to recursively go through each document in a list of documents and return valid metadata triples. These triples should be derived from the actual metadata in the TEI header, from metadata in the contents, and from implied metadata that can be inferred from the content, e.g. the presence of certain words or some other feature.
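To make the three sources of triples concrete, here is a minimal Python sketch using only the standard library. It is not how Cuphic or this project does it: the function name `triples_from_tei`, the predicate names (`title`, `author`, `mentions`, `has-keyword`), the keyword list, and the sample document are all hypothetical, chosen only for illustration. Only the TEI namespace and element names come from the TEI standard.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI}

# Hypothetical sample document, not taken from the real data set.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example letter</title>
        <author>Example Author</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Notes on glossematics, sent to <persName>Example Person</persName>.</p>
    </body>
  </text>
</TEI>"""


def triples_from_tei(doc_id, xml_string, keywords=("glossematics",)):
    """Return (subject, predicate, object) tuples for one TEI document."""
    root = ET.fromstring(xml_string)
    triples = []

    # 1. Explicit metadata from the TEI header.
    for tag in ("title", "author"):
        for el in root.findall(f".//tei:titleStmt/tei:{tag}", TEI_NS):
            value = (el.text or "").strip()
            if value:
                triples.append((doc_id, tag, value))

    text_el = root.find("tei:text", TEI_NS)
    if text_el is not None:
        # 2. Metadata marked up in the contents, here person names.
        for el in text_el.iter(f"{{{TEI}}}persName"):
            name = "".join(el.itertext()).strip()
            if name:
                triples.append((doc_id, "mentions", name))

        # 3. Implied metadata: presence of certain words in the text.
        body = "".join(text_el.itertext()).lower()
        for kw in keywords:
            if kw.lower() in body:
                triples.append((doc_id, "has-keyword", kw))

    return triples


def triples_from_documents(documents):
    """Collect triples from a mapping of document id to XML string."""
    result = []
    for doc_id, xml_string in documents.items():
        result.extend(triples_from_tei(doc_id, xml_string))
    return result
```

Calling `triples_from_documents({"doc1": SAMPLE})` returns one flat list of tuples, which could then be handed to whichever triplestore is eventually chosen.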

simongray commented 3 years ago

Currently I need

simongray commented 3 years ago

New unified XML parser in Cuphic: https://github.com/kuhumcst/cuphic/commit/40a32dd302fbde9e4b6334c7c120ebad32b83922

This is now the common platform from which both the frontend UI is generated and the backend database metadata is to be extracted.

simongray commented 2 years ago

Branch to track this issue: https://github.com/kuhumcst/glossematics/tree/feature/18-triples

simongray commented 2 years ago

Work is progressing, but the current iteration of Cuphic cannot match elements that are a generic descendant of some ancestor. This is quite often necessary. Satisfying this requirement means developing new DSL grammar as well as implementing some sort of path memory.
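The "path memory" idea can be illustrated outside of Cuphic. The following Python sketch is not Cuphic's DSL or implementation; `descendants_of` is a hypothetical helper that walks a tree while carrying the chain of ancestor tags, so that an element can be matched when a given ancestor appears anywhere above it, not only as its direct parent.

```python
import xml.etree.ElementTree as ET


def descendants_of(root, ancestor_tag, target_tag):
    """Find target_tag elements that have ancestor_tag somewhere above them.

    Returns a list of (path, element) pairs, where path is the tuple of
    tags from the root down to and including the matched element.
    """
    matches = []

    def walk(el, path):
        # `path` holds the tags of all ancestors of `el` (the path memory).
        if el.tag == target_tag and ancestor_tag in path:
            matches.append((path + (el.tag,), el))
        for child in el:
            walk(child, path + (el.tag,))

    walk(root, ())
    return matches


# Usage: <name> is matched at any depth below <body>, but not below <head>.
doc = ET.fromstring(
    "<doc><head><name>A</name></head>"
    "<body><div><p><name>B</name></p></div><name>C</name></body></doc>"
)
for path, el in descendants_of(doc, "body", "name"):
    print("/".join(path), el.text)
```

Keeping the path as an immutable tuple means each branch of the recursion sees only its own ancestors, at the cost of copying the tuple on every step.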

simongray commented 2 years ago

Now tracked in https://github.com/kuhumcst/glossematics/compare/feature/18a-triples

simongray commented 2 years ago

Working search endpoint and frontend as of d35b3c3.

This issue will remain open until I am sourcing an adequate number of triples from the TEI documents; however, the current work is focused on the search frontend. Using the feature/18a-triples branch for this work doesn't seem right, so I will make a new branch instead.

simongray commented 2 years ago

Work is ongoing, but much of the really important stuff has essentially been added as of 15a7b1a805e75fda4e21bba7de25f73042307a44.