simongray closed this issue 2 years ago
Currently I need
New unified XML parser in Cuphic: https://github.com/kuhumcst/cuphic/commit/40a32dd302fbde9e4b6334c7c120ebad32b83922
This is now the common platform from which both the frontend UI is generated and the backend database metadata is to be extracted.
Branch to track this issue: https://github.com/kuhumcst/glossematics/tree/feature/18-triples
Work is progressing, but the current iteration of Cuphic cannot match elements that are a generic descendant of some ancestor. This is quite often necessary. Satisfying this requirement means developing new DSL grammar as well as implementing some sort of path memory.
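To illustrate what "path memory" could mean here, below is a minimal sketch in Python rather than in Cuphic itself. It is not Cuphic's API or DSL. The function name `find_descendants` and the sample document are made up for illustration. The idea is simply to carry the chain of ancestor tags along while walking the tree, so a match can be conditioned on any ancestor, not just the direct parent.

```python
import xml.etree.ElementTree as ET


def find_descendants(root, ancestor_tag, target_tag):
    """Yield (path, element) for every target_tag element that has
    ancestor_tag anywhere above it (hypothetical helper, not Cuphic)."""

    def walk(node, path):
        # `path` is the path memory: the tags of all ancestors, root first.
        if node.tag == target_tag and ancestor_tag in path:
            yield path + (node.tag,), node
        for child in node:
            yield from walk(child, path + (node.tag,))

    yield from walk(root, ())


# Made-up sample: one persName nested below <body>, one outside of it.
doc = ET.fromstring(
    "<text><body><div><p><persName>Hjelmslev</persName></p></div></body>"
    "<back><persName>Uldall</persName></back></text>"
)

matches = [(path, el.text) for path, el in find_descendants(doc, "body", "persName")]
print(matches)
```

Only the `persName` below `body` is matched, and the recorded path shows how deep it sits. A real DSL extension would presumably need a way to express the "any depth" part in the pattern syntax, which this sketch sidesteps by hard-coding it.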
Working search endpoint and frontend as of d35b3c3.
This issue will remain open until I am sourcing an adequate number of triples from the TEI documents; however, the current work is focused on the search frontend. Using the feature/18a-triples branch for this work doesn't seem right, so I will make a new branch instead.
Work is ongoing, but much of the really important stuff has essentially been added as of 15a7b1a805e75fda4e21bba7de25f73042307a44.
Regardless of which database is eventually going to be used (see: #4), it is 99% likely that I will be using a triplestore of some kind. There is functionality available in Cuphic (see: https://github.com/kuhumcst/cuphic/issues/1) to facilitate this, although it may have to be tweaked further.
I now have access to the university's "N drive" where it should be possible to find sample data. The task is now to recursively go through each document in a list of documents and return valid metadata triples. These triples should be derived from the actual metadata in the TEI header, from metadata in the contents, and from implied metadata that can be inferred from the content, e.g. the presence of certain words or some other feature.
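A rough sketch of the three sources of triples described above, written in Python with the standard library only. It is not the planned Cuphic-based implementation. The predicate names (`title`, `mentions`, `contains-word`), the function name `triples_from_tei`, and the sample document are all assumptions made for illustration. Only the TEI namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # standard TEI P5 namespace


def triples_from_tei(doc_id, xml_text, keywords=()):
    """Return (subject, predicate, object) tuples for one TEI document.
    The predicates are hypothetical placeholders."""
    root = ET.fromstring(xml_text)
    triples = []

    # 1. Explicit metadata from the TEI header.
    header = root.find(f"{TEI}teiHeader")
    if header is not None:
        title = next(header.iter(f"{TEI}title"), None)
        if title is not None and title.text:
            triples.append((doc_id, "title", title.text.strip()))

    text = root.find(f"{TEI}text")
    if text is not None:
        # 2. Metadata marked up in the contents, at any depth.
        for pers in text.iter(f"{TEI}persName"):
            if pers.text:
                triples.append((doc_id, "mentions", pers.text.strip()))

        # 3. Implied metadata: presence of certain words in the content.
        content = "".join(text.itertext()).lower()
        for word in keywords:
            if word.lower() in content:
                triples.append((doc_id, "contains-word", word))

    return triples


# Made-up sample document, not taken from the actual data.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Letter to Uldall</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>Dear <persName>Uldall</persName>, about glossematics</p></body></text>
</TEI>"""

triples = triples_from_tei("doc-1", sample, keywords=["glossematics", "phoneme"])
for triple in triples:
    print(triple)
```

Running this over a list of documents is then just a matter of concatenating the results per document. Keeping the triples as plain tuples leaves the choice of triplestore open.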