jacobwegner closed 2 years ago
@jtauber and I took a look at some of the underlying XML for Cunliffe and LSJ and have pivoted a bit on this.
There is variance between the two dictionaries in how `bibl` and `cit` elements are nested. There are also parts of the LSJ which seem to be missing `sense` elements from the markup.
(I can try and come back to this comment later and excerpt them).
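A minimal sketch of one way to read `bibl` elements regardless of whether they sit inside a `cit`, directly under a `sense`, or outside any `sense` at all. The sample XML, its attribute values, and the `collect_bibls` helper are hypothetical and only illustrate the nesting variance; the real Cunliffe and LSJ files may use namespaces or different attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: one bibl inside cit, one directly under sense,
# and one with no enclosing sense element.
SAMPLE = """
<entry key="example">
  <sense n="A">
    <cit><quote>q</quote><bibl n="urn:a">Il. 1.1</bibl></cit>
    <bibl n="urn:b">Od. 1.1</bibl>
  </sense>
  <bibl n="urn:c">Il. 5.34</bibl>
</entry>
"""


def collect_bibls(entry):
    """Return (bibl n attribute, enclosing sense n or None) in document order."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in entry.iter() for child in parent}
    results = []
    for bibl in entry.iter("bibl"):
        node, sense = bibl, None
        # Walk upwards until a sense element (or the root) is reached.
        while node in parents:
            node = parents[node]
            if node.tag == "sense":
                sense = node.get("n")
                break
        results.append((bibl.get("n"), sense))
    return results


entry = ET.fromstring(SAMPLE)
bibls = collect_bibls(entry)
```

Walking up to the nearest `sense` means a citation still resolves to the entry when the `sense` markup is missing.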
We think a good "middle of the road" approach for LSJ (which will also benefit parts of Cunliffe) is to:
- `bibl` elements (so we can still resolve entries / senses that apply to a given passage)
- `bibl` elements from Scaife Viewer
Been working with a sort of "playground" locally to experiment with the XSL transformations for LSJ.
I'm working to approximate the HTML from Logeion; Logeion has a lot of markup differences that we know we won't have in https://github.com/PerseusDL/lexica.
Those links resolve to "work-level" URNs on catalog.perseus.org.
https://gist.github.com/jacobwegner/7c82e85201ea99365cb1528ae8b506bf#file-lsj-aeido-html
Still have a few white-space issues to resolve.
I am also going to punt on sense extraction for this first pass.
1) headwords and blobs
2) headword, definition blob, sense blob
3) citations from definition and sense (allowing expansion)
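One possible shape for the data those passes produce, as a rough sketch only. The class and field names (`Entry`, `Sense`, `Citation`, `all_citations`) are hypothetical and are not taken from the actual ingestion code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    ref: str                   # display text, e.g. "Il. 5.34"
    urn: Optional[str] = None  # resolved URN, if any


@dataclass
class Sense:
    blob: str                                         # pass 2: sense blob
    citations: List[Citation] = field(default_factory=list)  # pass 3


@dataclass
class Entry:
    headword: str                                     # pass 1
    definition: str                                   # pass 1: one blob; pass 2: definition blob
    senses: List[Sense] = field(default_factory=list)
    citations: List[Citation] = field(default_factory=list)  # pass 3


def all_citations(entry: Entry) -> List[Citation]:
    """Citations from the definition first, then from each sense in order."""
    found = list(entry.citations)
    for sense in entry.senses:
        found.extend(sense.citations)
    return found
```

Keeping `senses` and `citations` empty by default lets the first pass store only a headword and a blob, with later passes filling in the rest.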
I have the blob extraction (two entries only) deployed now:
@jtauber and I did a good first pass over markup / betacode issues; we do want to discuss a couple of underlying XML issues on today's call. This Gist has entries for `ἄειδε`:
https://gist.github.com/jacobwegner/0937195b81f6e13a31cde473987b936c
@gregorycrane I think I've made enough progress on normalizing the headwords for LSJ to consider it "good enough" for release to the wider site.
Here's an example below for `urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1@θεά`:
The headword defined in the Cunliffe XML is `θεά`:
But in LSJ the headword is `θεά1`:
We normalize the differing alpha variants (and strip the `1` from the LSJ entry) so that both entries are resolved for that lemma.
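A minimal sketch of that kind of normalization, assuming the "alpha variants" are the two Unicode encodings of accented alpha (U+1F71 with oxia vs. U+03AC with tonos) and that the LSJ homograph marker is a trailing digit. `normalize_headword` is a hypothetical name, not the function used in the deployed code.

```python
import re
import unicodedata


def normalize_headword(headword: str) -> str:
    """Return a lookup key shared by Cunliffe and LSJ headwords."""
    # NFC maps U+1F71 (alpha with oxia) to U+03AC (alpha with tonos),
    # so both encodings compare equal afterwards.
    nfc = unicodedata.normalize("NFC", headword.strip())
    # Assumption: LSJ homograph numbers appear only as trailing digits.
    return re.sub(r"\d+$", "", nfc)


cunliffe_key = normalize_headword("θε\u1f71")   # oxia variant
lsj_key = normalize_headword("θε\u03ac1")       # tonos variant plus "1"
```

With both keys equal, a lemma lookup for either spelling can return the entries from both dictionaries.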
@gregorycrane One last thing I'll leave here (as it is LSJ-specific)
From an LSJ entry for `μῆνις`:
https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.7?entryUrn=urn%3Acite2%3Ascafife-viewer%3Adictionary-entries.atlas_v1%3Alsj-67481
If a user clicks an Iliad or Odyssey reference (e.g. `Il. 5.34`), we'll load the passage in Beyond Translation in a new window / tab:
https://beyond-translation.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:5.34
For all other references (e.g. `Or. 22.265d`), we'll load the work entry on catalog.perseus.org:
https://catalog.perseus.org/catalog/urn:cts:greekLit:tlg2001.tlg022
(with the idea that we can improve this "resolution" down the line to resolve to scaife.perseus.org or Perseus 4 if we have a matching edition)
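The routing described above could be sketched roughly like this. It assumes each citation has already been reduced to a work-level CTS URN plus a passage reference, and that Homeric passages always open in the `perseus-grc2` edition; `resolve_reference` and the constants are hypothetical names.

```python
READER_BASE = "https://beyond-translation.perseus.org/reader/"
CATALOG_BASE = "https://catalog.perseus.org/catalog/"

# Iliad and Odyssey work identifiers within the greekLit namespace.
HOMERIC_WORKS = {"tlg0012.tlg001", "tlg0012.tlg002"}
# Assumption: the reader serves both works under this edition.
EDITION = "perseus-grc2"


def resolve_reference(work_urn: str, passage: str) -> str:
    """Map a work-level URN and passage reference to a target URL."""
    work = work_urn.rsplit(":", 1)[-1]
    if work in HOMERIC_WORKS:
        # Passage-level link into the Beyond Translation reader.
        return f"{READER_BASE}{work_urn}.{EDITION}:{passage}"
    # Everything else falls back to the work-level catalog record.
    return f"{CATALOG_BASE}{work_urn}"


iliad_url = resolve_reference("urn:cts:greekLit:tlg0012.tlg001", "5.34")
other_url = resolve_reference("urn:cts:greekLit:tlg2001.tlg022", "22.265d")
```

Keeping the fallback in one place should make it easier to swap in scaife.perseus.org or Perseus 4 links later when a matching edition exists.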
This has been released to production via v2022-05-18-001
Partial ingestion is visible on beyond-translation-gagdt-dev.
Need to flesh out concrete steps to properly ingest entries, improve formatting, handle nested senses, etc.