open-editions / corpus-joyce-finnegans-wake-tei

James Joyce's novel Finnegans Wake, encoded in TEI XML.
corpus-joyce-finnegans-wake-tei-git-master.open-editions.vercel.app
GNU General Public License v3.0

Parse matches into TEI format #6

Open JonathanReeve opened 4 years ago

JonathanReeve commented 4 years ago

This will require thinking of a way to format the matches. It'll most likely have to be in a standoff format (in a separate file) to avoid overlapping XML tags.
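
To illustrate why inline markup breaks down here, the following is a minimal sketch, assuming two hypothetical matches whose spans cross within one line. The element names `a` and `b` and the sample line are illustrative only and are not taken from the corpus encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical case: match A covers "past Eve", match B covers "Eve and Adam's".
# Wrapping both inline forces the tags to cross, which XML does not allow.
overlapping = "<l>riverrun, <a>past <b>Eve</a> and Adam's</b></l>"

# Standoff alternative: leave the text untouched and record the ranges elsewhere.
plain = "<l>riverrun, past Eve and Adam's</l>"


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(overlapping))
print(is_well_formed(plain))
```

The crossed tags fail to parse, while the untouched line parses fine, which is why the match ranges would need to live in a separate file.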

JonathanReeve commented 4 years ago

Something like this maybe? This is a harder problem than I'd imagined it would be.

<biblStruct xml:id="id-in-criticism">
  ...
</biblStruct>

<standOff type="textMatching">
  <link target="#id-in-FW #id-in-criticism">
    <ptr xml:id="id-in-FW" xpointer="string-range(start, end)"/>
    <ptr target="#id-in-criticism" xpointer="string-range(start, end)"/>
  </link>
</standOff>

JonathanReeve commented 4 years ago

I think string-range() should probably reference the line's xml:id, e.g. <l xml:id="L1.1.16.28">, and then the word within that line. This will require converting each character offset into a line ID plus a word position.
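
One possible shape for that conversion, as a rough sketch only. It assumes the character offsets were computed over a flat text made by joining the line texts with a single newline, and that words are whitespace-separated. The function names, the line IDs, and the sample line breaks are hypothetical.

```python
import bisect


def build_line_starts(lines):
    """lines is a list of (xml_id, text) pairs in document order.
    Returns the flat-text offset at which each line starts, assuming
    the lines were joined with one newline character."""
    starts = []
    pos = 0
    for _xml_id, text in lines:
        starts.append(pos)
        pos += len(text) + 1  # +1 for the joining newline
    return starts


def word_index_at(text, col):
    """0-based index of the whitespace-separated word at column col."""
    words_before = len(text[:col].split())
    # If the previous character is not whitespace, col is inside (or just
    # past) a word that split() has already counted.
    mid_word = col > 0 and not text[col - 1].isspace()
    return words_before - 1 if mid_word else words_before


def offset_to_line_word(lines, starts, offset):
    """Map a flat character offset to (line xml:id, word index)."""
    i = bisect.bisect_right(starts, offset) - 1
    xml_id, text = lines[i]
    return xml_id, word_index_at(text, offset - starts[i])


# Hypothetical lines and IDs, for illustration only.
lines = [
    ("L1.1.1.1", "riverrun, past Eve and Adam's"),
    ("L1.1.1.2", "from swerve of shore"),
]
starts = build_line_starts(lines)
print(offset_to_line_word(lines, starts, 15))
print(offset_to_line_word(lines, starts, 35))
```

If the matcher normalized whitespace or stripped markup differently when producing its offsets, the join assumption above would need to change to match.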

JonathanReeve commented 4 years ago

I'm going to check in with the TEI listserv people about the formatting of this.

JonathanReeve commented 4 years ago

My post to the TEI-L list got a few useful responses.

JonathanReeve commented 4 years ago

Going forward with this structure:

<?xml version="1.0" encoding="utf-8"?>
<standOff type="textMatching">

  <listBibl>
    <biblStruct xml:id="id-in-criticism">
      <!-- bibliographic stuff here -->
    </biblStruct>
  </listBibl>

  <linkGrp>
    <!-- for each match -->
    <link target="string-range(#id-in-FW, start, end) string-range(#id-in-criticism, start, end)" />
    <!-- end for -->
  </linkGrp>
</standOff>
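
A minimal sketch of how the template above might be filled in programmatically, using Python's standard `xml.etree.ElementTree`. The function name `build_standoff` and the shape of the `matches` tuples are assumptions for illustration. Like the template, it omits the TEI namespace and copies the thread's `string-range(#id, start, end)` form as-is; whether the third argument should be an end position or a length is worth checking against the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:id


def build_standoff(criticism_id, matches):
    """Build a <standOff> element for one critical source.

    matches is a list of (fw_id, fw_start, fw_end, crit_start, crit_end)
    tuples; this tuple layout is a hypothetical choice for this sketch.
    """
    stand_off = ET.Element("standOff", {"type": "textMatching"})

    list_bibl = ET.SubElement(stand_off, "listBibl")
    # Bibliographic details would go inside this element.
    ET.SubElement(list_bibl, "biblStruct", {f"{{{XML_NS}}}id": criticism_id})

    link_grp = ET.SubElement(stand_off, "linkGrp")
    for fw_id, fw_start, fw_end, crit_start, crit_end in matches:
        target = (
            f"string-range(#{fw_id}, {fw_start}, {fw_end}) "
            f"string-range(#{criticism_id}, {crit_start}, {crit_end})"
        )
        ET.SubElement(link_grp, "link", {"target": target})
    return stand_off


# Made-up offsets, for illustration only.
root = build_standoff("id-in-criticism", [("L1.1.16.28", 0, 5, 10, 15)])
print(ET.tostring(root, encoding="unicode"))
```

Building the tree with a library rather than string concatenation keeps attribute escaping and well-formedness out of the matching code.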