peterwebster / henson

Master data store for the Hensley Henson Journals project, and issue tracker. The application code is kept elsewhere.

Testing ingest of XML in final TEI-compatible markup #93

Closed: peterwebster closed this 6 years ago

peterwebster commented 6 years ago

Hi @nomoregrapes: as discussed just now, there's a handful of files on OneDrive, in Stage 6 Final > Test-040618. If these ingest OK, I'll let you have amended XML for the whole of volumes 14 and 15.

To recap, the things that are different are:

(i) element ID attributes are expressed as xml:id="123"
(ii) attribute values for <hi> are written out in full
(iii) <event> elements are nested within <listEvent>
(iv) a <teiHeader> has been inserted
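A minimal Python sketch of how changes (i) and (iii) look to a namespace-aware parser, assuming the ingest uses the standard-library xml.etree.ElementTree (the thread doesn't say what the application actually uses). The xml: prefix is bound to a reserved namespace, so xml:id is read under its expanded name, and events are reached through listEvent. The sample fragment and the event_ids function are hypothetical. The placeholder ids are "e123"/"e124" rather than "123" because xml:id values must be XML names (NCNames), which cannot begin with a digit; a schema-validating TEI check would reject "123", although ElementTree itself does not.

```python
import xml.etree.ElementTree as ET

# ElementTree expands prefixed names to "{namespace-uri}localname".
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # the reserved xml: prefix

# Hypothetical fragment illustrating changes (i) and (iii); ids are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text><body>
    <listEvent>
      <event xml:id="e123"><p>An event</p></event>
      <event xml:id="e124"><p>Another event</p></event>
    </listEvent>
  </body></text>
</TEI>"""


def event_ids(xml_text: str) -> list[str]:
    """Return the xml:id of every <event> that sits directly inside a <listEvent>."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    return [ev.get(XML_ID) for ev in root.iterfind(".//tei:listEvent/tei:event", ns)]


print(event_ids(SAMPLE))  # ['e123', 'e124']
```

Reading the attribute as a plain "id" key would return None here, which is an easy way for an ingest script to silently lose identifiers after a markup change like (i).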

peterwebster commented 6 years ago

@nomoregrapes despite this, it might be prudent to adjust your stylesheet to accept both sets of <hi> attribute values (i as well as italic). I'm not 100% confident that I've been able to deal with all the possible combinations, so some examples of i, b and u may slip through. Less than elegant in coding terms, but prudent under the circumstances.
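One way to accept both forms, sketched in Python rather than in the XSL stylesheet the thread refers to, is to normalise the abbreviated values before rendering. This assumes the values live in the rend attribute (the usual TEI attribute on <hi>, but not named in the thread) and that b and u expand to bold and underline; only i to italic is confirmed above. The normalise_hi function and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Assumed mapping from the old abbreviated values to the full ones; only
# "i" -> "italic" is confirmed in the thread, "bold"/"underline" are guesses.
REND_ALIASES = {"i": "italic", "b": "bold", "u": "underline"}


def normalise_hi(root: ET.Element, attr: str = "rend") -> int:
    """Rewrite abbreviated <hi> values in place; return how many were changed.

    Values already written out in full are left untouched, so this is safe
    to run over a mix of old and new markup.
    """
    changed = 0
    for hi in root.iter(f"{{{TEI_NS}}}hi"):
        value = hi.get(attr)
        if value is None:
            continue
        # TEI treats rend as a whitespace-separated list, so map each token
        # (this also covers combinations such as "i u").
        expanded = " ".join(REND_ALIASES.get(tok, tok) for tok in value.split())
        if expanded != value:
            hi.set(attr, expanded)
            changed += 1
    return changed


doc = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '<hi rend="i">old</hi> <hi rend="italic">new</hi> '
    '<hi rend="b">strong</hi> <hi rend="i u">both</hi></p>'
)
print(normalise_hi(doc))  # 3
print([hi.get("rend") for hi in doc.iter(f"{{{TEI_NS}}}hi")])
# ['italic', 'italic', 'bold', 'italic underline']
```

Doing the mapping in one table keeps the "less than elegant" part in a single place, and it can be dropped once the data is known to be clean.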

nomoregrapes commented 6 years ago

(i) is done. The other items should also be handled; I'll test them with the next ingest of vol14.

nomoregrapes commented 6 years ago

@peterwebster the new TEI doesn't work, because an XML document must have exactly one root element. We need to wrap the header in a <TEI xmlns="http://www.tei-c.org/ns/1.0"> element; that should be TEI-compliant and also let me access the body element, which sits at the same level.

Files would look like...

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="styles.xsl" ?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader> standard tags/title.... </teiHeader>
<body>Actual content <p>markup</p> etc...</body>
</TEI>
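To illustrate the single-root rule with Python's standard-library parser (an assumption; the thread doesn't name the ingest tooling): two sibling top-level elements are rejected outright, while the same elements wrapped in one <TEI> root parse. The is_well_formed helper is hypothetical, and the check covers XML well-formedness only; whether body may sit directly under TEI is a separate TEI schema question, addressed later in the thread.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Two sibling top-level elements: not well-formed XML.
UNWRAPPED = "<teiHeader/><body/>"

# The same elements wrapped in a single <TEI> root element.
WRAPPED = f'<TEI xmlns="{TEI_NS}"><teiHeader/><body/></TEI>'


def is_well_formed(xml_text: str) -> bool:
    """True if the string parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(UNWRAPPED))  # False ("junk after document element")
print(is_well_formed(WRAPPED))    # True
```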
nomoregrapes commented 6 years ago

I've put a really bad hack in for now so it works for Julia & the team tomorrow. Volume 14 is in, but we should reingest a version wrapped in <TEI> tags.

peterwebster commented 6 years ago

Schoolboy error, apologies: this got overlooked. @nomoregrapes

Refining your example: the hierarchy needs to be TEI > text > body, as below. Fixing 14 now.

<TEI version="3.3.0" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>.....</teiHeader>
  <text>
    <body>
      .....
    </body>
  </text>
</TEI>

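A minimal Python sketch, assuming xml.etree.ElementTree and a hypothetical get_body helper, of how an ingest script could locate the body under the TEI > text > body hierarchy, failing loudly if the root or the path is wrong rather than silently ingesting nothing. The sample document is illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Skeleton following the TEI > teiHeader / text > body hierarchy.
SAMPLE = """<TEI version="3.3.0" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc/></teiHeader>
  <text>
    <body>
      <p>First entry.</p>
      <p>Second entry.</p>
    </body>
  </text>
</TEI>"""


def get_body(xml_text: str) -> ET.Element:
    """Return the <body> element, checking the root and the text/body path."""
    root = ET.fromstring(xml_text)
    if root.tag != "{http://www.tei-c.org/ns/1.0}TEI":
        raise ValueError(f"unexpected root element {root.tag!r}")
    body = root.find("tei:text/tei:body", NS)
    if body is None:
        raise ValueError("no <text>/<body> under the <TEI> root")
    return body


body = get_body(SAMPLE)
print([p.text for p in body.findall("tei:p", NS)])  # ['First entry.', 'Second entry.']
```

A file with body placed directly under TEI (the earlier layout) would raise the second ValueError here instead of being ingested with an empty body.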
peterwebster commented 6 years ago

Chapter and verse here http://en.guidelines.tei-c.org/html/ref-TEI.html @nomoregrapes

peterwebster commented 6 years ago

OK @nomoregrapes: the fixed XML for volume 14 is now in OneDrive, Master Data/Stage6/14v2.

Hopefully that will work now.

peterwebster commented 6 years ago

@nomoregrapes the XML for volume 15 is now also in OneDrive, MasterData>Stage6>15.

nomoregrapes commented 6 years ago

@peterwebster I'm not seeing the fixed 14 or the XML for 15 on OneDrive.

peterwebster commented 6 years ago

@nomoregrapes it should be there...?


nomoregrapes commented 6 years ago

Ah, found it. Don't know why it wasn't showing yesterday.

nomoregrapes commented 6 years ago

TEI-compatible markup tested and is fine.