papyri / sosol

The Son of Suda On Line
GNU General Public License v3.0

xml:id disappearing in editor #214

Open HolgerEssler opened 4 years ago

HolgerEssler commented 4 years ago

We created idno elements for the Herculaneum papyri in the header, with an xml:id for each fragment, so that we could link to them from the text fragments with @corresp. These xml:ids now seem to disappear in the editor and in the course of finalization, even if they were present in the XML until the end. An example is: https://github.com/papyri/idp.data/blame/master/DCLP/63/62400.xml#L276 I changed something there a few days ago, and now all xml:id attributes are replaced by @n. You can see the old ones if you go back before the change from July 3rd.

The mechanism can be observed by opening a PHerc. in the editor (e.g. http://papyri.info/dclp/62382). Just opening the data in the metadata editor and saving changes the xml:id on //msDesc/msIdentifier/idno/idno to @n. Thus, if anyone opens a papyrus with this information in the editor, inserting corrections results in the loss of our reference system.

I consider this a problem that should be fixed, preferably also undoing the losses that have already occurred. If the latter is difficult, I can try to reinsert the xml:ids.
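A minimal sketch of how the damage to the reference system could be detected: it lists every @corresp target that no longer resolves to an xml:id in the same file. The sample document is invented for illustration (only the //msDesc/msIdentifier/idno/idno nesting and the #FR1740 pointer come from this thread), and the function name `dangling_corresp` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample: the second fragment has already lost its xml:id to @n.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <msDesc><msIdentifier>
      <idno>P.Herc.
        <idno xml:id="FR1740">1740</idno>
        <idno n="FR1741">1741</idno>
      </idno>
    </msIdentifier></msDesc>
  </teiHeader>
  <text><body>
    <div type="edition">
      <div type="textpart" n="1" corresp="#FR1740"/>
      <div type="textpart" n="2" corresp="#FR1741"/>
    </div>
  </body></text>
</TEI>"""


def dangling_corresp(xml_text):
    """Return all local @corresp pointers that match no xml:id in the file."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        # @corresp may hold several whitespace-separated pointers.
        for target in el.get("corresp", "").split():
            if target.startswith("#") and target[1:] not in ids:
                missing.append(target)
    return missing


print(dangling_corresp(SAMPLE))  # ['#FR1741']
```

Run over the DCLP files, a check like this would show which papyri have already been affected, on the assumption that every @corresp in those files points to an xml:id within the same document.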

Edelweiss commented 4 years ago

The erroneous version of the file:

https://github.com/papyri/idp.data/blob/98241d4c3cdd116fab79b22e5b2bbf6b1decfb87/DCLP/63/62400.xml

jcowey commented 4 years ago

https://github.com/papyri/idp.data/blob/98241d4c3cdd116fab79b22e5b2bbf6b1decfb87/DCLP/63/62400.xml#L276-L523

should be compared to

https://github.com/papyri/idp.data/blob/master/DCLP/63/62400.xml#L276-L523
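The comparison of the two versions could also be done mechanically. The following is a sketch under stated assumptions, not a tested tool: it pairs the nested idno elements of both versions by position (assuming the editor does not reorder them) and reports each xml:id that is missing in the newer version. The sample strings and the helper names `wrap`, `nested_idnos` and `lost_ids` are hypothetical, and the sample assumes the old value is carried over into @n.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def wrap(inner):
    """Build a tiny invented TEI document around the given nested idno markup."""
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><msDesc><msIdentifier>'
        f"<idno>P.Herc. {inner}</idno>"
        "</msIdentifier></msDesc></TEI>"
    )


def nested_idnos(xml_text):
    """Return the elements at //msDesc/msIdentifier/idno/idno in document order."""
    root = ET.fromstring(xml_text)
    path = f".//{TEI}msDesc/{TEI}msIdentifier/{TEI}idno/{TEI}idno"
    return root.findall(path)


def lost_ids(old_xml, new_xml):
    """List (old xml:id, new @n) pairs for idno elements that lost their xml:id."""
    lost = []
    for old, new in zip(nested_idnos(old_xml), nested_idnos(new_xml)):
        old_id = old.get(XML_ID)
        if old_id and new.get(XML_ID) != old_id:
            lost.append((old_id, new.get("n")))
    return lost


OLD = wrap('<idno xml:id="FR1740">1740</idno><idno xml:id="FR1741">1741</idno>')
NEW = wrap('<idno n="FR1740">1740</idno><idno xml:id="FR1741">1741</idno>')

print(lost_ids(OLD, NEW))  # [('FR1740', 'FR1740')]
```

The same pairing could serve as the basis for reinserting the lost xml:ids from the older revision.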

Edelweiss commented 4 years ago

To examine the error, I opened the file in the SoSOL editor and edited and saved it twice, once using the DCLP metadata XML editor and once using the DCLP text XML editor. Neither produced the error described above.

[Screenshot, 2020-07-24 08:45:37]

Edelweiss commented 4 years ago

Trying to open and save the file using the DCLP Leiden+ editor failed from the start, because the editor wouldn’t accept the @corresp attribute on the div element in the edition part.

<div xml:lang="grc" type="edition" xml:space="preserve">
<div n="1" subtype="column" type="textpart" corresp="#FR1740"><ab>
…

Example

<div xml:lang="grc" xml:space="preserve" type="edition"><div corresp="#FR1740" n="1" subtype="column" type="textpart"><ab>**POSSIBLE ERROR**
    <lb n="1"/>Δημήτρειος Hallo <gap extent="unknown" reason="lost" unit="character"/> 

    <lb n="2"/>ἀπέχω τῶν α<gap extent="unknown" reason="lost" unit="character"/> 

    <lb n="3"/>ἔργων ἕως <gap quantity="1" reason="illegible" unit="character"/><gap extent="unknown" reason="lost" unit="character"/> 

    <lb n="4"/>ἔχω δὲ παρὰ <unclear>Σω</unclear><gap extent="unknown" reason="lost" unit="character"/> 

    <lb n="5"/>εἰς τὸ <num value="17">ιζ</num> <expan><ex>ἔτος</ex></expan> Θω<supplied reason="lost">ῦθ </supplied><gap extent="unknown" reason="lost" unit="character"/> 

    <lb n="6"/>διὰ χειρὸς τά<supplied reason="lost">λαντον</supplied> 

    <lb n="7"/>ἓν <num value="1"/> δραχμὰς <gap extent="unknown" reason="lost" unit="character"/> 

    <lb n="8"/><choice><reg>χιλίας</reg><orig>χειλίας</orig></choice> <num value="1000"/> τρεα<gap extent="unknown" reason="lost" unit="character"/> </ab></div></div>

[Screenshot, 2020-07-24 11:35:04]

Edelweiss commented 4 years ago

I couldn’t find anything in the XSLT processing that would account for the vanishing @xml:ids. Perhaps it has something to do with the XSugar parsing. I’m still tracking down the error and need to run more tests to understand the issue and the behaviour of the editor.

HolgerEssler commented 4 years ago

I should have specified that the error does not occur in the XML editor, but in the metadata editor mask. I am not sure which text was used above, but the example of TM 62400 as it is currently online does open and save in the Leiden+ editor. I have created a copy, inserting "TestError" in the very first line of the text, at http://papyri.info/editor/publications/90225/dclp_text_identifiers/205566/edit

[Screenshot: 62400]

It might be relevant that the @xml:id is not displayed in the metadata editor mask; only the content of the idno element is.

Edelweiss commented 4 years ago

Then the behaviour described above comes as no surprise to me. As far as I recall, Herculaneum files were never promised to be processed correctly by the meta editor if they contained additional XML markup beyond the basic features already provided by the HGV editor or the desiderata introduced by the DCLP project.

As for the collection ids, DCLP uses the standard HGV editor component, which is agnostic towards xml:ids. Consequently, the DCLP meta editor currently doesn’t support xml:id attributes on TEI idno elements (which is why the xml:id doesn’t show up in the mask even if it’s present in the XML):

[Screenshot, 2020-09-02 15:14:03]

https://github.com/papyri/sosol/blob/master/app/views/dclp_meta_identifiers/_object.haml#L42
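To illustrate the general effect (this is not SoSOL's actual code, and it does not reproduce the replacement by @n), here is a small sketch of why an editor mask that only knows certain fields can drop an attribute on save. One hypothetical save routine rebuilds the element from the known fields, the other updates the existing element in place. The function names and the `type="fragment"` value are invented for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def rebuild_save(idno, form_value):
    """Lossy save: create a fresh element from the fields the form knows about."""
    fresh = ET.Element("idno")
    if idno.get("type"):
        fresh.set("type", idno.get("type"))
    fresh.text = form_value
    # xml:id was never a form field, so it is silently left behind.
    return fresh


def in_place_save(idno, form_value):
    """Preserving save: overwrite only the text and leave all attributes alone."""
    idno.text = form_value
    return idno


original = ET.fromstring('<idno xml:id="FR1740" type="fragment">1740</idno>')

print(rebuild_save(original, "1740").get(XML_ID))   # None
print(in_place_save(original, "1740").get(XML_ID))  # FR1740
```

Whichever approach the editor component takes internally, the design point is the same: attributes that the mask does not display need to be carried through the save untouched, or they are lost.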