independent highlight - Githubissues

ousia commented 9 years ago

@juh2,

sorry, but I don’t understand this comment (I’m not so familiar with the texts from the DTA):

% I have to setup the highlighting patterns for each tag. This is bad.
% A mechanism to match highlighting patterns independently would be
% better.

Could you point me to a sample?

BTW, I think the DTA abuses the rendition attribute (or the <hi> element). A <hi> element inside a head that applies to the whole parent element is clearly wrong.

Their markup is purely visual, not logical. In my opinion they should encode the logical structure of the text, not its purely visual appearance.

juh2 commented 9 years ago

Here is an example.

<cit>
<quote>
<hi rendition="#et #aq">Si c&#x2019;est la raison, qui fait l&#x2019;homme,<lb/>
c&#x2019;est le sentiment, qui le conduit.
</hi>
</quote><lb/>
<bibl rendition="#right #g #k #aq">Rousseau</bibl>
</cit>

The first rendition has two values, the second four. The easiest thing would be if this worked:

\xmlsetsetup{#1}{[contains(@rendition, '\letterhash right', '\letterhash g')]}{xml:right:g}

If this and that are in rendition, then name it foo. I haven't tried it this way.

Another way would be a one-to-one mapping between the XML attribute values and the ConTeXt constructs, set up so that they can be concatenated in the ConTeXt source. We would then only have to xmlsetsetup each value once and could combine them as needed.
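
To make the one-to-one idea concrete, here is a minimal, language-neutral sketch in Python (not ConTeXt): it splits the rendition attribute into its values and nests one construct per value around the content. The macro names (RendAq, RendRight, ...) are hypothetical placeholders and do not exist in ConTeXt or in this repository; whether the same chaining can be done inside xmlsetups is exactly the open question.

```python
# Hypothetical one-to-one table: rendition value -> ConTeXt macro name.
# These macro names are placeholders for illustration only.
RENDITION_MACROS = {
    "#aq": "RendAq",
    "#et": "RendEt",
    "#g": "RendG",
    "#k": "RendK",
    "#right": "RendRight",
}


def wrap_rendition(rendition: str, content: str) -> str:
    """Nest one macro call per rendition value around content.

    The first value in the attribute ends up as the outermost macro.
    Values without a table entry are skipped.
    """
    result = content
    # Wrap from the last value inwards so the first value is outermost.
    for value in reversed(rendition.split()):
        macro = RENDITION_MACROS.get(value)
        if macro is not None:
            result = "\\" + macro + "{" + result + "}"
    return result


print(wrap_rendition("#right #g #k #aq", "Rousseau"))
```

Each value is defined exactly once in the table, and any combination found in the source is covered without writing a separate rule for it.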

Yes, it seems that they mix semantic and visual markup, at least in this example. But their goal is to mark up the text so that the digital version matches the appearance of the scanned edition (mostly the first edition). If you want to do a statistical analysis of the average line length, you need the <lb/>; if you want to pinpoint when people stopped emphasizing a passage with letter-spacing and began to use italics instead, you need to have these renditions in your source.

They document their text body TEI here: http://www.deutschestextarchiv.de/doku/basisformat_table

Parent is: http://www.deutschestextarchiv.de/doku/

It is a huge TEI. I'm not looking at it, because I don't want to lose my enthusiasm. ;-)

ousia commented 9 years ago

The first rendition has two values, the second four. The easiest thing would be if this worked:
\xmlsetsetup{#1}
    {[contains(@rendition, '\letterhash right', '\letterhash g')]}
    {xml:right:g}

If this and that are in rendition, then name it foo. I haven't tried it this way.

Not sure I understand the whole thing, but something like this should work:

\xmlsetsetup{#1}
    {[contains(@rendition,'\letterhash right') and
      contains(@rendition,'\letterhash g')]}
    {xml:right:g}

Wouldn’t a modular matching approach be better? I mean, each rendition value gets an xmlsetsetup of its own.

I think modular matching makes mixing easier. If I’m not mistaken, with your approach above you’ll have to code every possible combination.

As for the DTA’s text encoding style, I must confess that I dislike it. I don’t think it is totally illegitimate, but I’m afraid they’re losing a good opportunity to encode texts at a higher quality (I mean, something that could replace this edition).

If their TEI usage specification is huge, please never consult the P5 Guidelines themselves. They are really huge :smiley:

juh2 commented 9 years ago

Cool. Good to know that AND (and probably OR) work in setups. You are right, a modular approach would be better. I called it a one-to-one solution, which is a confusing name. But I am not sure whether we can concatenate constructs in ConTeXt the way they concatenate values in the attributes.

I am not an expert in TEI, so I cannot judge their usage. But I know that they have problems providing ebooks. I recently converted some texts to EPUB via Pandoc. After problems with the XHTML version I used the plain text version, because it was the easiest option.

ousia commented 9 years ago

Cool. Good to know that AND (and probably OR) work in setups. You are right, a modular approach would be better. I called it a one-to-one solution, which is a confusing name. But I am not sure whether we can concatenate constructs in ConTeXt the way they concatenate values in the attributes.

“or” is already used in pandoc-xhtml.tex.

I am not an expert in TEI, so I cannot judge their usage. But I know that they have problems providing ebooks. I recently converted some texts to EPUB via Pandoc. After problems with the XHTML version I used the plain text version, because it was the easiest option.

pandoc has no specific reader for TEI (only a writer is planned).

To generate an ePub document from TEI sources, I’d rather use the tools from TEI, not pandoc.

juh2 / tei-style-dta-context

independent highlight #3