vphill / corpus-gammels-laws-vol-01-tei

Gammel's Laws of Texas, Volume 1 TEI Project
GNU General Public License v3.0

How to handle end of line pagination #1

Open vphill opened 2 years ago

vphill commented 2 years ago

Decide and document how to deal with end-of-line hyphenation.

From the Documenting the American South project: "Any hyphens occurring in line breaks have been removed, and the trailing part of a word has been joined to the preceding line." - https://docsouth.unc.edu/imls/texconst/texconst.html
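
For comparison, here is a minimal Python sketch of what the DocSouth-style approach quoted above amounts to on plain transcribed lines. The function name `join_hyphenated_lines` and the sample lines are hypothetical, and the sketch assumes every end-of-line hyphen is a soft hyphen, which will be wrong for genuinely hyphenated compounds that happen to break at the hyphen.

```python
def join_hyphenated_lines(lines):
    """Drop end-of-line hyphens and pull the rest of the word up a line."""
    lines = [line.strip() for line in lines]
    for i in range(len(lines) - 1):
        if lines[i].endswith("-"):
            # The first whitespace-delimited token of the next line
            # completes the broken word.
            head, _, rest = lines[i + 1].partition(" ")
            lines[i] = lines[i][:-1] + head
            lines[i + 1] = rest
    # A line that consisted only of a word fragment is now empty; drop it.
    return [line for line in lines if line]


sample = [
    "legitimate powers are de-",
    "rived, and for the advancement of whose happiness it was insti-",
    "tuted: and so far from being a guarantee",
]
for line in join_hyphenated_lines(sample):
    print(line)
# legitimate powers are derived,
# and for the advancement of whose happiness it was instituted:
# and so far from being a guarantee
```

Note that this approach discards the information about where the original line ended inside the word, which is exactly what the `<pc>` and `<lb>` encoding discussed below would keep.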

kshawkin commented 2 years ago

See discussion at https://tei-c.org/extra/teiinlibraries/4.0.0/bptl-driver.html#index.xml-body.1_div.3_div.3_div.2

vphill commented 2 years ago

Just curious if this is a typo or me reading it incorrectly. In the third row of that table, the note doesn't seem to match the example: the encoding uses strong, but the note talks about weak.

| Colloquial name | Appearance in source document | Encoding | Note |
| --- | --- | --- | --- |
| Soft hyphen | UTF-8 is a char- acter encoding for Unicode. | `UTF-8 is a char<pc force="strong">-</pc><lb break="yes"/>acter encoding for Unicode.` | As in the first example, the use of weak as the value of force indicates that the encoder considers "character" to be a single orthographic token where the hyphen is only indicating that the word is broken across a line. The use of no as the value of break also indicates that the line break occurs inside an orthographic token (single word) which is broken across a line. |

kshawkin commented 2 years ago

Wow, that's a major editing error. The discussion of the value of @break also doesn't match. Since I don't remember what it's supposed to be, I've opened an issue: https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/issues/96

kshawkin commented 2 years ago

Ah, see reply at https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/issues/96#issuecomment-1176892600

vphill commented 2 years ago

So is the goal something that looks like this?

When a government has ceased to protect the lives, liberty and
property of the people, from whom its legitimate powers are de<pc force="weak">-</pc><lb break="no" />
rived, and for the advancement of whose happiness it was insti<pc force="weak">-</pc><lb break="no" />
tuted: and so far from being a guarantee for their inestimable and

Or something more like this?

When a government has ceased to protect the lives, liberty and
property of the people, from whom its legitimate powers are de<pc force="weak">-</pc><lb break="no" />rived,
and for the advancement of whose happiness it was insti<pc force="weak">-</pc><lb break="no" />tuted: 
and so far from being a guarantee for their inestimable and

Basically, do you pull the remainder of the word up from the following line, or leave it where it is?

And I guess this would be another option, based on more reading about <lb />:

When a government has ceased to protect the lives, liberty and
property of the people, from whom its legitimate powers are de<pc force="weak">-</pc>
<lb break="no" />rived, and for the advancement of whose happiness it was insti<pc force="weak">-</pc>
<lb break="no" />tuted: and so far from being a guarantee for their inestimable and
kshawkin commented 2 years ago

Those are all equivalent according to XML's rules about whitespace, under which a line break (or a run of multiple spaces) counts for no more than a single space. So you can lay out your XML in whatever way helps with readability during creation, but you should always keep in mind that if you process your XML with XSLT, or use an XML-aware editor like oXygen, it might end up screwing up your pretty formatting anyway.
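
One way to check the equivalence is to reduce each layout to its reading text and compare. The sketch below is only an illustration under stated assumptions: `reading_text` is a hypothetical helper, the fragments are shortened and carry no TEI namespace, a weak `<pc>` is dropped, and whitespace next to `<lb break="no"/>` is discarded before the remaining whitespace is collapsed. It handles only `<pc>` and `<lb>` as direct children of the wrapper element.

```python
import re
import xml.etree.ElementTree as ET


def reading_text(fragment):
    """Reduce a TEI-like fragment to whitespace-normalized reading text."""
    root = ET.fromstring(f"<p>{fragment}</p>")
    parts = [root.text or ""]
    for child in root:
        if child.tag == "pc" and child.get("force") != "weak":
            parts.append(child.text or "")  # keep strong punctuation
        elif child.tag == "lb":
            # Placeholder for a word-internal break, a space otherwise.
            parts.append("\x00" if child.get("break") == "no" else " ")
        parts.append(child.tail or "")
    text = "".join(parts)
    # No whitespace survives around a break="no" line break.
    text = re.sub(r"\s*\x00\s*", "", text)
    # Collapse newlines and runs of spaces to single spaces.
    return " ".join(text.split())


layout_a = """powers are de<pc force="weak">-</pc><lb break="no" />
rived, and for the advancement"""
layout_b = """powers are de<pc force="weak">-</pc><lb break="no" />rived,
and for the advancement"""
layout_c = """powers are de<pc force="weak">-</pc>
<lb break="no" />rived, and for the advancement"""

print(reading_text(layout_a))
# powers are derived, and for the advancement
print(reading_text(layout_a) == reading_text(layout_b) == reading_text(layout_c))
# True
```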

vphill commented 2 years ago

So if I understand correctly, that would also be the same as:

When a government has ceased to protect the lives, liberty and property of the people, from whom its legitimate powers are de<pc force="weak">-</pc><lb break="no" />rived, and for the advancement of whose happiness it was insti<pc force="weak">-</pc><lb break="no" />tuted: and so far from being a guarantee for their inestimable and

I guess the thing I hadn't been thinking about correctly so far is that if I am interested in preserving the lines that Gammel put on the page, I will need to add an explicit <lb /> for each line. Otherwise the line breaks shouldn't be assumed to be there just because they show up in the text editor.

kshawkin commented 2 years ago

Exactly!
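
As a rough illustration of that conclusion, here is a Python sketch that turns plain transcribed lines into text with an explicit line-break marker on every line. The helper name `mark_line_breaks` is hypothetical, and two choices are assumptions rather than anything decided in this thread: each `<lb/>` is placed at the start of the line it opens, and every end-of-line hyphen is encoded as a weak `<pc>`, so real compound hyphens would still need manual review.

```python
from xml.sax.saxutils import escape


def mark_line_breaks(lines):
    """Prefix each transcribed line with an explicit <lb/> marker."""
    marked = []
    prev_hyphenated = False
    for line in lines:
        line = line.strip()
        # A line that continues a broken word opens with break="no".
        prefix = '<lb break="no"/>' if prev_hyphenated else "<lb/>"
        prev_hyphenated = line.endswith("-")
        if prev_hyphenated:
            # Assumption: treat the trailing hyphen as a soft hyphen.
            body = escape(line[:-1]) + '<pc force="weak">-</pc>'
        else:
            body = escape(line)
        marked.append(prefix + body)
    return marked


sample = [
    "legitimate powers are de-",
    "rived, and for the",
    "advancement",
]
print("\n".join(mark_line_breaks(sample)))
# <lb/>legitimate powers are de<pc force="weak">-</pc>
# <lb break="no"/>rived, and for the
# <lb/>advancement
```

Because the breaks are now in the markup, the output can be re-wrapped or pretty-printed by an editor without losing the page's original lineation.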