open-editions / corpus-joyce-ulysses-tei

James Joyce's novel Ulysses in TEI XML. Work-in-progress.

Pr/36 #38

Closed yellwork closed 7 years ago

yellwork commented 7 years ago

Great catches on the <foreign> tags. I added rend="none" where appropriate. Lines 14.1024, 14.1303 and 14.1483, however, should be unchanged.

yellwork commented 7 years ago

> Lines 14.1024, 14.1303 and 14.1483 should be unchanged.

Unless I’m missing something? What are the invisible “non-breaking space ASCII characters”? Though it isn’t the convention now, 14.1303 should read “F. K. Q. C. P. I.”, for example.

Thanks for your edits!

charlesreid1 commented 7 years ago

My apologies, I didn't mean to change any text. A regular space is ASCII character 32, but the non-breaking space is not an ASCII character at all: it is Unicode U+00A0 (decimal 160), encoded in UTF-8 as the two bytes C2 A0. (In some legacy 8-bit code pages, such as IBM code page 437, it sits at position 255.) Non-breaking spaces are not always rendered correctly by text editors or handled gracefully in text processing, so I was trying to replace them with regular spaces. Most of them seem to occur in abbreviations; e.g., in H. J. O'Neill there is a non-breaking space between H. and J. but a regular space after J. I'm not sure whether these characters are intentional.
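To make the distinction concrete, here is a minimal Python sketch (not part of the repository) that prints the code point, ASCII membership and UTF-8 bytes of each kind of space and swaps one for the other. The names REGULAR_SPACE, NBSP, describe and replace_nbsp are illustrative only, and the sample string is a hypothetical reconstruction of the O'Neill case, not copied from the TEI file.

```python
# Minimal sketch: how a regular space and a non-breaking space differ at the
# character and byte level, and how to replace one with the other.

REGULAR_SPACE = " "   # U+0020, ASCII 32
NBSP = "\u00a0"       # U+00A0 NO-BREAK SPACE, outside the 7-bit ASCII range


def describe(ch: str) -> str:
    """Return the code point, ASCII membership and UTF-8 bytes of one character."""
    return (
        f"U+{ord(ch):04X} (decimal {ord(ch)}), "
        f"ascii={ch.isascii()}, utf8={ch.encode('utf-8')!r}"
    )


def replace_nbsp(text: str) -> str:
    """Replace every U+00A0 with an ordinary U+0020 space."""
    return text.replace(NBSP, REGULAR_SPACE)


if __name__ == "__main__":
    print(describe(REGULAR_SPACE))  # U+0020 (decimal 32), ascii=True, utf8=b' '
    print(describe(NBSP))           # U+00A0 (decimal 160), ascii=False, utf8=b'\xc2\xa0'
    # Hypothetical sample: NBSP after "H." only, regular space after "J."
    sample = "H.\u00a0J. O'Neill"
    print(sample.count(NBSP), replace_nbsp(sample).count(NBSP))  # 1 0
```

Because the two characters look identical on screen, comparing code points (or bytes) is the reliable way to tell them apart.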

charlesreid1 commented 7 years ago

Will avoid updating non-breaking spaces in future unless they cause an issue. Thanks for cleaning this up!

yellwork commented 7 years ago

Intriguing. Sorry, I had no idea they were in the data. Thanks for pointing this out! They seem to occur right before ellipses and around free-standing initials or numbers.

I’ve just done a global replace of non-breaking spaces with regular spaces. Would you mind telling me if you’re still seeing ’em? Thanks again.
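One way to answer that question without relying on an editor's rendering is a small scan of the file. The following is a minimal Python sketch, assuming the text is UTF-8 encoded; the script and function names (find_nbsp, main) and the example file name are hypothetical. It reports each remaining U+00A0 with a 1-based line and column number and a short snippet in which the character is shown as [NBSP].

```python
# Minimal sketch: report any U+00A0 characters still present in a text/XML file,
# with 1-based line and column numbers and a little surrounding context.
import sys
from typing import Iterable, Iterator, Tuple

NBSP = "\u00a0"


def find_nbsp(lines: Iterable[str], context: int = 15) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_no, col_no, snippet) for every NBSP; both numbers are 1-based."""
    for line_no, line in enumerate(lines, start=1):
        col = line.find(NBSP)
        while col != -1:
            start = max(0, col - context)
            # Show NBSPs as a visible marker so they can be told apart from spaces.
            snippet = line[start:col + context + 1].rstrip("\n").replace(NBSP, "[NBSP]")
            yield line_no, col + 1, snippet
            col = line.find(NBSP, col + 1)


def main(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        hits = list(find_nbsp(fh))
    for line_no, col_no, snippet in hits:
        print(f"{path}:{line_no}:{col_no}: {snippet}")
    print(f"{len(hits)} non-breaking space(s) found")
    return 1 if hits else 0  # non-zero exit status if any remain


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_nbsp.py ulysses.xml
    sys.exit(main(sys.argv[1]))
```

A count of zero after the global replace would confirm that none remain; the exit status also makes the check easy to drop into a pre-commit hook or CI step.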