standardebooks / tools

The Standard Ebooks toolset for producing our ebook files.

Modernize aërial -> aerial #762

Closed: robinwhittleton closed this 1 week ago

robinwhittleton commented 1 week ago

There’s an obvious trend in ngrams for this: https://books.google.com/ngrams/graph?content=a%C3%ABrial%2Caerial&year_start=1800&year_end=2022&corpus=en&smoothing=3

Looking at the corpus, 19 productions currently use this; will modernize if this is merged.
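
For readers unfamiliar with this kind of change, the modernization amounts to a case-preserving substitution. The following is only a minimal sketch of the idea, not the code in this PR: the function name `modernize_aerial` is hypothetical, and it assumes the text is NFC-normalized so that `ë` is a single code point.

```python
import re


def modernize_aerial(text: str) -> str:
    """Replace the diaeresis spelling "aërial" with "aerial".

    Hypothetical helper for illustration; assumes NFC-normalized input.
    """
    # Lowercase and capitalized forms; the captured first letter keeps its case.
    # No trailing \b, so derived forms such as "aërially" are also covered.
    text = re.sub(r"\b([Aa])ërial", r"\1erial", text)
    # All-caps form, e.g. in headings.
    text = re.sub(r"\bAËRIAL", "AERIAL", text)
    return text
```

Because the pattern is anchored with `\b` and requires the `ë`, words such as "material" are left alone.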

acabal commented 1 week ago

OK, thanks!

robinwhittleton commented 1 week ago

Oh, damn, looks like I branched off the wrong branch for this. Are you OK with the y-024 change, or do you want me to rework this? Sorry!

acabal commented 1 week ago

Yes, it's fine. I assume you already checked against the corpus to make sure there aren't too many false positives.

robinwhittleton commented 1 week ago

No false positives that I’ve seen, apart from Ulysses (surprise surprise).

robinwhittleton commented 1 week ago

Oh, and Ozma of Oz, where Tik-Tok speaks like this:

I can-not help be-ing your in-fer-i-or for I am a mere ma-chine.

But potentially those should be non-breaking hyphens anyway?

acabal commented 1 week ago

Possibly, but I think all of his dialogue is like that, and it would be pretty hard to find all the cases. We could also wind up with words not being broken across lines correctly if he has many sentences like that.
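
To make the trade-off concrete, here is a rough sketch of what converting such dialogue to non-breaking hyphens (U+2011) could look like. The function name `use_non_breaking_hyphens` is hypothetical and nothing like it is confirmed to exist in the toolset. It assumes the passage has already been identified as this character's speech, which is the hard part mentioned above.

```python
import re

NON_BREAKING_HYPHEN = "\u2011"


def use_non_breaking_hyphens(text: str) -> str:
    """Replace hyphens that sit between two word characters with U+2011.

    Hypothetical helper for illustration only.
    """
    # Lookarounds leave the neighbouring letters untouched, so runs such as
    # "in-fer-i-or" are handled even when a single letter sits between hyphens.
    return re.sub(r"(?<=\w)-(?=\w)", NON_BREAKING_HYPHEN, text)


sample = "I can-not help be-ing your in-fer-i-or for I am a mere ma-chine."
converted = use_non_breaking_hyphens(sample)
```

After conversion, a renderer can no longer break a line at any of those hyphens, so a long run of such sentences leaves only the spaces as break points.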