standardebooks / tools

The Standard Ebooks toolset for producing our ebook files.

Modernize aërial -> aerial #762

Closed robinwhittleton closed 1 month ago

robinwhittleton commented 1 month ago

There’s an obvious trend in ngrams for this: https://books.google.com/ngrams/graph?content=a%C3%ABrial%2Caerial&year_start=1800&year_end=2022&corpus=en&smoothing=3

Looking at the corpus, 19 productions currently use this; will modernize if this is merged.
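
For illustration only, the kind of substitution being proposed might look like the sketch below. The function name `modernize_aerial` is hypothetical and is not taken from the tools repository. It assumes the text uses the precomposed character ë (U+00EB), and it leaves all-caps forms alone.

```python
import re


def modernize_aerial(text: str) -> str:
    """Replace the diaeresis spelling "aërial" with "aerial".

    Hypothetical helper for illustration; not the actual change in this PR.
    Assumes a precomposed "ë" (U+00EB) and only handles "aërial"/"Aërial".
    """
    # \b anchors the match at the start of a word, so derived forms such as
    # "aërially" are covered, while the case of the first letter is preserved.
    return re.sub(r"\b([Aa])ër(ial)", r"\1er\2", text)


if __name__ == "__main__":
    print(modernize_aerial("An aërial voyage over the city."))
```

Running a substitution like this over each of the affected productions and reviewing the diff is one way to spot false positives before committing.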

acabal commented 1 month ago

OK, thanks!

robinwhittleton commented 1 month ago

Oh, damn, looks like I branched off the wrong branch for this. Are you OK with the y-024 change, or do you want me to rework this? Sorry!

acabal commented 1 month ago

Yes, it's fine. I assume you already checked against the corpus to make sure there aren't too many false positives.

robinwhittleton commented 1 month ago

No false positives that I’ve seen, apart from Ulysses (surprise surprise).

robinwhittleton commented 1 month ago

Oh, and Ozma of Oz, where Tiktok speaks like this:

I can-not help be-ing your in-fer-i-or for I am a mere ma-chine.

But potentially those should be non-breaking hyphens anyway?

acabal commented 1 month ago

Possibly, but I think all of his dialogue is like that, and it would be pretty hard to find all the cases. We could also wind up with words not being broken across lines correctly if he has many sentences like that.
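
For reference, the non-breaking hyphen idea discussed above could be sketched roughly as follows. The function name `use_non_breaking_hyphens` is hypothetical, and the sketch assumes the dialogue has already been isolated by hand, which is the hard part noted in the thread.

```python
NON_BREAKING_HYPHEN = "\u2011"  # U+2011 NON-BREAKING HYPHEN


def use_non_breaking_hyphens(dialogue: str) -> str:
    """Swap every ASCII hyphen-minus in a passage for a non-breaking hyphen.

    Hypothetical helper for illustration; it does not try to find the
    dialogue itself, so the caller must pass in only the relevant text.
    """
    return dialogue.replace("-", NON_BREAKING_HYPHEN)


if __name__ == "__main__":
    line = "I can-not help be-ing your in-fer-i-or for I am a mere ma-chine."
    print(use_non_breaking_hyphens(line))
```

The trade-off raised in the last comment still applies: once every hyphen in a sentence is non-breaking, a reading system has fewer places to wrap long lines.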