Closed: gbnewby closed this issue 5 months ago
Because title-ending periods are customary in library catalog records, trailing periods in the title are removed. I suggest that this should be fixed in the catalog by replacing the trailing period with U+FF0E "fullwidth full stop" (．) or U+2024 "one dot leader" (․). Or possibly there is a library catalog convention for this case. The cataloguer should decide.
It seems this needs further discussion. I checked with the catalog team, and they confirmed that periods (and ellipses) in title and subtitle fields in the PG catalog database are significant and should be included in the landing pages & in-book metadata.
They did not think it was a good idea to use a character that is not a period, but looks like one, in lieu of an actual period.
Here are a couple more problematic titles:
https://www.gutenberg.org/cache/epub/33314/pg33314-images.html https://www.gutenberg.org/cache/epub/60671/pg60671-images.html
792 instances were found using this psql query: select pk, title from books where title LIKE '%.';
Eyeballing them indicates that nearly all instances of punctuation should be displayed.
Thanks.
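For anyone who wants to triage the rows returned by the query above rather than eyeball them, here is a minimal sketch. It assumes the results have been exported as (pk, title) pairs; the function name classify_title_ending and the sample rows are hypothetical and are not part of libgutenberg or the PG catalog tooling.

```python
def classify_title_ending(title: str) -> str:
    """Return 'ellipsis', 'period', or 'none' for how a title ends."""
    text = title.rstrip()
    # Check for an ellipsis first, since it also ends with a period.
    if text.endswith("...") or text.endswith("\u2026"):
        return "ellipsis"
    if text.endswith("."):
        return "period"
    return "none"


# Hypothetical sample rows shaped like (pk, title). Only the first title
# comes from this thread (73523 is the ebook number in its URL, assumed
# here to match pk); the other two rows are made up for illustration.
rows = [
    (73523, "The girl from Bodies, Inc."),
    (1, "And then..."),
    (2, "A title without punctuation"),
]

for pk, title in rows:
    print(pk, classify_title_ending(title), repr(title))
```

This only sorts titles by how they end. It does not decide whether the punctuation is significant, which the catalog team has said it nearly always is.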
Thanks for taking care of this. Let me know if there is more you need from me.
On Wed, May 8, 2024 at 8:31 AM Eric Hellman @.***> wrote:
It's an easy change:
It looks like this is happening in generated files, as well as landing pages, so I'm guessing the issue is in libgutenberg rather than in ebookmaker or autocat3.
The reported issue is that punctuation was incorrectly removed from a catalog title.
In the catalog (correct, including the period after "Inc."): 245 - Title Statement 4 The girl from Bodies, Inc.
The title line from https://www.gutenberg.org/cache/epub/73523/pg73523-images.html: The Project Gutenberg eBook of The girl from Bodies, Inc (the trailing period on the abbreviation is lost)
The Title line: Title: The girl from Bodies, Inc
The START OF line: START OF THE PROJECT GUTENBERG EBOOK THE GIRL FROM BODIES, INC
Summary: The catalog is correct. The generated HTML is not, and neither is the generated UTF-8 text.
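A small check along these lines could confirm the symptom in generated files. This is a sketch only: lost_trailing_punctuation is a hypothetical helper and not an existing libgutenberg function. The strings are the ones quoted in this report, and the comparison ignores case so that the upper-cased START OF line can be tested too.

```python
def lost_trailing_punctuation(catalog_title: str, generated_title: str) -> bool:
    """True if generated_title is catalog_title minus its trailing period(s)."""
    catalog = catalog_title.strip().casefold()
    generated = generated_title.strip().casefold()
    if catalog == generated:
        return False  # nothing was lost
    # The titles differ: report a loss only when the sole difference
    # is the trailing period(s) of the catalog title.
    return catalog.rstrip(".") == generated


catalog_title = "The girl from Bodies, Inc."

# Title text as it appears in the generated files quoted in this report.
generated_titles = [
    "The girl from Bodies, Inc",   # from the "Title:" line
    "THE GIRL FROM BODIES, INC",   # from the "START OF" line
]

for generated in generated_titles:
    print(repr(generated), lost_trailing_punctuation(catalog_title, generated))
```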