gbnewby opened this issue 8 months ago
I'd appreciate an annotated example.
As the START and END markers are important for downstream automated use, I don't want to wrap them.
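Here is why the sentinel lines matter. The sketch below is a hypothetical downstream consumer (not actual ebookmaker or PG code) that finds the book body by matching each marker on a single line. The marker wording in the patterns is an assumption. The point is that once a marker is wrapped across two lines, the match fails.

```python
import re

# Hypothetical downstream consumer. Assumption: each sentinel sits on one
# line, e.g. "*** START OF THE PROJECT GUTENBERG EBOOK ... ***".
# [ \t\r]* tolerates trailing spaces and CR/LF line endings.
START = re.compile(r"^\*\*\* START OF .*?\*\*\*[ \t\r]*$", re.MULTILINE)
END = re.compile(r"^\*\*\* END OF .*?\*\*\*[ \t\r]*$", re.MULTILINE)


def extract_body(text):
    """Return the text between the START and END sentinel lines, or None."""
    start = START.search(text)
    end = END.search(text)
    if start is None or end is None or end.start() < start.end():
        return None  # a marker is missing or was broken across lines
    return text[start.end():end.start()].strip("\r\n")
```

With an unwrapped START line, the body is found. If the same line were split in two, `extract_body` would return `None`.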
I have no way to apply different formatting to "recent books".
I will implement the first two suggestions if they're easy.
They were easy.
Thanks for that.
I understand about not wrapping the sentinel lines. I'll probably forget and ask again in the future; apologies in advance.
For the credits, how about reflowing what's in the 508 field rather than blindly rewrapping? Reflowing to 72-80 characters wide will fix any
In what situations will a user's text viewer not reflow the text unless desired?
Remember that there are URLs in the credits, which can break on reflow.
Reflow: In the link I sent earlier, it was not reflowed in the viewer I used (Firefox). I don't see how a viewer would know to reflow to 70-80 characters like the rest of the file.
The reported anomaly, which I confirmed, is that the whole file has the expected line lengths (i.e., roughly 70-80 characters), EXCEPT for the sentinel lines and credits.
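One way to confirm this kind of anomaly is to list every line longer than the usual margin. This is a minimal sketch that assumes an 80-character limit. `find_long_lines` is a hypothetical helper, not part of ebookmaker.

```python
def find_long_lines(text, limit=80):
    """Return (line_number, length) pairs for lines longer than `limit`.

    str.splitlines() copes with both LF and CR/LF line endings.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

To check a downloaded file such as pg70889.txt, read it into a string (e.g., `open(path, encoding="utf-8").read()`, assuming UTF-8) and pass that string in. The result lists the line numbers worth inspecting.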
For URLs: I was thinking that reflowing would only insert line breaks at whitespace, not at punctuation, i.e., like the Unix "fold -s" command or a /\s+/ regular expression. A Unixy way to do this would be a pipeline like tr -s '[:space:]' ' ' | fold -s -w 72, which collapses every run of whitespace, including CR/LF line endings, into a single space and then breaks lines only at spaces.
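The whitespace-only reflow described above can be sketched in Python with the standard `textwrap` module. This is an illustration under stated assumptions (a 72-column width, with the credits passed in as one string), not ebookmaker's implementation. The key settings are `break_long_words=False` and `break_on_hyphens=False`, so a URL is never split, even if it is longer than the line width.

```python
import textwrap


def reflow(text, width=72):
    """Reflow text so that line breaks fall only on whitespace.

    Every run of whitespace (spaces, tabs, LF, CR/LF) is first collapsed to a
    single space, much like `tr -s '[:space:]' ' '`; textwrap then fills lines
    up to `width` columns.
    """
    collapsed = " ".join(text.split())
    return textwrap.fill(
        collapsed,
        width=width,
        break_long_words=False,  # a token longer than `width` (e.g. a URL) stays whole
        break_on_hyphens=False,  # don't split at hyphens inside words or URL paths
    )
```

A URL longer than `width` ends up alone on an over-long line. That seems preferable to a broken link.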
On Fri, Mar 1, 2024 at 10:19 AM Eric Hellman wrote:
> remember that there are urls in the credits which can break on reflow.
Post the link, please.
Why would someone use Firefox for plain text?
My view is that messing with linebreaks in metadata inherently lowers its value for consumers of metadata. There needs to be other value added that outweighs the text pollution.
> post the link, please.
Here's one. The behavior seems consistent with what I originally reported in this issue: https://www.gutenberg.org/cache/epub/70889/pg70889.txt
As with all recent books, the 508 field doesn't have any
Rewrapping older 508 catalog entries that have embedded
> Why would someone use Firefox for plain text?
That's not really the point. The point is that the entire book is wrapped at 70-80 characters, per PG's usual practice.
Except for the credits and sentinel line.
I understand the logic in not wrapping the sentinel line. I think wrapping the credit line is desirable so it's aligned with the margins in the rest of the book.
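Wrapping the credits while leaving the sentinel lines alone could look roughly like this sketch. It assumes that sentinel lines begin with "*** START OF " or "*** END OF ". The helper name `wrap_output_lines` is hypothetical. Lines already within the margin pass through unchanged.

```python
import re
import textwrap

# Assumption: sentinel lines look like "*** START OF THE PROJECT GUTENBERG
# EBOOK ... ***" or "*** END OF ... ***". They are never wrapped.
SENTINEL = re.compile(r"\*\*\* (START|END) OF ")


def wrap_output_lines(lines, width=72):
    """Wrap over-long lines (such as the credits) to `width` columns.

    Sentinel lines and lines already within the margin pass through unchanged;
    breaks fall only on whitespace, so URLs are not split.
    """
    result = []
    for line in lines:
        if SENTINEL.match(line) or len(line) <= width:
            result.append(line)
            continue
        wrapped = textwrap.wrap(
            line, width=width, break_long_words=False, break_on_hyphens=False
        )
        result.extend(wrapped or [""])  # an all-whitespace line becomes empty
    return result
```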
Reflowing within the viewer isn't really an issue. If you shrink your viewing window smaller than the margins, things are going to look pretty ragged.
> My view is that messing with linebreaks in metadata inherently lowers its value for consumers of metadata. There needs to be other value added that outweighs the text pollution.
We're not changing the metadata, we're changing the generated output.
However, one thing we could do is reflow all the 508 fields in the back catalog to remove extraneous
I see a lot of value in consistency. We edit metadata frequently, and updating the 508 fields would give downstream consumers the gift of greater consistency.
Added the reflow for 508 to the v13 to-do list. Due to implementation details, it's not "easy".
Thanks.
On Fri, Mar 1, 2024 at 12:32 PM Eric Hellman wrote:
> added the reflow for 508 to v13 todo list. Due to implementation details, it's not "easy".
An alert reader noticed some anomalies in how generated .txt files are laid out:
For that last point, I know that some of our 508 fields have embedded CR/LF. But for more recent books, 508 should be just one line. So, wrapping would seem safe in those circumstances.
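The policy in the last point (wrap a one-line 508 field, leave fields with embedded CR/LF as they are) might be sketched as below. `format_credits` and the 72-column width are assumptions for illustration. Older fields with embedded line breaks keep those breaks, with line endings normalized to LF.

```python
import textwrap


def format_credits(field_508, width=72):
    """Prepare a 508 (credits) field for the plain-text output.

    Hypothetical policy: a single-line field, as expected for recent books,
    is wrapped on whitespace only; a field with embedded CR/LF breaks, as in
    some older catalog entries, keeps its own breaks (normalized to LF).
    """
    text = field_508.strip()
    if "\n" in text or "\r" in text:
        return "\n".join(text.splitlines())
    return textwrap.fill(
        text, width=width, break_long_words=False, break_on_hyphens=False
    )
```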
Thanks.