Add paragraph breaks from originals

mrgreekgeek commented 1 year ago

Hello, and thank you to @sleeptillseven and @jmrog (and all the other volunteers) for all the work that has been put into this project! I recently used this digitized and corrected text to create a modern reprint (typeset in LaTeX) and it turned out great!

Unfortunately, it doesn't look like the paragraphing has been preserved from the scans, and the later chapters especially are just big walls of text that are not very easy to read. I would like to volunteer to add them in, but I wanted to check in first and ask what is the best way to do so? Shall we add a ¶ mark in the proper places, or add an actual line break? 🤷‍♂️ Let me know what is best, and I'll be glad to help out with that.

If there's any interest in adding a .tex file to this repo, I'd be glad to clean up my file and contribute it as well.

jmrog commented 1 year ago

Hi, @mrgreekgeek. I'm happy to hear that you did this; one of my main goals in getting the full text corrected was to make modern reprints workable! For my part (although I'm not an official maintainer of the project here), I'd love to have the .tex file in the repo.

Your question about paragraph breaks inside of sections/chapters is a good one. I looked at some of the other "Greek Learner Texts" projects to see whether they'd standardized anything related to this, and it seems like the answer is "no." (The closest one is the Salamis in Easy Attic Greek project, which simply numbers paragraphs rather than sentences.)

For the text file that is the basis of the output, I'm somewhat in favor of simple ASCII (if it's workable), assuming we use some kind of sentinel character. As far as I know, no new paragraphs begin in the middle of anything marked as a "verse" in that file. So, what if we just mark verses that begin new paragraphs at the front, with something like "P. "? For example (from the final section/chapter):

185.1 Ὡς οὖν κατέβη...
185.2 P. Ὁ δὲ κύριος προσετίθει τῇ ἐκκλησίᾳ...

We'd then have to modify the build script to check for that character and output accordingly, e.g. (just a sloppy suggestion here to make the point):

verse = ("<br />&nbsp;&nbsp;&nbsp;" + verse[3:]) if verse.startswith("P. ") else verse

That would work for generating the HTML with visible paragraph breaks. I'm not sure it'd work for you in terms of the LaTeX; it'll depend some on how you're generating that (assuming you're generating it).
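To make the idea a little more concrete, here is a minimal sketch of how a build step might detect the marker when the verse number is still attached to the line. It assumes each source line looks like "185.2 P. text" (number, optional "P. " marker, text), as in the example above. The function names, the regular expression, and the choice of `\par` for LaTeX are all hypothetical and not taken from the repo's actual build script.

```python
import re

# Assumed line format: "<section>.<verse> [P. ]<text>", e.g. "185.2 P. Ὁ δὲ ..."
# (an assumption based on the example in this thread, not the real build script).
VERSE_RE = re.compile(r"^(\d+\.\d+)\s+(P\.\s+)?(.*)$")


def parse_verse(line):
    """Split a source line into (reference, starts_paragraph, text)."""
    match = VERSE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a verse line: {line!r}")
    ref, marker, text = match.groups()
    return ref, marker is not None, text


def to_html(line):
    """Render a verse for HTML, indenting verses that open a paragraph."""
    ref, starts_paragraph, text = parse_verse(line)
    prefix = "<br />&nbsp;&nbsp;&nbsp;" if starts_paragraph else ""
    return f"{prefix}{ref} {text}"


def to_latex(line):
    """Render a verse for LaTeX, emitting \\par before a new paragraph."""
    ref, starts_paragraph, text = parse_verse(line)
    prefix = "\\par " if starts_paragraph else ""
    return f"{prefix}{ref} {text}"
```

Parsing the marker once and then choosing the output per format would keep the text file plain while letting the HTML and LaTeX outputs diverge. Note that this sketch does no escaping of the verse text for either format.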

jmrog commented 1 year ago

Related to my comment above, I wanted to note a separate thing that I hadn't noticed until now:

In the early chapters, every sentence is marked as a separate "verse." In the later chapters, paragraphs rather than sentences are marked as verses. I'm happy to correct the text file in either direction, but I'm not sure which was intended. I assume (based almost purely on the word "verse" itself) that it was intended for sentences to be marked rather than whole paragraphs. @sleeptillseven can you confirm?

Also, what are the chances I can be made a full-on contributor (as in, able to approve/merge PRs) to this repo? (I see that you are marked as "busy" on GitHub.)

mrgreekgeek commented 1 year ago

Thanks for the quick reply, @jmrog! Shall I go ahead and start working on adding P. marks to the file? We can work out the details of how to merge the changes later I guess. (And if sleeptillseven wants to do something else, it would be pretty easy to do a Find/Replace or something. :)

I'll wait to release the .tex file till it can be properly paragraphed. I was doing it in a hurry and didn't have a chance to add the paragraphs, so it will need to be redone anyhow. Yeah, it will be generated, although for my first attempt I actually just took the HTML and ran it through pandoc to get some basic TeX, which I then customized.

mrgreekgeek commented 1 year ago

Hmm... now that I look at the plain text source, I wonder if my original issue is even an issue. :) It looks to me like each numbered "verse" corresponds to a "paragraph" in the original source ("paragraph" meaning that the printed original has it starting with an indent). So I've been reworking my LaTeX code and creating the resultant .tex file via script so that it can be automated in the future. I'm neither a good coder nor very familiar with LaTeX, so I'm sure my code could be greatly improved, but this is what I've got: convert.py

The various titles in the text are causing me problems. When I did my first LaTeX job I just manually fixed them up how I wanted them, but I can't do that with a script. I wonder if we can be a little more explicit in the text about what type of title each one is? Not all of them are equal, but they currently look like they're all equal. Maybe the code just needs to be a lot "smarter" and if so, I'd welcome feedback on how to make it better.
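One way to make the title types explicit might look like the sketch below. It assumes a purely hypothetical convention in which a title line in the text file starts with one to three "#" characters and the count selects the LaTeX sectioning command. Neither the marker nor the mapping to `\part`, `\chapter`, and `\section` exists in the repo today; they are only there to illustrate the idea.

```python
# Hypothetical convention: a title line starts with 1-3 '#' characters followed
# by a space, and the count selects the LaTeX sectioning command.
LATEX_COMMANDS = {1: "part", 2: "chapter", 3: "section"}


def title_to_latex(line):
    """Return a LaTeX heading for a marked title line, or None for other lines."""
    stripped = line.strip()
    hashes = len(stripped) - len(stripped.lstrip("#"))
    if hashes not in LATEX_COMMANDS or not stripped[hashes:].startswith(" "):
        return None
    title = stripped[hashes:].strip()
    # Starred form: unnumbered heading, since the text carries its own numbering.
    return f"\\{LATEX_COMMANDS[hashes]}*{{{title}}}"
```

With something like this, the conversion script would not have to guess a title's level from its wording; a line that returns `None` would simply fall through to the normal verse handling.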

sleeptillseven / stoffel-an-epitome-of-the-new-testament

Add paragraph breaks from originals #27