sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Release v1 criteria discussion #1398

Open alerque opened 2 years ago

alerque commented 2 years ago

A long time ago we made the semi-arbitrary decision that a v1 tag would happen when a Bible could be reasonably typeset without intervention. We have a milestone for this, but I wanted to assemble an issue with discussion about what should or shouldn't be a blocker for this rather than discussing it across all the different potential issues. What follows are my ideas for this off the top of my head and I'll edit in any relevant discussion that we agree on:


(I'm only listing things that aren't done as of this issue discussion; many Bible prerequisites like grid typesetting are already functional.)


I guess we need a target source text, preferably something open licensed and reasonably orthodox. Also, so we don't fall into the trap of optimizing for one source rather than a general purpose solution, I propose we use two existing sources. Perhaps a KJV and an NET or something like that.

alerque commented 2 years ago

Just to put my thinking on the table, tagging v1 is the one time I'd like a major release to be pretty much a "nothing-burger" release with no breaking changes.

I'm happy to increment our current shifted major version as many times as we need to for breaking changes, but the v0→v1 tag should only be considered a breaking change for two reasons:

  1. because the semver scheme changes from a pre-release x.major.minor with no patch releases to a major.minor.patch.
  2. because we finalize the removal of previously deprecated / breaking changes and remove any relevant shims.

We shouldn't have a lot of issues filed under this milestone, and none of them should be new API breaking ones. We want to test and approve all the major parts in releases prior to this tag. I think we have all the pieces anyway so this isn't a big deal. I'm pretty sure none of the things on this checklist require a breaking change by definition to implement (although of course we may change our mind about existing implementations while getting there).

RobH123 commented 2 years ago

Have an input driver for at least one standard source format (or easy reliable conversion). USFX is a likely candidate here, but OSIS or USFM are not out of the question and there are others that could be considered.

I think you mean USX (from UBS/SIL, which forms part of their rather locked-down DigitalBibleLibrary bundles, though there are some open ones here), not the lesser-used USFX (from eBible.org).

I guess we need a target source text, preferably something open licensed and reasonably orthodox. Also, so we don't fall into the trap of optimizing for one source rather than a general purpose solution, I propose we use two existing sources. Perhaps a KJV and an NET or something like that.

There are many open-licenced Bibles available in USFM and other formats at https://ebible.org/find/. (See a typical details/downloads page at https://ebible.org/find/details.php?id=englsv .)

alerque commented 2 years ago

Yes, USX was what I had in mind; I was typing too fast while shuffling milestones instead of double checking what I was writing ;-) That being said, USFX should be considered. Also there is quite a lot of overlap between them (and OSIS), and it might be worth writing the input driver and class to handle them all at once.

Omikhleia commented 2 years ago

Book headers and intros that span columns / Option for main content in balanced columns

You could relax this constraint. There are "reasonably typeset" Bibles that are single-column.

Omikhleia commented 2 years ago

So let's assume single-column for a start... What are you folks waiting for?

A bit tongue-in-cheek, browsing through the Scriptures, I found this interesting saying: [image]

BTW -- Source: Louis Segond 1910 Bible (... I am French...). Input format: USFX format from eBible (...all I could find easily, and why not after all - I've done far more complex XML things with SILE already...). Processing: SILE 0.12.5 + my existing packages + 20mn quick'n dirty hacks on a "preamble" -- leading to 1100+ pages in PDF, ignoring/skipping some content (i.e. introductory notes, table of contents and other editorial material not strictly part of the scriptures). Note: The same website has a PDF done with LaTeX (likely XeTeX since they have that in their source code version). Doesn't look that great, though... Surely we could do better...

Much needed up to that point: bottom stacking margin notes (a.k.a. sidenotes with relaxed constraint, minimally). Doesn't sound complex... Then on top of it, "x" ("xt, xo,") elements must be collated.

RobH123 commented 2 years ago

Book headers and intros that span columns / Option for main content in balanced columns

You could relax this constraint. There are "reasonably typeset" Bibles that are single-column.

Ha, yeah, good question @Omikhleia, but I suspect that was a rare find, and it turns out that even that has two columns. (Bible text is indeed one column, and references are in another.)

But I think the long and short of Bible printing is that two columns for the actual Bible text is the normal expectation. Introductions probably one column at the top of a new page, then a ruler, then the balanced columns. Text on the back of the page must match vertically because of being visible through the typical thin Bible paper. (Sorry, don't know the technical terms for any of this.) Both cross-references and footnotes (sometimes mixed, mostly separated and with different caller methods/sequence and often with different formats) at the bottom of the page. Last page of book balanced. Drop caps style chapter numbers. Verse 1 number suppressed.

If you want to see the XeTeX alternative, look at https://software.sil.org/ptxprint (source at https://github.com/sillsdev/ptx2pdf) and https://vimeo.com/showcase/9331905.

RobH123 commented 2 years ago

Input format: USFX format from eBible (...all I could find easily, and why not after all - I've done far more complex XML things with SILE already...)

@Omikhleia Sadly I pointed you to a site that doesn't have USX -- the most common XML Bible format. (USFX is pretty much eBible only. USX is a widespread standard.) Try one from https://app.thedigitalbiblelibrary.org/entries/open_access_entries?type=text

BTW, as mostly a lurker and a dreamer and a previous dabbler I admire those of you able to work this kind of magic!

Omikhleia commented 2 years ago

Ha, yeah, good question @Omikhleia, but I suspect that was a rare find, and it turns out that even that has two columns. (Bible text is indeed one column, and references are in another.)

Yep, but technically the main "text flow" is in one column, which is the key point here. SILE currently cannot do multiple columns properly (not to mention column balancing)...

Big problems have to be split into smaller problems at some point. The question at stake in this ticket is how to achieve SILE 1.0. As @simoncozens once said in TUGboat, Volume 38 (2017): "SILE will see a 1.0.0 release when it is capable of taking a Unified Scripture XML Bible translation and an appropriate class file, and producing a print-ready Bible of quality equivalent to that of a human typesetter". That was 5 years ago, so let's reconsider this "semi-arbitrary decision":

Sadly I pointed you to a site that doesn't have USX -- the most common XML Bible format. (USFX is pretty much eBible only. USX is a widespread standard.) Try one from https://app.thedigitalbiblelibrary.org/entries/open_access_entries?type=text

Sadly enough, I wasn't able to find a USX-encoded Bible in French there. Do you have a pointer for one?

This being said, it seems to me that Paratext can convert USX to USFM, and Haiola can convert USFM to USFX. Is that correct? Because if so, the format question is probably moot, and one can start actual typesetting in any of these formats at one's convenience, and build the missing blocks (e.g. sidenotes) along the way.

BTW, since this also is a discussion on v1 criteria, I could add these:

RobH123 commented 2 years ago

Yes, I agree on the step-by-step, but maybe only the owner(s) of the target can modify it??? To me, it wouldn't have achieved Bible typesetting if it was unable to flow text into two even columns for a start. (Extra couple of stars if it could do a diglot with different languages/fonts/font sizes/column-widths aligned vertically in those two columns. Several extra stars if it could do three columns with an interlinear text in one of them.) But I'm mostly just an observer here, so my vote doesn't necessarily count a lot.

Another place for USX files is at https://open.bible/resources/?sort=languageName,asc, but sadly I still see no French. (And my own USX export seems broken -- I'll see if I can fix it in the next day or two and do a French Bible for you.)

simoncozens commented 2 years ago

I’m away this week and will weigh in more heavily when I come back but for me double column is key. It’s not just that single column is kind of unimpressive, but multiple columns also implies a bunch of other technical requirements for the output to look good.

alerque commented 2 years ago

Thanks for the discussion guys! This is highly motivating to get this finished up.

I'm going to agree with Simon here that clean balanced columns are kind of a key milestone here. I personally prefer Bibles with a single column and read from them every day and print my own draft text for translation projects that way, but the dual column approach is the de facto approach to Bible printing, largely because of the page count savings it offers. I don't think we can pretend to be a viable alternative if we don't handle that case gracefully. The sample single column KJV @Omikhleia pointed out above is beautiful and my hat's off to whoever did the typesetting, but it doesn't even qualify as single-column from a typesetting perspective. The sidebar column being used for insertions is actually harder to get right than a basic balanced two column layout. I'm confident that if we got to the point where that layout worked out of the box without lots of intervention we would be at our other milestone goals anyway.

As @RobH123 pointed out I'm also adding grid typesetting to the list, not because we don't have it working already but because we have to make sure it plays nice with balanced columns, which at the moment it does not.

@RobH123 Some of your other points are addressed already. Dropcaps are (thanks to @Omikhleia actually) working fantastically (see the example on this page). Mixed insertion types are already on my list as a requirement. Verse 1 number suppressed is already trivial (and I have an example in a project already, we can just add it to the default SILE bible class). I've also done hanging verse references, so we have that to work with. Your other comments about columns, introductions, last pages, etc. I believe are all covered in the existing criteria.

I'm open to looking at the input format scene again. I would like to target the formats that lend themselves to open-licensed projects more. USX is a bit problematic in that regard; USFX and OSIS might have an edge. But again I don't think that will be a difficult part of this milestone, and supporting all 3 or more might be easy enough. (And I say that having already written working USFM→SILE conversion for my own projects, so I'm not a stranger to all the format gotchas and even incompatibility between output from different tools using the same format.)

Diglot / triglot / interlinear stuff is actually high on my personal priority chart, but naturally fits in a 1.x milestone after 1.0 is out the door. If we happen to chip away at related issues, great, but I'm not pushing 1.0 back specifically to cover those issues.

alerque commented 2 years ago

Re open issues: I'm not interested in setting a number on open issues / PRs as a target release criteria. We use them in so many different ways I don't think it is a good number to use as a hard metric (e.g. "Get it under 50" or "to x ratio"). Some are serious bugs, some are pipe-dream feature requests, some are just discussion. I agree the open PR ratio can be a useful thing to look at, but lots of the open PRs are my own drafts and again some are minor, some are major, etc. I think evaluating a FOSS project's health does involve seeing how responsive they are to PRs (do they get any feedback? Do at least a majority of them get merged eventually? etc.) but at least for my own working style a target of less than X open PRs is not a good metric for defining when to tag a version.

This is why I group issues by target milestone: blockers and breaking changes always have major target versions associated with them, while minor improvements and new features have minor target versions. Minor versions (currently the patch release number, 0.0.X) can get bumped depending on release cadence and urgency, but major versions (currently the minor release number, 0.X.0) are planned much more deliberately around what needs to land before release.
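To make the shifted numbering concrete, here is a minimal sketch in Python of how a release bump could be classified before and after the v1 tag. The function names (`parse`, `bump_kind`) are hypothetical and not part of any SILE tooling; the sketch only assumes the mapping described above, where 0.X.0 acts as the major number and 0.0.X as the minor number before 1.0, and ordinary major.minor.patch applies from 1.0 on.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(tag: str) -> Version:
    """Parse a tag such as 'v0.12.5' or '1.0.0' into its three numeric fields."""
    major, minor, patch = (int(part) for part in tag.lstrip("v").split("."))
    return Version(major, minor, patch)


def bump_kind(old: Version, new: Version) -> str:
    """Classify a release as 'major', 'minor', 'patch' or 'none'.

    Assumption: while both versions are 0.x.y the fields are shifted, so
    0.X.0 counts as a major (breaking) release and 0.0.X as a minor one,
    with no patch level. From 1.0.0 on, plain semver is used.
    """
    if old.major == 0 and new.major == 0:
        if new.minor != old.minor:
            return "major"
        if new.patch != old.patch:
            return "minor"
        return "none"
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    if new.patch != old.patch:
        return "patch"
    return "none"
```

Under these assumptions, `bump_kind(parse("v0.12.4"), parse("v0.12.5"))` is classified as a minor release, while any jump from 0.x to 1.0.0 is classified as major even if it ships no new breaking API.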

RobH123 commented 2 years ago

I'm open to looking at the input format scene again. I would like to target the formats that lend themselves to open-licensed projects more. USX is a bit problematic in that regard; USFX and OSIS might have an edge. But again I don't think that will be a difficult part of this milestone, and supporting all 3 or more might be easy enough. (And I say that having already written working USFM→SILE conversion for my own projects, so I'm not a stranger to all the format gotchas and even incompatibility between output from different tools using the same format.)

Just to be sure that you all know, USX is part of the new ScriptureBurrito packaging about to reach v1.0 for Bible-related interchange. See Scripture Text flavour where both USFM and USX are mentioned -- the former more for in-progress work, and USX for completed/published work in general.

RobH123 commented 1 year ago

Ok, not quite a year later, but I have at last created a BibleTypesetter repo with sample USX files of open-licensed Bible versions in order to move this issue forward. (You're also welcome to put code directly into that repo if that's convenient -- just let me know so I can grant access.)

It's the trend in modern Bibles (e.g., see WEB and others from eBible.org) to put lots of tagging (lemmas, morphology, Strong's numbers tagged to each word) in the source files, which increases the XML complexity (beyond footnotes and cross-references, which have been around in Bible files for much longer). Even more complex are aligned translations (like the ULT), which use custom USFM milestones (which seem to have been converted by Paratext to non-standard USX paragraphs) to map translated words (or multiple words) to original language (Hebrew/Greek) words (or multiple words). Most/All of that tagging is not required for paper versions (so maybe it wouldn't be unreasonable to strip it out in a preprocessing stage). For simplicity's sake, I included a few books of my in-progress OET-RV, which still only have the basic text at this stage and so are probably an easier place to start.
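As a rough illustration of such a preprocessing stage, the Python sketch below drops word-level tagging attributes before the file reaches the typesetter. It assumes that tagged words appear as `<char style="w" ...>` elements carrying attributes such as `strong` and `lemma`; those element and attribute names, the sample snippet, and the function name `strip_word_tagging` are assumptions made for illustration, so real USX files should be checked before relying on this.

```python
import xml.etree.ElementTree as ET

# Attributes to keep on word-level elements; everything else is dropped.
KEEP_ATTRS = {"style"}


def strip_word_tagging(usx_xml: str) -> str:
    """Remove lemma/morphology/Strong's-style attributes from tagged words.

    Assumption: tagged words are <char style="w" ...> elements. The word
    text itself and all other elements are left untouched.
    """
    root = ET.fromstring(usx_xml)
    for char in root.iter("char"):
        if char.get("style") == "w":
            for name in list(char.attrib):
                if name not in KEEP_ATTRS:
                    del char.attrib[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample, not taken from any real Bible file.
sample = (
    '<usx version="3.0"><para style="p"><verse number="1" style="v"/>'
    '<char style="w" strong="H7225" lemma="beginning">beginning</char>'
    " text</para></usx>"
)
cleaned = strip_word_tagging(sample)
```

Keeping the `<char>` wrapper and removing only its extra attributes leaves the text flow unchanged, which seems the safer choice for a first pass; the alignment milestones mentioned above would need separate handling.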

As mentioned in a ReadMe, traditionally folders of translated Bibles only contained files for the translated "books" -- the title page, etc., were all created by hand by the professional typesetter, so sadly this information is often missing from the archived translations. However, I'll try to work with anyone to create these, if you want to suggest a format. (The OSIS Bibles which are single files for an entire Bible like the ones here might give some ideas???)

If someone most familiar with SILE (or with more time to invest than I can give at present) got a basic system working, possibly I'd be able to understand it enough to continue to maintain and iteratively improve it (although I'm a bit scared of how much SILE syntax still seems to be in flux)?

Omikhleia commented 10 months ago

If I correctly set up the shared link, here is a full Segond Bible (LSG) converted from @RobH123 's USX files mentioned just above: https://drive.google.com/file/d/1oQY2aPUvbAAxlLtfDB19NmnPRBnL_U6h/view?usp=sharing

There's a workaround for spacing issues in the USX source (https://github.com/Freely-Given-org/BibleTypesetter/issues/1), but I haven't checked if it's general enough.

If someone most familiar with SILE (or with more time to invest than I can give at present) got a basic system working, possibly I'd be able to understand it enough to continue to maintain and iteratively improve it.

I am cleaning up the code; it should be available for the upcoming resilient 2.1 milestone... But I am not sure what to expect. Surely there would be some more work to do... Maybe we can arrange something, if the proposal is not just theoretical?

Omikhleia commented 10 months ago

@RobH123 and others interested: My setup for the LSG is here, along with instructions: https://github.com/Omikhleia/awesome-sile-books

Omikhleia commented 7 months ago

@RobH123 and others interested: My setup for the LSG is here, along with instructions: https://github.com/Omikhleia/awesome-sile-books

Just to be sure that you all know, moving forward: https://github.com/Freely-Given-org/BibleTypesetter/pull/3