Open kavitharaju opened 7 months ago
@mhosken While working on this, I couldn't find the code portion corresponding to this function
https://github.com/usfm-bible/tcdocs/blob/9203ce01c89b410ff78b6a4683255ef655340480/python/scripts/usx2usj.py#L60C1-L67C23
in the python/lib module. Is there a place where the root object of USJ is formed and the version number is set?
In response to the question of missing code in usjproc.py: guilty as charged. Sorry that got dropped accidentally. Do you want me to add it back in or do you want to do it? I would also refactor usx2usj to use usjproc rather than repeating code. In fact I would suggest refactoring the use case for usx2usj to use usfconv (which does any serialization to any other serialization) and do away with usx2usj completely.
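For readers following along, here is a minimal sketch of the kind of step being asked about: wrapping already-converted content in a USJ root object and stamping a version on it. The helper name `make_usj_root`, the key names, and the default version string are assumptions for illustration; they are not taken from usx2usj.py or usjproc.py and may differ from what the repository does.

```python
import json


def make_usj_root(content, version="3.1"):
    """Wrap already-converted content in a USJ root object.

    Hypothetical helper: the name, key names and default version are
    assumptions, not the code that was dropped from usjproc.py.
    """
    return {"type": "USJ", "version": version, "content": list(content)}


# A single book object stands in for real converted content.
root = make_usj_root([{"type": "book", "marker": "id", "code": "SIR", "content": []}])
print(json.dumps(root, ensure_ascii=False, indent=2))
```

Keeping this in one place in the library would let a script such as usx2usj call it rather than repeat it, which is the refactoring suggested above.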
Looking at this PR, I would suggest that this is not a good way to go regarding \vp and \va. Yes \vp is ambiguous in that it can occur as a way of tagging the published form of a verse and also it may occur as a simple character style. I would suggest that USX is stronger here and keep the information as attributes of the verse. This doesn't preclude also having character runs of type vp.
Another reason for not wanting to do this is that the simpler you can keep the mapping between USJ and USX, the better. Every special case is more expensive than a few lines of code, you have to document it and every implementation has to track that special case. It's why I work so hard to keep special cases out of the USFM parser/generator and keep it all in the grammar file.
If you still feel strongly that you do want to follow USFM here, you also need to write the corresponding code to parse the sequence in USJ back into attributes in the USX data model.
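A rough sketch of what that reverse step might involve, assuming USJ content is a list of strings and dicts, that a vp run is a dict with `"type": "char"` and `"marker": "vp"`, and that the published number belongs to a verse only when the vp run directly follows it. The function name `fold_vp_into_verse` and these object shapes are hypothetical, not taken from the PR.

```python
def fold_vp_into_verse(content):
    """Return a copy of a USJ content list in which a vp char run that
    directly follows a verse object becomes that verse's pubnumber.

    Assumed shapes (not confirmed by the PR): verse objects look like
    {"type": "verse", ...} and vp runs like
    {"type": "char", "marker": "vp", "content": ["(1)"]}.
    """
    out = []
    for item in content:
        prev = out[-1] if out else None
        is_vp = (
            isinstance(item, dict)
            and item.get("type") == "char"
            and item.get("marker") == "vp"
        )
        follows_verse = (
            isinstance(prev, dict)
            and prev.get("type") == "verse"
            and "pubnumber" not in prev
        )
        if is_vp and follows_verse:
            # Fold the run's text into the verse attribute and drop the run.
            prev["pubnumber"] = "".join(
                s for s in item.get("content", []) if isinstance(s, str)
            )
        else:
            # Shallow-copy dicts so the caller's input is left untouched.
            out.append(dict(item) if isinstance(item, dict) else item)
    return out


para = [
    {"type": "verse", "marker": "v", "number": "1a", "sid": "SIR 1:1a"},
    {"type": "char", "marker": "vp", "content": ["(1)"]},
    "Les livres de la Loi ",
    {"type": "char", "marker": "vp", "content": ["(2)"]},
    "de même que les autres Écrits",
]
folded = fold_vp_into_verse(para)
```

In this sketch the first vp run is folded into the verse, while the second, which follows plain text, stays an ordinary character run.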
Our motivation for treating va and vp this way was to avoid "the special case" already present in USX, which keeps the value as an attribute in one case and as a new object in another. Wouldn't that be expensive for a tool working on USJ independently of USX?
I think you have a special case whichever way you approach it. The advantage of keeping the attributes is that you are closer to the content model and the 'other' case is also a normal case (just another character style). I.e. the model and conversion is simpler. If you go with the USFM model for these, you have the same pain that the USFM processing has of explicitly handling these during conversion.
I don't see a value in users of USJ having a single way to handle vp whether it is being a published verse or merely a character style. The two contexts are dissimilar enough to warrant separate handling. (Why do we allow vp as a character style anyway?)
Why do we allow vp as a character style anyway?
Because it is impossible to typeset many ecumenical Bibles without it.
Preface to Sirach, NFC:
\p \vp (1)\vp* Les livres de la \w Loi\w* et des \w Prophètes|Prophète, prophétesse, prophétie, prophétiser\w* nous transmettent de nombreuses grandes leçons, \vp (2)\vp* de même que les autres Écrits qui les suivent
There are 35 "verses" before chapter 1. How would you like to do this without a vp character style? Or, alternatively, are you going to hold the 1.0 spec pending a discussion with the Vatican?
Below, printed examples of French Bible Society NFC and TOL (the official French catholic translation).
Thanks for the examples. Don't worry there are no plans to do away with the char style. I was merely sharing my ignorance. BTW even if we did decide to do away with the vp char style WHICH WE ARE NOT, we would keep it supported, if deprecated, until it really isn't around any more. IOW, Don't Panic.
On Thu, 28 Mar 2024, 14:52 Mark Howe, @.***> wrote:
Preface to Sirach, NFC:
\p \vp (1)\vp* Les livres de la \w Loi\w* et des \w Prophètes|Prophète, prophétesse, prophétie, prophétiser\w* nous transmettent de nombreuses grandes leçons, \vp (2)\vp* de même que les autres Écrits qui les suivent
There are 35 "verses" before chapter 1. How would you like to do this without a vp character style? Or, alternatively, are you going to hold the 1.0 spec pending a discussion with the Vatican?
French TOB (a major ecumenical Bible), showing why you can't sidestep the issue by treating the prologue as a USFM introduction, cos there's an actual introduction. (And, also, the prologue is (deutero)canonical text, so putting it in an introduction is akin to mistaking \s for \d and also a recipe for annoying non-Protestants.)
IOW, Don't Panic.
Sorry, cross-posted comment.
I'm not panicking because, one way or another, everyone will keep doing the right thing with Bibles. The only risk is that they ignore any standard that makes that harder, or that they prefer any de facto standard that makes that easier.
If, today, you proposed to a room full of technicians some new standard with two completely different ways to represent exactly the same thing, the response would be somewhere between laughter and derision. That's precisely what USX does in this case. It's just one example of decisions with USX 3.0 in particular that, starting from scratch, would look like an obfuscated code joke. I get that it's hard to roll back those decisions for USX. But insisting on backwards compatibility with stupid, for eternity is... not guaranteed to drive adoption.
USX and USJ are both serializations of a single data model. If we need to change this in USJ, we should change it in USX at the same time. I do not yet have a strong feeling whether we should make this change, but I feel strongly that we should either make the change in both USX and USJ or not make the change.
How does the TOB currently do this? Was the TOB written in Paratext? Was it published via USX? What does the markup look like in USFM / USX?
If I am understanding correctly, the concern is about the two different uses of \vp in USFM. I will call the first the parameter usage; it is modeled in USX by a parameter. The other use I will call the character style use; it is modeled in USX by a <char style="vp"> element.
In USX the two uses are simple and clear: @pubnumber (even though it doesn't have to be a number) and <char style="vp">.
If we decided that we really didn't want to follow the USX model, then we would need to change the USX model to use
And hence the request for USJ to directly represent the USX model rather than the USFM serialisation model.
I think it's ambitious (putting things politely) to call this issue "the USX model". It's an accident of XML syntax and of the Paratext internal processing model, before USX became a standard, a history which the committee claims to have put behind it.
I'm not seeing the two uses. In one case you are overriding the underlying versification and in the other case you are too. If I need to I can find you examples where both these forms happen in the same paragraph and for the same reason.
Last time we went around this, the conversation ended with use of vp to reorder partial verses in Zechariah, and with the committee's answer that Bible scholars needed to change their translation to fit the committee's markup. I still think that's not how things are supposed to work. vp is used in all sorts of ways in huge numbers of documents. You can't retrofit constrained semantics to the world's existing documents and expect those documents to still work as they did before. This is epistemology meets Tenet the film.
How does the TOB currently do this? Was the TOB written in Paratext? Was it published via USX? What does the markup look like in USFM / USX?
That was my own copy of the TOB, which I believe to be the most recent edition. It certainly exists in Paratext; I'm not sure whether it was translated or originated that way, but given UBS involvement I would think that it was at least translated that way. I don't think it's in DBL, so it probably hasn't been published in USX. I don't have access to the markup. Isn't there someone from UBS on the committee?
I'm not seeing the two uses. In one case you are overriding the underlying versification and in the other case you are too.
From French NFC (UBS):
<chapter number="1" style="c" sid="SIR 1" />
<para style="ms1">PRÉFACE DU TRADUCTEUR GREC</para>
<para style="p">
<verse number="1a" style="v" pubnumber="(1)" sid="SIR 1:1a" /> Les livres de la <char style="w">Loi</char> et des <char style="w" lemma="Prophète">Prophètes</char> nous transmettent de nombreuses grandes leçons, <char style="vp">(2)</char> de même que les autres Écrits qui les suivent<note caller="+" style="f"><char style="fr" closed="false">PRÉFACE (1-2) </char><char style="fq" closed="false">Les livres de la Loi… les suivent </char><char style="ft" closed="false">: ou </char><char style="fqa" closed="false">La Loi, les Prophètes et les autres auteurs qui les ont suivis nous transmettent… </char><char style="ft" closed="false">– Le traducteur grec du <char style="bk">Siracide</char> mentionne ici les trois grandes parties de l'Ancien Testament hébreu ; voir </char><char style="em">La Bible, son unité, sa formation, son texte</char>.</note>. <char style="vp">(3)</char> Il faut donc féliciter le peuple d'Israël pour son instruction et sa sagesse. <char style="vp">(4)</char> Mais on ne doit pas seulement lire ces écrits pour devenir savant. <char style="vp">(5)</char> Ceux qui aiment s'instruire doivent être également capables d'en faire profiter les non-initiés, <char style="vp">(6)</char> et cela aussi bien par leurs paroles que par leurs écrits.</para>
@mhosken @jonathanrobie What different use cases are you seeing between the following two forms?
<verse number="1a" style="v" pubnumber="(1)" sid="SIR 1:1a" />
and
<char style="vp">(2)</char>
What deep semantics am I missing here? In the first case we're printing a number in brackets and in the second case we are too. In the first case we also make 30 or so verses a whole partial verse, which is a horrible kludge of which the Bible tech world should repent but, regardless, on what logical basis does that kludge need to be syntactically connected to one of 30 or so places where we want to add a number in brackets?
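To make the comparison concrete, here is a small sketch of what a consumer has to do to collect every published number from the NFC paragraph, using an abridged copy of the fragment above and Python's standard `xml.etree.ElementTree`. The function name `published_numbers` and the "attribute"/"char" labels are made up for illustration; the sketch takes no position on which representation is right, it only shows that both forms have to be checked.

```python
import xml.etree.ElementTree as ET

# Abridged from the NFC fragment quoted in this thread (footnote and later verses omitted).
USX_FRAGMENT = (
    '<para style="p">'
    '<verse number="1a" style="v" pubnumber="(1)" sid="SIR 1:1a" />'
    ' Les livres de la <char style="w">Loi</char>'
    ' nous transmettent de nombreuses grandes leçons,'
    ' <char style="vp">(2)</char>'
    ' de même que les autres Écrits qui les suivent.'
    '</para>'
)


def published_numbers(para):
    """Collect published verse labels in document order, recording which
    USX form (verse attribute or vp character run) carried each one."""
    found = []
    for el in para.iter():
        if el.tag == "verse" and el.get("pubnumber") is not None:
            found.append(("attribute", el.get("pubnumber")))
        elif el.tag == "char" and el.get("style") == "vp":
            found.append(("char", "".join(el.itertext())))
    return found


para = ET.fromstring(USX_FRAGMENT)
numbers = published_numbers(para)
print(numbers)
```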
And hence the request for USJ to directly represent the USX model rather than the USFM serialisation model.
I think it's ambitious (putting things politely) to call this issue "the USX model". It's an accident of XML syntax and of the Paratext internal processing model, before USX became a standard, a history which the committee claims to have put behind it.
Actually, we are creating a formal model of the language, something which did not exist previously. For the first time, we have:
That's something we care about, one of the main reasons we are doing this work in the first place.
We can change the USX representation if that's the right thing to do. I don't think it makes sense for USJ and USX to be gratuitously different. I think we would do well to focus on what the internal model should be for this USFM markup and reflect our answer in both the internal model and serialization to USX and USJ.
@mhosken @jonathanrobie What different use cases are you seeing between
<verse number="1a" style="v" pubnumber="(1)" sid="SIR 1:1a" />
and
<char style="vp">(2)</char>
This is what I care about most: USFM can express both of these things, so we have to give them each an interpretation in our model. USX and USJ should each follow that interpretation.
But I think there's a significant difference between:
In the first case we're printing a number in brackets and in the second case we are too.
The print formatting does not define the semantics of the underlying markup. I am not (yet) sure that I know whether anything needs changing in our model, but I would resist any change that was based on print formatting rather than well-defined semantics for each marker.
I think you are proposing a change to our semantics. Can you be more clear about what that change is?
Somewhat confused by @mvahowe's objections since I do not work in USX much or USJ at all. But the fact that \vp ...\vp* can be either a character style or an attribute on a verse does seem strange to me. Why can't vp always be an attribute on a verse element?
<verse number="" style="v" pubnumber="(1)" sid="SIR 1:0" />
and likewise for all the rest
<verse number="" style="v" pubnumber="(2)" sid="SIR 1:0" />
That said (and I am just speaking from what seems logical to me) I would not put the Prologue in Chapter 1. I feel it should be either explicitly or implicitly in Chapter 0.
<para style="ms1">PRÉFACE DU TRADUCTEUR GREC</para>
<para style="p"><verse number="" style="v" pubnumber="(1)" sid="SIR 0:0" /> Les livres de la <char style="w">Loi</char> et des <char style="w" lemma="Prophète">Prophètes</char> nous transmettent de nombreuses grandes leçons, <verse number="" style="v" pubnumber="(2)" sid="SIR 0:0" /> de même que les autres Écrits qui les suivent, . . . </para>
<chapter number="1" style="c" sid="SIR 1" />
The USFM would be:
\ms1 PRÉFACE DU TRADUCTEUR GREC
\p \vp (1)\vp* Les livres de la \w Loi\w* et des \w Prophètes|Prophète\w* nous transmettent de nombreuses grandes leçons, \vp (2)\vp* de même que les autres Écrits qui les suivent, . . .
\c 1
<chapter number="0" style="c" sid="SIR 1" />
@jonathanrobie
This is what I care about most: USFM can express both of these things
Which two things? In terms of output and in terms of any user-comprehensible semantics I can think of, the two things are
USX has two ways to describe exactly the same thing. There's no extra expressivity that I can see. If you marked up v1 with a character style it would mean exactly the same thing. Also, does the schema stop me from doing exactly that?
@KentSpiel Is \c 0 legal? It's an honest question. I'm almost certain that it wasn't a decade ago because Paratext expects chapters and verses to count up from 1. The more extreme case is Greek Esther where UBS translations often have chapter 1 before chapter 1 and two chapter 3s separated by chapter B. A common use case for \cp and \vp is printing creative versification while allowing Paratext to pretend that every Bible in the world looks a lot like KJV.
(There's an equivalent potential issue with v0, but that "just works" since English speakers care about this. So, in Psalms, you can have canonical text, typically canonical titles, before v1. Several deuterocanonical books need that functionality, but for chapters.)
No I don't think \c 0 is valid USFM. At least it would not work in Paratext, but that does not mean it couldn't be Valid. One would need to allow a chapter 0 in the project's versification. In other words it's a question of data integrity not structural integrity. That said, I don't think chapter 0 needs to be explicit. Like verse 0 it can be implied.
No I don't think \c 0 is valid USFM. At least it would not work in Paratext, but that does not mean it couldn't be Valid. One would need to allow a chapter 0 in the project's versification. In other words it's a question of data integrity not structural integrity. That said, I don't think chapter 0 needs to be explicit. Like verse 0 it can be implied.
We're way off the PR now, and I don't think there's an easy fix for the wider non-protestant versification issues. Chapter 0 probably should be "implied" since, like verse 0, no-one wants to print a zero in their Bible. The difference is that chapters contain verses, and many things break if you start typing verses before any chapter number. Off the top of my head you'd end up with all your ch0 content as part of mt1 or something.
Really, my only point here is that the USX way of representing the same vp information in different ways looks like an error, probably is an error, and therefore shouldn't be propagated into new standards such as USJ.
We're way off the PR now, and I don't think there's an easy fix for the wider non-protestant versification issues.
I agree that a PR is the wrong form for discussing this. Perhaps a shared doc would be better?
Really, my only point here is that the USX way of representing the same vp information in different ways looks like an error, probably is an error, and therefore shouldn't be propagated into new standards such as USJ.
If there is an error that we need to fix, I think we need to fix it in both USX and USJ. A pull request that changes just USJ does not do that.
But I think that starts with a clear shared understanding of the problem that needs to be fixed. I don't think we are there yet. I think a shared document would help:
If we agree there is a problem, we should find a solution to it. It may or may not be this one, but I think it should be the same for both USX and USJ.
Have sent a new PR with changes other than the vp related ones.
This PR could be kept as WIP until we make the required decision regarding it in USX (or the underlying data model).
I have created a shared document to help us understand the use cases and requirements that Mark and Kavitha have mentioned:
https://docs.google.com/document/d/1tBsihIxD8WBR6nFTmR9xPd98CepOPuK1U2b6leZgXiY/edit?usp=sharing
Can we discuss it there? I'm not convinced I understand the issues yet.
This PR includes ca, cp, va, vp objects as separate elements in USJ like it is in USFM (https://github.com/usfm-bible/tcdocs/commit/866440e493491504649358bf87d9ea5e5cd6e2c9)
python/lib/usjproc.py as well
char and para type markers in USFM, though not so in USX
required in USJ schema