Closed kavitharaju closed 1 year ago
This JSON isn't what I am proposing or anything. Just started with the very basic USX to JSON conversion to get the discussion started. There are a few suggestions I have on this structure to begin with...
<para>, <char> etc. in USX have become the JSON keys here, which isn't really adding much value for a user. The key could be something else. One option is to use the USFM tag itself; another is to use a type name that describes what this element is. Eg.: header, introduction, title etc.
The style attribute: we could call it something like marker or tag.
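To make the naming suggestion concrete, here is a minimal sketch of one possible shape, not a proposal for the final structure. It assumes the USX style attribute is exposed as a marker key and the XML tag name as a type key; element_to_json and the key names type, marker and content are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Convert one USX element to a dict with role-based keys.

    Hypothetical shape: `type` holds the XML tag name, `marker` holds the
    value of the USX `style` attribute, `content` holds text and child nodes
    in document order.
    """
    node = {"type": elem.tag}
    if "style" in elem.attrib:
        node["marker"] = elem.attrib["style"]
    content = []
    if elem.text:
        content.append(elem.text)
    for child in elem:
        content.append(element_to_json(child))
        # Text that follows a child element belongs to the parent's content.
        if child.tail:
            content.append(child.tail)
    if content:
        node["content"] = content
    return node


usx = '<para style="p">In the <char style="nd">Lord</char> I trust</para>'
result = element_to_json(ET.fromstring(usx))
```

With this shape a consumer can look up result["marker"] without needing to know whether the node came from a para or a char element in USX.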
@kavitharaju Do we have an example with \cp and/or \ca? I was answering questions about this over the weekend. The USX way of doing this seems terrible to me, because this information is in completely different places depending on whether or not the \cp occurs just after the \c.
Incidentally, and contra the current position of the USFM committee as I understand it, people do use \cp to structure documents, instead of \c, which is not a surprise since the majority of Christians alive think that the \c divisions are wrong! The weekend conversation involved lectionaries, and almost everyone who is interested in producing lectionaries is close to the Catholic or Orthodox traditions. If our JSON is just for lexing, we can probably ignore this, but we still need a consistent way to represent \cp, \ca, \vp and \va. In app-facing models I think we need to nail down the semantics and support operations like "everything within \cp 3b". I'm going to do that in Proskomma but it would be better to agree the semantics more widely, rather than proceeding via de facto standards.
As I understand it, the USX way of handling ca and cp was to add them as altnumber and pubnumber attributes to the chapter element. That limits them to being used only as per the \c based versification structure and does not allow a different chapter division, for example in the middle of a chapter. I hope that is the concern you are raising, right?
In the new test cases in the USFM/X committee's repo I see a different way of handling this. There they are treated as separate elements, not as attributes of the chapter element. I hope this is a conscious change they are making (and not the inconsistency you were talking about). @joelthe1, please correct me if I am wrong.
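To illustrate the two shapes being compared, here is a rough sketch of how a consumer might read the published chapter label from either one. The node layout (marker, number, pubnumber, content keys) and the function published_labels are assumptions made for this example, not the structure produced by our script or by the committee's test cases.

```python
def published_labels(nodes):
    """Return (chapter_number, published_label) pairs from a flat node list.

    Handles both hypothetical shapes: `pubnumber` stored as an attribute of
    the chapter node, and a separate node whose marker is `cp`.
    """
    labels = []
    current = None
    for node in nodes:
        if node.get("marker") == "c":
            current = node["number"]
            if "pubnumber" in node:
                labels.append((current, node["pubnumber"]))
        elif node.get("marker") == "cp":
            labels.append((current, node["content"][0]))
    return labels


# Attribute form: the label can only sit on the chapter itself.
attribute_form = [{"marker": "c", "number": "3", "pubnumber": "3b"}]

# Element form: the label can appear anywhere, including mid-chapter.
element_form = [
    {"marker": "c", "number": "3"},
    {"marker": "p", "content": ["Some text"]},
    {"marker": "cp", "content": ["3b"]},
]
```

The element form is the only one of the two that can express a \cp in the middle of a chapter, which is the limitation discussed above.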
Since I collected our samples from that repo, our JSON output also follows that new structure. You can view it in this commit
Have made a few tentative changes to the structure. They are up for discussion and can be reverted/changed if needed.
One issue I noticed in our script is that whether an object has children or not is determined by the number of items it has in the input USX. That is, if a \p has just one text object, it is shown as an object without nesting/children. This leads to inconsistency. I am planning to keep a list of objects that we can expect to have nested contents and always give them a children attribute. Any thoughts on this?
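A minimal sketch of that plan, under stated assumptions: the marker list below is a hypothetical and incomplete example, and the input shape (a text key on objects that were not nested) is a guess at what the current script emits, not a description of it.

```python
# Hypothetical, incomplete list of markers that should always carry `children`.
CONTAINER_MARKERS = {"p", "q1", "q2", "s1"}


def with_children(node):
    """Return a copy of node that always has `children` if it is a container.

    A container that only had a single `text` value gets it wrapped in a
    one-item `children` list; non-container nodes are returned unchanged.
    """
    if node.get("marker") not in CONTAINER_MARKERS or "children" in node:
        return dict(node)
    fixed = {key: value for key, value in node.items() if key != "text"}
    fixed["children"] = [node["text"]] if "text" in node else []
    return fixed


single = with_children({"marker": "p", "text": "Hello"})
nested = with_children({"marker": "p", "children": ["a", {"marker": "nd"}]})
chapter = with_children({"marker": "c", "number": "1"})
```

Here single becomes a \p with a one-item children list, while nested and chapter come back unchanged, so every \p has the same shape regardless of how many items it had in the input.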
Includes