usfm-bible / tcdocs

Technical Committee Documents

Proposal: Add new milestone style type to represent an anchor point or range for external data #60

Open lyonsil opened 5 months ago

lyonsil commented 5 months ago

The main use case I have in mind for this proposal is to provide a clear point or selected range of text that a comment can be attached to. Current versions of Paratext simply record text before and after the intended target (e.g., if the word "gamma" is the target inside the text "alpha beta gamma delta epsilon", we could record "alpha beta" as before and "delta epsilon" as after), but that is fragile since any changes to text around the target can cause us to no longer be able to identify the target itself. Custom milestones could be used for this, too, but given that we already have a milestone type that is purely for purposes of helping translation teams and not to be printed (i.e., the ts style type), adding another seems like a reasonable proposal.
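To illustrate the fragility described above, here is a minimal Python sketch of context-based targeting. The function name `find_by_context` and the exact matching rule (context joined to the target by single spaces) are assumptions made for illustration; this is not how Paratext actually implements it.

```python
from typing import Optional


def find_by_context(text: str, before: str, target: str, after: str) -> Optional[int]:
    """Return the start offset of `target` if it is still framed by the
    recorded before/after context, or None if that context no longer matches."""
    needle = f"{before} {target} {after}"
    pos = text.find(needle)
    if pos == -1:
        return None
    # Skip over the "before" context and the space that follows it.
    return pos + len(before) + 1


original = "alpha beta gamma delta epsilon"
# Only the neighbouring word changed; the target "gamma" itself is untouched.
edited = "alpha beta gamma zeta epsilon"

print(find_by_context(original, "alpha beta", "gamma", "delta epsilon"))  # 11
print(find_by_context(edited, "alpha beta", "gamma", "delta epsilon"))    # None
```

The second lookup fails even though the target word is unchanged, which is the failure mode an in-text anchor would avoid.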

Presumably there are others who want to tie metadata to particular phrases in scripture that don't align to verse boundaries, so I don't see this as purely a question about commenting. It's more of a question of how the standard should support this sort of tying/anchoring to data outside of the scripture data.

When looking at what word processing file formats specify for how comments are targeted, I found the following.

The OpenDocument file format specification appears to embed annotations at a given point in a document to denote where an annotation applies. Text within the annotation element is the referenced text.

The Office Open XML (OOXML) format that was proposed by Microsoft and is used in Word seems to use IDs embedded in the text. Comments are stored in a separate comments XML file and linked to the main document using the ID tags.
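Applied to USFM, the ID-linking idea could look roughly like the Python sketch below. The `zanchor` marker is a hypothetical custom milestone name chosen for illustration (it is not defined by USFM or proposed in this thread), and the comment store is just a dictionary standing in for whatever external storage an application uses.

```python
import re
from typing import Dict

# Hypothetical custom milestone pair, written in the USFM 3 milestone style
# (\marker-s |sid="..."\* ... \marker-e |eid="..."\*).
ANCHOR = re.compile(
    r'\\zanchor-s \|sid="(?P<id>[^"]+)"\\\*'
    r'(?P<text>.*?)'
    r'\\zanchor-e \|eid="(?P=id)"\\\*',
    re.DOTALL,
)


def anchored_ranges(usfm: str) -> Dict[str, str]:
    """Map each anchor ID to the text enclosed by its start/end milestones."""
    return {m.group("id"): m.group("text") for m in ANCHOR.finditer(usfm)}


verse = r'\v 1 alpha beta \zanchor-s |sid="c1"\*gamma\zanchor-e |eid="c1"\* delta epsilon'

# Comments live outside the text and refer to it only by ID.
comments = {"c1": "Check this term against the glossary."}

ranges = anchored_ranges(verse)
for comment_id, body in comments.items():
    print(comment_id, repr(ranges.get(comment_id)), body)
```

Because the lookup is by ID, edits to the words around the milestones do not affect it. The sketch does not handle nested or overlapping anchors.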

KentSpiel commented 5 months ago

I do not think this is something within the scope of the USFM/USX standard. It is not a complete word processing solution like Word or OO. The Paratext implementation of comments is particular to that platform. For example, only being able to anchor to text is not in the USFM/USX specification. Another implementation could allow that. Likewise, PT comments are not intended to be exported to other applications. @jonathanrobie may have another perspective as this sounds like a Scripture Burrito issue.

lyonsil commented 5 months ago

Thanks, Kent. To be clear, I'm not suggesting that the way Paratext (or any other software) stores comments themselves should be part of the standard. I'm only looking for a clearer way to provide anchor points within a USX document that is actively being edited by multiple people in a team. I thought the OpenDocument and OOXML comparisons were helpful from that perspective.

Being able to embed anchor points within other data types (e.g., images, audio, etc.) is interesting to consider, but I think that's also outside the scope of the USFM/USX standard, so I wouldn't suggest anything about that here. I'm just thinking about how to provide simple, consistent anchors within USFM/USX.

jwickberg commented 5 months ago

The main reason that comments are stored separately from the text is that this allows people who don't have write access to the text to still make comments on it. Anchors in the text could only be added by those with write permission to the text.

You would then also have to handle merging changes that add annotations with changes to the text itself. In Paratext, we currently don't automatically merge changes to the same verse, which would probably be annoying in this case, where one of the changes wasn't really a content change.

Paratext does have code to try to adjust the location of a comment when the verse text changes. If a new best location can't be found, the comment will move to the front of the verse.
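As a rough illustration of that kind of adjustment, the Python sketch below relocates a comment by searching for its selected text in the new verse and falls back to offset 0 (the front of the verse) when nothing is found. The "nearest occurrence" rule and the name `relocate` are assumptions; Paratext's actual algorithm is not described in this thread and is likely more involved.

```python
def relocate(old_offset: int, selected: str, new_verse: str) -> int:
    """Return a new offset for a comment after the verse text changed.

    Prefers the occurrence of the selected text closest to the old offset;
    falls back to 0 (the front of the verse) when the text is gone.
    """
    starts = []
    pos = new_verse.find(selected)
    while pos != -1:
        starts.append(pos)
        pos = new_verse.find(selected, pos + 1)
    if not starts:
        return 0
    return min(starts, key=lambda s: abs(s - old_offset))


# The comment was on "gamma" at offset 11 in "alpha beta gamma delta epsilon".
print(relocate(11, "gamma", "alpha gamma delta"))  # 6
print(relocate(11, "gamma", "alpha beta delta"))   # 0
```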

jonathanrobie commented 5 months ago

> @jonathanrobie may have another perspective as this sounds like a Scripture Burrito issue.

References to Scripture Text are easy to support using USFM references. Paratext notes, Enhanced Resources, and a bunch of other things rely on this ability.

Anchors into other things depend a lot on what those other things are. Images, lexemes (dictionary headwords), sections of books like language grammars, videos, audio ... each of these has its own link system. Scripture Burrito has two groups working on this: (1) Scripture Audio is now working to define links into audio and video, (2) Scripture Alignment is defining alignments among translations or source texts. But USFM / USX is not where these things are being defined. We probably do want to make sure that USFM/USX can contain these kinds of references.

USFM / USX also does not define the members of a collection, which you might also want. For instance, Scripture Burrito's Scripture Text contains the metadata that describes all the USFM / USX files for a translation.

lyonsil commented 5 months ago

@jwickberg That is a fair point, but it is implementation-specific, which is separate from the standards question.

I don't think the actual implementation being considered will be that bad for the read-only case. Permissions are only applied (at least for tools we're thinking about) on the client side, and there are plenty of ways to approach authorization implementations. The merging question is fair to consider further, but again is separate from the standards question. I think the standards question is about the data model itself.

Right now the standard's data model has embedded text for some kinds of annotations (e.g., footnotes) and some kinds of translator markings (e.g., ts milestones). I'm proposing that something more generic be added to help with other translator cases, not a one-off for another case.

My concern with anything completely external (like PT9 project notes) is that every one of those situations requires custom logic for how to find the target when the underlying text is still being edited. In the P.B case, we're explicitly trying to provide an environment for others to write code that works with projects. That means it isn't just the case that P.B would need to reimplement "anchor adjustment" code for notes, but everyone who wants to identify text in a changing file would need to do the same. Anchors that move with the text avoid this complexity.

KentSpiel commented 3 months ago

@lyonsil Just musing, but aside from the clutter, anchors that are added can also be removed. A comments system that relies on anchors risks being undone if the anchors are removed by another user or system. Using references seems to me to be a cleaner, safer option. For a specific project, a custom anchor milestone could be defined. If it proves popular, we might implement it in the standard in a later version. I recommend making this a possible Future Feature.
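The removal risk can at least be detected. The Python sketch below assumes a project-specific custom milestone named `zanchor` (a hypothetical name, written in the USFM 3 milestone style) and reports comment IDs whose start or end milestone is no longer present in the text.

```python
import re
from typing import Iterable, List

# Hypothetical custom anchor milestones; "zanchor" is illustrative only.
START = re.compile(r'\\zanchor-s \|sid="([^"]+)"\\\*')
END = re.compile(r'\\zanchor-e \|eid="([^"]+)"\\\*')


def orphaned_comments(usfm: str, comment_ids: Iterable[str]) -> List[str]:
    """Return the comment IDs that lack a start or an end milestone in the text."""
    starts = set(START.findall(usfm))
    ends = set(END.findall(usfm))
    intact = starts & ends  # an anchor counts only if both halves survive
    return sorted(set(comment_ids) - intact)


# "c1" is intact, "c2" lost its start milestone, "c3" was removed entirely.
text = (
    r'\v 1 alpha \zanchor-s |sid="c1"\*beta\zanchor-e |eid="c1"\* '
    r'gamma\zanchor-e |eid="c2"\* delta'
)
print(orphaned_comments(text, ["c1", "c2", "c3"]))  # ['c2', 'c3']
```

A tool could run a check like this after each merge and fall back to a verse-level reference for any orphaned comment.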