usfm-bible / tcdocs

Technical Committee Documents

Proposal: Markup for non-vernacular words #49

Open davidg-sil opened 11 months ago

davidg-sil commented 11 months ago

[Moved here from old site]

While there is \tl for transliterated words, which are intended to be pronounceable in the vernacular orthography, I would like to propose that there also be an \ol marker for "other language" text that is not written in the vernacular orthography. I briefly considered calling it \wf (word foreign), but my use-case assumption is that at least some readers know the language and may not consider it foreign, even though it is not the vernacular language of the publication. It might be the majority language of the region, a trade language, an international language, or the language of a neighbouring area or group.

Summary

Description

Other-language (non-vernacular) text, written in its unaltered form, typically in a language known and understood by at least part of the target audience.

Notes

Syntax

davidg-sil commented 11 months ago

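Commenting on my own suggestion, I realise that changing the font or hyphenation based on something that comes after the text is very hard, at least in PTXprint; I don't know about other typesetting engines. With a character style, the lang attribute would only appear after the text it describes, since USFM 3 attributes sit just before the closing marker. Rather than being a character style, a ranged milestone, whose start marker carries the attribute ahead of the text, would almost certainly be better.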

Example:

\f + \fr 1:1 \fk Circumcised \ft A sign of the Abrahamic covenant.
 Romanian:\ol-s |lang="ro"\* tăiat împrejur \ol-e\* \f*

Also, a ranged milestone would allow an entire majority-language introduction to be marked up:

\ol-s |lang="en"\*
\is Introduction to this translation
\ip ....
\ol-e\*
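The advantage of the milestone form can be shown with a single left-to-right pass: because the start milestone announces the language before any of its text, each text run can be tagged as soon as it is read. A minimal Python sketch, assuming the proposed \ol-s/\ol-e milestones carry only a lang attribute; runs_with_lang and the "vern" default are hypothetical names, and this is not how PTXprint or Paratext actually process USFM.

```python
import re

# Hypothetical milestones from the proposal: \ol-s |lang="xx"\* ... \ol-e\*
# Assumes lang is the only attribute on the start milestone.
MILESTONE = re.compile(r'\\ol-s\s*\|\s*lang="(?P<lang>[^"]*)"\s*\\\*|\\ol-e\\\*')


def runs_with_lang(usfm: str, default: str = "vern") -> list[tuple[str, str]]:
    """Split text into (text, lang) runs delimited by \\ol-s/\\ol-e milestones."""
    stack = [default]  # languages currently in effect; innermost is last
    runs: list[tuple[str, str]] = []
    pos = 0
    for m in MILESTONE.finditer(usfm):
        chunk = usfm[pos:m.start()].strip()
        if chunk:
            runs.append((chunk, stack[-1]))
        if m.group("lang") is not None:
            # Start milestone: the language is known before any of its text is read.
            stack.append(m.group("lang"))
        elif len(stack) > 1:
            # End milestone: return to the enclosing language.
            stack.pop()
        pos = m.end()
    tail = usfm[pos:].strip()
    if tail:
        runs.append((tail, stack[-1]))
    return runs


print(runs_with_lang(r'Romanian:\ol-s |lang="ro"\* tăiat împrejur \ol-e\* \f*'))
# [('Romanian:', 'vern'), ('tăiat împrejur', 'ro'), ('\\f*', 'vern')]
```

The regex accepts the start milestone with or without a space before the `|`, so both examples above are handled. Other markers such as \ft are left in the text runs untouched.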
KentSpiel commented 7 months ago

Assuming we allow category markup on paragraph and character markers, this could be implemented simply by putting a category such as \cat ro\cat* on a paragraph or character span. It would not look pretty in Paratext, but it could be useful in typesetting and other publishing processes.

\f + \fr 1:1 \fk Circumcised \ft A sign of the Abrahamic covenant.
 Romanian: \tl \cat ro\cat*tăiat împrejur\tl*\f*
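A downstream tool could pull the category out of such a span before choosing a font or hyphenation rules. A minimal Python sketch, assuming \cat appears inline with no nested markers in its value and that the value is a space-separated list of ids; split_categories is a hypothetical helper, not part of Paratext or PTXprint.

```python
import re

# \cat ...\cat* as used in the example above; treating the content as a
# space-separated list of ids is an assumption.
CAT = re.compile(r'\\cat\s+(?P<value>[^\\]*?)\s*\\cat\*')


def split_categories(span: str) -> tuple[list[str], str]:
    """Return (categories, span with its \\cat ...\\cat* markup removed)."""
    cats: list[str] = []
    for m in CAT.finditer(span):
        cats.extend(m.group("value").split())
    return cats, CAT.sub("", span)


cats, text = split_categories(r'\tl \cat ro\cat*tăiat împrejur\tl*')
print(cats, text)  # ['ro'] \tl tăiat împrejur\tl*
```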

Could we add category information to the Paratext stylesheets? For example, in custom.sty:

\marker tl
\cat ro
\TextProperties publishable nonvernacular
\font Romanian Special
mhosken commented 7 months ago

The problem with this approach in a stylesheet is that you have, in effect, multiple records with the same key: \marker tl would appear once for the ordinary style and again for the \cat ro variant. That is a significant change for the tooling, and it makes specifying the structure of stylesheets far more complicated. PTXprint gets around this by using a structured Marker value that is not valid USFM. See the technical manual for details.
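To make the key collision concrete, here is a minimal Python sketch that reads a hypothetical custom.sty fragment (an ordinary \marker tl record plus a \cat ro variant) into records and then indexes them. Keyed by marker alone, one record is lost; a composite (marker, category) key keeps both. The parser and the tuple key are illustrative assumptions, not how Paratext or PTXprint actually read stylesheets, and not PTXprint's structured Marker syntax.

```python
# A hypothetical custom.sty fragment: an ordinary \tl record plus a
# \cat ro variant of the same marker.
STY = r"""
\marker tl
\Italic
\marker tl
\cat ro
\TextProperties publishable nonvernacular
\font Romanian Special
"""


def parse_records(text: str) -> list[dict[str, str]]:
    """Group backslash-key lines into records; each \\marker starts a new one."""
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue
        key, _, value = line[1:].partition(" ")
        key = key.lower()
        if key == "marker":
            records.append({})
        if records:
            records[-1][key] = value.strip()
    return records


records = parse_records(STY)

# Keyed by marker alone, the second record silently replaces the first.
by_marker = {r["marker"]: r for r in records}
# A hypothetical composite (marker, category) key keeps both records.
by_marker_and_cat = {(r["marker"], r.get("cat")): r for r in records}

print(len(records), len(by_marker), len(by_marker_and_cat))  # 2 1 2
```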

I agree that a category value should be constrained to the normal id characters: lowercase letters, digits, hyphens, and underscores. And yes, I can buy into the value being a space-separated list of category values.
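That constraint is easy to check mechanically. A minimal Python sketch, assuming "lowercase" means ASCII a to z and that any whitespace separates ids; parse_category_list is a hypothetical helper name.

```python
import re

# One category id: lowercase ASCII letters, digits, hyphen, or underscore.
CATEGORY_ID = re.compile(r"[a-z0-9_-]+")


def parse_category_list(value: str) -> list[str]:
    """Split a space-separated category value and validate each id."""
    ids = value.split()
    if not ids:
        raise ValueError("empty category value")
    bad = [c for c in ids if not CATEGORY_ID.fullmatch(c)]
    if bad:
        raise ValueError(f"invalid category id(s): {bad}")
    return ids


print(parse_category_list("ro nonvernacular"))  # ['ro', 'nonvernacular']
```

Values such as "Ro" or "ro,en" are rejected with a ValueError.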

On Mon, 19 Feb 2024, 19:16 Kent Spielmann wrote:


\marker p
\cat ro
\TextProperties paragraph publishable nonvernacular
\Italic
