w3c / hlreq

Hebrew script layout requirements

Hebrew Hyphen #8

Open r12a opened 7 years ago

r12a commented 7 years ago

[from @weigern] The proper Hebrew hyphen is used in compounds, constructs and other cases in Hebrew. Unlike a dash or "minus" sign used for separation etc., the Hebrew hyphen is aligned with the top of the letters. It is encoded in Unicode as U+05BE (https://www.fileformat.info/info/unicode/char/05be/index.htm), but because it is not easily accessible, a dash or "minus" sign is much more prevalent in current Hebrew usage.

image

image

image
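A minimal sketch of the distinction described above, using only Python's standard `unicodedata` module: it prints the Unicode identity of the maqaf and of the hyphen-minus, and swaps a hyphen-minus for a maqaf when it sits directly between two Hebrew letters. The helper names `describe` and `to_maqaf` are hypothetical, and limiting "Hebrew letter" to U+05D0..U+05EA is an assumption made for brevity.

```python
import re
import unicodedata

MAQAF = "\u05BE"  # HEBREW PUNCTUATION MAQAF

# Assumption: "Hebrew letter" means the basic block U+05D0..U+05EA only.
HEBREW_LETTER = "[\u05D0-\u05EA]"
_HYPHEN_BETWEEN_HEBREW = re.compile(f"(?<={HEBREW_LETTER})-(?={HEBREW_LETTER})")


def describe(ch: str) -> str:
    """Return the code point, Unicode name and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def to_maqaf(text: str) -> str:
    """Replace a hyphen-minus that is directly between two Hebrew letters with a maqaf."""
    return _HYPHEN_BETWEEN_HEBREW.sub(MAQAF, text)


if __name__ == "__main__":
    print(describe(MAQAF))
    print(describe("-"))
    print(to_maqaf("\u05EA\u05DC-\u05D0\u05D1\u05D9\u05D1"))  # "Tel-Aviv" typed with a hyphen-minus
```

Both characters share the general category Pd, so the category alone cannot tell them apart; the substitution has to rely on the neighbouring letters. Pointed text, where a vowel mark may sit between the letter and the hyphen, is not handled by this sketch.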

r12a commented 7 years ago

[from @tomerm] @Weigern very interesting :+1: The third example is pretty straightforward: "Municipality of Tel--Aviv". The Hebrew hyphen separates the two parts of the city name (Tel-Aviv, which literally means Mound-Spring in Hebrew). For more on the etymology of this name, please see: https://en.wikipedia.org/wiki/Tel_Aviv#Etymology_and_origins

It is far less clear why it is used in the first two cases. For example, the middle example says: [And God said, "Let there be--light," and there was--light]. I replaced the Hebrew hyphen with a double minus sign in the English translation. The hyphen is positioned between a verb and a noun. Do you have any idea what purpose it serves?

Additional interesting cases of Hebrew punctuation (including cantillation marks) can be found at: https://en.wikipedia.org/wiki/Hebrew_punctuation

r12a commented 7 years ago

[from @weigern] @tomerm I believe, as you mentioned at the bottom of your comment, that in the first two cases the hyphen/"maqaf" is used as a cantillation mark.

matial commented 7 years ago

As far as I know, Maqaf is NOT a cantillation mark.

tomerm commented 7 years ago

@matial In that case, what purpose does it serve in: [And God said, "Let there be--light," and there was--light]? Alternatively, are you saying that what we see here is not a Maqaf but something else?

matial commented 7 years ago

What we see certainly is a Maqaf (BTW, the Maqaf in "Let there be--light," is not present in all editions of the Bible. Checking on a few editions I could find shows that about half of them have it and the other half don't). The purpose must be some sort of connection just like in Tel-Aviv, but I know no better.

lkemmel commented 7 years ago

I found quite an interesting article: http://www.mechon-mamre.org/i/aboutpsq.htm

Although it talks about custom punctuation conventions, I think the following can be relevant in our case (or at least worthy of further study):

"כל מקף (-) מסומן, אפילו שלא במקום סמיכות כמו בעברית החדשה"

"סדר כבדות ההפסק מן הקל אל הכבד הוא: , -- ; : . והמקף מעין אנטי-פיסוק שעושה שתי מילים או יותר לתיבה אחת"

[Highlights: (1) Every maqaf is marked, even where it does not join a construct as in modern Hebrew, so the usage of MAQAF differs from that in modern Hebrew. (2) The strength of punctuation from weakest to strongest is: , -- ; : . (3) The MAQAF is a kind of anti-punctuation that unites two or more words into one.]

Applying this to our case, could the Maqaf be used to indicate immediateness / G-d's full control of the world (?)

lkemmel commented 7 years ago

The Academy of the Hebrew Language sorts this out http://hebrew-academy.org.il/topic/hahlatot/punctuation :

המקף הבא במקרא שימושו אחר, והוא שייך בעיקרו לטעמי המקרא: הוא מורה על צירוף שתי מילים או יותר הנקראות בטעם אחד, וצירוף זה תלוי בנסיבות מוזיקליות.

(Maqaf in the Bible has different [from the modern Hebrew] usage, and it's primarily associated with cantillation marks: it indicates combination of 2 or more words that are pronounced in one breath...)

So yes - Maqaf in the Bible primarily indicates "anti-punctuation".

lkemmel commented 7 years ago

... and, of course, the Academy provides comprehensive information on using the Maqaf in modern Hebrew.

amire80 commented 6 years ago

I added something about this to the gap analysis. I'll check for more info.

tomerm commented 6 years ago

Hi @amire80! Where is the Hebrew gap analysis published? The link to it in https://w3c.github.io/hlreq/ is broken (https://w3c.github.io/gap-analysis/).

r12a commented 6 years ago

@tomerm https://w3c.github.io/hlreq/gap-analysis/ I'll take the action to fix that link in hlreq.

tomerm commented 6 years ago

Thanks a lot @r12a @amire80. I see just the following empty section under 3.2, dedicated to hyphenation: image. I just want to make sure I am looking at the right place and correctly interpreting your comment above.

amire80 commented 6 years ago

No, it was not about hyphenation. If I understand correctly, hyphenation is about breaking long words at the end of a line. This is occasionally practiced in Hebrew newspapers, but I couldn't find any clear rules about it. I'd love to find something like this, because it would be perfect for this document, but I'd rather not write it up without some kind of written reference.

What I refer to in my comment above is the section about word boundaries.

tomerm commented 6 years ago

Hyphen (מקף) used for hyphenation (מיקוף) of words is mentioned here:

Regarding rules for hyphenation, I have found only a customization option provided by Adobe: image

amire80 commented 6 years ago

Thanks, but unfortunately none of these talk about actual rules of Hebrew hyphenation. In other languages, such as Russian, there are rules that discuss whether it's allowed to break a word in the middle of a syllable or not, etc. I cannot find anything like that for Hebrew. I don't think that taking only word length into account is enough; it's better not to define a rule at all.

My only hope is to find something like a style guide of some newspaper.

doron1 commented 6 years ago

@amire80 FWIW, I tried to research this a few months back. Not finding any noteworthy documentation of a hyphenation standard for Hebrew, I discussed it with a person from the Academy of the Hebrew Language. She confirmed that (to the best of her knowledge) there is no authoritative ruleset for hyphenation in Hebrew.

tomerm commented 6 years ago

@doron1 @amire80 I just realized that this work item is dedicated to hyphen (not hyphenation). Hyphen is used in numerous contexts (hyphenation is just one of those). For example: image

r12a commented 6 years ago

Perhaps we should start a new issue related to hyphenation. I'll put my comment here for now, and we can move it if desired.

Wrt Hebrew hyphenation, and maqaf:

In https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedLineBreak.txt we see that

  1. Characters of class HY include just - (hyphen-minus).

  2. Characters of class BA (break after) can be seen at https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedLineBreak.txt (search for '; BA'), and include

    05BE          ; BA # Pd       HEBREW PUNCTUATION MAQAF

    So, in theory, Unicode says that you can break a line after maqaf. HOWEVER, https://www.unicode.org/reports/tr14/tr14-39.html#HL gives the following (my emphasis):

HL: Hebrew Letter (XB)

This class includes all Hebrew letters.

When a Hebrew letter is followed by a hyphen, there is no break on either side of the hyphen. In this context a hyphen is any character of class HY or class BA. In other respects, Hebrew letters behave the same as characters of class AL.

Included in this class are all characters of General Category Letter that have Script=Hebrew.

This means that Unicode expects Hebrew to not break a line between the two things on either side of a maqaf or hyphen-minus.

You can check this out using the tool at https://unicode.org/cldr/utility/breaks.jsp. Add some Hebrew text with a maqaf in the middle, and select 'Line'. You'll see no line break divider next to maqaf.

hth
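A rough sketch of the rule quoted above (no break on either side of a hyphen that follows a Hebrew letter), for readers who want to test strings without the online tool. It is not an implementation of the Unicode line breaking algorithm: Python's standard library does not expose the Line_Break or Script properties, so the Hebrew-letter test is approximated from character names, and only maqaf and hyphen-minus are treated as hyphens rather than the full HY and BA classes. The function names are hypothetical.

```python
import unicodedata

MAQAF = "\u05BE"         # listed as class BA in DerivedLineBreak.txt
HYPHEN_MINUS = "\u002D"  # class HY
# Assumption: only these two characters, not every member of HY and BA.
HYPHENS = {MAQAF, HYPHEN_MINUS}


def is_hebrew_letter(ch: str) -> bool:
    """Approximate 'General Category Letter with Script=Hebrew' via the character name."""
    return (unicodedata.category(ch).startswith("L")
            and unicodedata.name(ch, "").startswith("HEBREW"))


def hyphen_blocks_break(text: str, i: int) -> bool:
    """True if text[i] is a hyphen that follows a Hebrew letter.

    In that context the quoted rule says there is no line break
    before or after the hyphen.
    """
    if text[i] not in HYPHENS:
        return False
    j = i - 1
    # Skip nonspacing marks (points, accents) attached to the preceding letter.
    while j >= 0 and unicodedata.category(text[j]) == "Mn":
        j -= 1
    return j >= 0 and is_hebrew_letter(text[j])


if __name__ == "__main__":
    tel_aviv = "\u05EA\u05DC\u05BE\u05D0\u05D1\u05D9\u05D1"
    print(hyphen_blocks_break(tel_aviv, 2))
    print(hyphen_blocks_break("Tel-Aviv", 3))
```

Skipping nonspacing marks matters for pointed biblical text, where a vowel point or accent usually sits between the last letter and the maqaf.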

matial commented 6 years ago

In Latin languages, hyphenation must not occur between a consonant and a following vowel. In Hebrew without vowels ("Hebrew Points" in Unicode terminology) which is the most common case, the letters Alef, He, Vav, Yod can be used both as consonants and as vowels. Analyzing which is which is a complex process which may require identifying part of speech or even understanding the meaning of the word. This probably goes way beyond what we can require, and even what can be explained succinctly. A possible solution is to always forbid breaking a word before Alef/He/Vav/Yod. As has been said before, it would be nice to see some written guidelines used by people in the field.
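The conservative option suggested above (never break before Alef, He, Vav or Yod) could be sketched as follows. This only illustrates that one suggestion and is not an established Hebrew hyphenation rule; as noted earlier in the thread, no authoritative ruleset is known. The function name and the `min_left` / `min_right` fragment-length limits are hypothetical, and the input is assumed to be a single unpointed word.

```python
# Letters that may act as either consonant or vowel in unpointed Hebrew.
AMBIGUOUS = {
    "\u05D0",  # alef
    "\u05D4",  # he
    "\u05D5",  # vav
    "\u05D9",  # yod
}


def candidate_breaks(word: str, min_left: int = 2, min_right: int = 2) -> list[int]:
    """Return indexes i such that the word could be split as word[:i] + word[i:].

    A split is rejected when the second part would start with an ambiguous
    letter, or when either part would be shorter than the given minimum.
    """
    return [
        i
        for i in range(min_left, len(word) - min_right + 1)
        if word[i] not in AMBIGUOUS
    ]


if __name__ == "__main__":
    word = "\u05DE\u05D7\u05E9\u05D1\u05D9\u05DD"  # mem, het, shin, bet, yod, final mem
    print(candidate_breaks(word))
```

Because the filter looks only at the letter after the split, it will also reject breaks that would be acceptable when the letter is really a consonant; that is the price of avoiding the part-of-speech analysis described above.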

bdenckla commented 1 year ago

The Academy of the Hebrew Language sorts this out http://hebrew-academy.org.il/topic/hahlatot/punctuation :

המקף הבא במקרא שימושו אחר, והוא שייך בעיקרו לטעמי המקרא: הוא מורה על צירוף שתי מילים או יותר הנקראות בטעם אחד, וצירוף זה תלוי בנסיבות מוזיקליות.

(Maqaf in the Bible has different [from the modern Hebrew] usage, and it's primarily associated with cantillation marks: it indicates combination of 2 or more words that are pronounced in one breath...)

Respectfully, this notion of "one breath" in relation to a maqaf compound is, as far as I know, unprecedented. I think it stems from a mistranslation of the sentence cited from The Academy.

Unfortunately, this notion/claim has been elevated from a comment buried here in this issue to "first class" text at https://r12a.github.io/scripts/hebr/he#word.

While it is true that it would be uncommon, in performance, to take a breath anywhere in a maqaf compound:

  • Phrasing indicated by cantillation is usually framed in terms of pauses, not breaths.
  • Even if we equate "breath" with "pause-free phrase," such phrases would typically be delimited by the conjunctive/disjunctive accent relationships within a verse, and the hierarchy of relationships between disjunctive accents in the verse.
  • While there will be no such delimiter within a maqaf compound, i.e., while a maqaf compound will never contain a pause and resumption within it, pause-free phrases are typically larger than a maqaf compound.
  • Thus, while it is not wrong to say that maqaf compounds are pause-free, it is not the purpose of maqaf to mark a pause-free phrase.

The purpose of maqaf is to create a compound word, where, by "compound word" we mean a word that, for purposes of cantillation, functions just as a non-compound word would function. (We might also call a non-compound word a "simple" or "atomic" word).

In particular, a maqaf compound needs no more than one accent. Like a long simple word, a maqaf compound can have more than one accent if its accents are used in one of several well-known patterns for multiple accents on a single word. But the point is that it doesn't NEED more than one accent, it merely may have more than one.

In contrast, if the maqaf marks were removed from the atoms of a maqaf compound, each resulting simple word would require at least one accent.

This, I believe, is the source of the error above: the phrase translated as "in one breath" would have been better translated as "with one accent".

@skadish1 please chime in if needed.

skadish1 commented 1 year ago

The Hebrew quotation means that maqaph indicates "a combination of two or more words that are read [together] with a single accent; this combination is a function of the musical context".