Missing Old Hungarian diacritics

dscorbett commented 5 years ago

Font

NotoSansOldHungarian-Regular.ttf

Where the font came from, and when

Site: https://github.com/googlei18n/noto-fonts/blob/d7af81e614086435102cca95961b141b3530a027/hinted/NotoSansOldHungarian-Regular.ttf
Date: 2018-10-31

Font version

Version 2.000;GOOG;noto-source:20181019:f8f3770

Issue

Noto Sans Old Hungarian is missing some diacritics. Modern Old Hungarian is in flux and there have been multiple competing proposals, some of which extend the character set with diacritics.

According to http://nyelvmuveles.hu/osi-magyar-iras-rovas/27, U+0304 COMBINING MACRON was proposed in 1903 to mark long vowels, and U+0307 COMBINING DOT ABOVE was recently proposed to distinguish /e/ from /eː/.

According to https://web.archive.org/web/20190505131414/http://nyelvmuveles.hu/elveink/tanuljunk-konnyen-rovasirni-%E2%80%93-koszonet-raduly-janosnak-a-bolcs-megoldasert, U+0301 COMBINING ACUTE ACCENT was also proposed to mark long vowels.

According to L2/11-242R, U+1DC4 COMBINING MACRON-ACUTE is a duplicating mark.
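The four combining marks named in this report can be looked up in the Unicode Character Database. Below is a minimal sketch using Python's standard `unicodedata` module; the `DIACRITICS` list and the `describe_marks` helper are illustrative names, not part of any Noto tooling.

```python
import unicodedata

# The four combining marks named in this report.
DIACRITICS = [0x0301, 0x0304, 0x0307, 0x1DC4]


def describe_marks(code_points):
    """Return (label, name, general category, combining class) per mark."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((
            f"U+{cp:04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),   # "Mn" means nonspacing mark
            unicodedata.combining(ch),  # canonical combining class
        ))
    return rows


for row in describe_marks(DIACRITICS):
    print(*row)
```

All four are general-purpose nonspacing marks rather than characters from the Old Hungarian block, so a font has to include and position them explicitly if it is meant to render them over Old Hungarian letters.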

ghost commented 5 years ago

Please remove the non-UNICODE characters. Most of those characters (such as the ones with the .ltr subtag) won't appear. See the document at https://unicode.org/charts/PDF/U10C80.pdf! Please don't use ad hoc "standards".

ghost commented 5 years ago

No Tofu doesn't mean "stuff with fonts". It means "no tofu wherever standardised characters are defined in UNICODE."

ghost commented 5 years ago

Is there a need to keep this issue open?

ghost commented 5 years ago

@dscorbett Is there a need for this issue to stay open?

ghost commented 5 years ago

@dscorbett Using the acute accent in the Old Hungarian script is not scientific. The acute accent is used in the Latin-based Hungarian script. The modern Latin-based Hungarian script uses the acute accent for long vovels. The Old Hungarian script uses different letters for long and short vovels. Please close this issue.

ghost commented 5 years ago

Sorry, I made a spelling mistake. I wrote vovels instead of vowels.

ghost commented 5 years ago

@dscorbett It could be closed!

dscorbett commented 4 years ago

@marekjez86, why was this closed?

marekjez86 commented 4 years ago

@dscorbett: Noto fonts try to include scripts and features as defined by Unicode (many less :-)). It's not clear to me that any character outside of "U+1DC4 COMBINING MACRON-ACUTE duplicating mark" is part of the standard.

This means that semantics of U+0301 COMBINING ACUTE ACCENT, U+0304 COMBINING MACRON, and U+0307 COMBINING DOT ABOVE in Old Hungarian will not be implemented in Noto until Unicode addresses these.

However, because we need to look at the U+1DC4 COMBINING MACRON-ACUTE duplicating mark, I'll re-open the issue. Thank you for pointing this out.

ghost commented 4 years ago

Why do you need the U+1DC4 COMBINING MACRON-ACUTE mark?

ghost commented 4 years ago

@dscorbett Why do you need U+1DC4 COMBINING MACRON-ACUTE, and the others?

ghost commented 4 years ago

@dscorbett Why do you need U+1DC4 COMBINING MACRON-ACUTE, and the others?

dscorbett commented 4 years ago

Because they are attested.

ghost commented 4 years ago

@dscorbett I don't understand why implementing the UNICODE range U+10C80-U+10CFF is not enough. It has more letters than are required for writing Hungarian texts. If you strongly want these additional characters, implement them! I would like the Noto Old Hungarian font to be finished and published. When I asked for the reversed question mark to be implemented, the answer was that it is already implemented in the base Noto Sans font. If your characters are already implemented there, why do they need to be implemented in this font? As you may know, the UNICODE Old Hungarian standard has separate Old Hungarian letters for "a" and "á", so adding acute characters is not required. I don't know why ligatures had to be implemented without UNICODE code points; simply linking them to names is not enough. I am an Old Hungarian script fan and a programmer, too. For operating systems these letters are invisible!

ghost commented 4 years ago

@dscorbett @marekjez86 Old_HUn_Vowels.pdf The document above contains all of the vowels, both short and long. You can see that long vowels do not require an acute!

ghost commented 4 years ago

@dscorbett @marekjez86 I wouldn't want you to do unnecessary work!

ghost commented 4 years ago

@dscorbett @marekjez86 I just read http://nyelvmuveles.hu. This site is one of a dozen "scientific reformer" pages about Old Hungarian. The ad hoc standard you linked is not accepted by UNICODE, nor by the other Old Hungarian fan communities. Additionally, acutes were never used in the Old Hungarian script in its history. Acutes are used only in the Latin-based Hungarian script, because the base Latin letters are not enough for writing readable Hungarian texts. Please comment on this and close this issue. Another example of a "scientific reformer" page is http://rovas.info/

dscorbett commented 4 years ago

Unicode encodes characters; it doesn’t “accept” specific orthographies, so your latest comment is misleading. From your comments on this and other reports, it is clear that you do not understand Unicode. Your preferred style of Old Hungarian is not the only one worthy of font support. I am not going to close the Old Hungarian issues I’ve opened.

ghost commented 4 years ago

@dscorbett

ghost commented 4 years ago

@dscorbett You do not understand me! The Old Hungarian script is a historical script. It was never used with the acute in its history. The UNICODE standard 8.0 defines the Old Hungarian glyphs and their code points. If you think that I do not understand the UNICODE philosophy, let me add these "missing" characters to the Old Hungarian font. May I ask you to tell me where I can find the original Noto font sources, and whether FontForge is suitable for the development or I must choose another font editor? I will do it next week.

ghost commented 4 years ago

@marekjez86 Can you help me by giving me the information I asked for in my previous comment?

ghost commented 4 years ago

"Unicode encodes characters; it doesn’t “accept” specific orthographies, so your latest comment is misleading. From your comments on this and other reports, it is clear that you do not understand Unicode. Your preferred style of Old Hungarian is not the only one worthy of font support. I am not going to close the Old Hungarian issues I’ve opened."

I am so sorry that I don't speak English fluently. I think you don't speak or read Hungarian at all.

ghost commented 4 years ago

@dscorbett I don't prefer a style. I prefer the standard. And the UNICODE standard doesn't encode the Old Hungarian script's long vowels with an acute; it encodes them with a different form. And first look at the base Noto Sans font. The characters you need are already implemented in that main font.

ghost commented 4 years ago

@marekjez86 Could you get back to me?

ghost commented 4 years ago

I'm sorry, I've got problems with the CLA.

ghost commented 4 years ago

@dscorbett There is pull request #1650.

ghost commented 4 years ago

@dscorbett Is it good for you?

dscorbett commented 4 years ago

The diacritics in your PR need anchor points to be placed correctly.

"And the UNICODE standard doesn't encode the Old Hungarian script's long vowels with an acute; it encodes them with a different form."

Yes, Unicode does encode characters like U+10C81 OLD HUNGARIAN CAPITAL LETTER AA. That code point is specifically for the glyph that looks like a modified U+10C80. Unicode doesn’t say that that is the only way to represent á; it’s just the only one with a single code point. If someone wants to use a diacritic instead, it’s not against the standard.

A good question is whether Unicode intends for inherited-script diacritics like U+0301 to be used with Old Hungarian. The standard doesn’t say either way, but because of the proposal recommending U+1DC4 I think inherited-script diacritics are intended.

Another good question is whether any particular orthography (for example, the one that uses acute accents on Old Hungarian letters) should be supported in this font. That’s a question of project scope for Google to decide. I recommend erring on the side of inclusivity, since it is so easy to support them.
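The point about U+10C81 can be checked against the Unicode data shipped with Python. The sketch below assumes a Python version whose `unicodedata` covers Unicode 8.0 or later; `AA`, `A_ACUTE`, and `is_canonically_equivalent` are illustrative names. It shows that the precomposed-looking letter and the letter-plus-acute sequence are two distinct encoded strings, so normalization never turns one into the other.

```python
import unicodedata

AA = "\U00010C81"             # OLD HUNGARIAN CAPITAL LETTER AA
A_ACUTE = "\U00010C80\u0301"  # CAPITAL LETTER A + COMBINING ACUTE ACCENT


def is_canonically_equivalent(a, b):
    """True if the two strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(unicodedata.name(AA))
# An empty decomposition string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(AA)))
# NFC leaves the two-code-point sequence as it is.
print(len(unicodedata.normalize("NFC", A_ACUTE)))
print(is_canonically_equivalent(AA, A_ACUTE))
```

Because the two forms are not canonically equivalent, a font that wants to render the sequence has to position the mark itself; it cannot rely on the text being normalized to U+10C81.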

ghost commented 4 years ago

@dscorbett Listen, there was a congress in Hungary about the standardization of the Old Hungarian script. It decided the forms of the Old Hungarian symbols too. The site you linked is a post-standardization page, stuffed with "facts" that you believed. In other words, would you implement the French letter ë with a double acute?

ghost commented 4 years ago

@dscorbett I will resolve the anchors problem.

ghost commented 4 years ago

"Yes, Unicode does encode characters like U+10C81 OLD HUNGARIAN CAPITAL LETTER AA. That code point is specifically for the glyph that looks like a modified U+10C80. Unicode doesn’t say that that is the only way to represent á; it’s just the only one with a single code point. If someone wants to use a diacritic instead, it’s not against the standard."

@dscorbett As I wrote before, you don't speak or read Hungarian at all. In Latin-based Hungarian, the form of the letter aa is á. The Old Hungarian letter AA is the same thing as the Latin-based Hungarian letter Á. Latin-based Hungarian texts are written as phonetically as possible! The Old Hungarian script is a fully phonetic script. I am working on a spellchecker for Old Hungarian texts. Foreign words, like the city name Łódż, are impossible to transliterate into the Old Hungarian script. We must keep these words in their original form, as we do in Latin-based Hungarian texts. Most Hungarian people don't know how to pronounce the city name Łódż, because it is written the way the Polish write it. The Polish letter ó sounds like a long u, and the sounds of the letters Ł and ż are unknown to Hungarians. Most native Slavic speakers cannot correctly pronounce words with the German or Hungarian sound Ö; they replace it with the Slavic (and Hungarian) sound E. English and French spelling is too old for phonetic writing. The Latin-based Hungarian letters were developed in the 18th century. Throughout history the Old Hungarian script was used by the Szeklers, whose native language is Hungarian. The Szeklers did not need to write foreign words. They were not only an ethnic group; they were soldiers, too.
Do you understand what I am trying to explain?

ghost commented 4 years ago

@dscorbett The meanings of the symbols in the Old Hungarian section are written in a "Morse telegraphy"-like notation, as Hungarian text coded in "Morse". For example, "oe" means the letter "ö". Unfortunately, only Hungarian people clearly understand this explanation. For example, the "Morse code" Oee-r-ue-l-t n-oe-k k-e-t-r-e-c-e means "Őrült nők ketrece". Is that clear now?

ghost commented 4 years ago

Sorry, I made a mistake. The telegraphy form "Oee-r-ue-l-t n-oee-k k-e-t-r-e-c-e" means "Őrült nők ketrece".

ghost commented 3 years ago

@dscorbett David, could you explain how it works, in your opinion?

dscorbett commented 3 years ago

How does what work?

ghost commented 3 years ago

"How does what work?"

Combining the acute accent and the others with the Old Hungarian script. You opened this issue, didn't you?

dscorbett commented 3 years ago

The acute accent and other combining marks go on top of letters.

ghost commented 3 years ago

@dscorbett Listen, David, I read articles from http://nyelvmuveles.hu, the site you cited as a reference. I read their articles about Latin-based spelling, and I was surprised by what they wrote. In their opinion, the close e (ë) must be used in many more cases in modern Latin-based Hungarian spelling. For example, they wrote "rëformáció" instead of "reformáció". I live in a city near Budapest, where this vowel is never used. I lived in Paks, in "Tolna megye" (Tolna county), and I studied at the college of the nuclear power station in Paks. It was a demanding and famous college in Hungary. I met students from different parts of Hungary. They never used the vowel ë. My father's colleagues came from "Zemplén", near Tokaj (yes, the town that is famous for its wine); they never used the vowel ë. The only people known to use the close e (ë) are the "Palóc" people, who live in "Nógrád megye" mixed with Slovaks. I think it's not a good reference at all.

ghost commented 3 years ago

"The acute accent and other combining marks go on top of letters."

Why? There are Old Hungarian letters that correspond to Latin-based letters with, for example, an acute above. The Hungarian letter "á" is the same as the Old Hungarian letter aa. As I wrote before, the explanations of the letters are written in telegraphy form: Oee is ő, aa is á, etc.

ghost commented 3 years ago

@dscorbett David, do you understand why I want this issue to be closed? I met a young person who used an acute above the Old Hungarian u in his own diary. I asked him why he used this form. He answered that he knew the real form of the Old Hungarian ú, but he had decided to use this form. He was addicted to drugs. Several years later I met him again. He now uses the correct Unicode forms of the letters. He is clearer now. But he still believes other things from the net that aren't exactly true.

twardoch commented 3 years ago

Fonts are tools. They are not just tools to write in a way that is considered normative. Whenever possible, the default rendering of text set in a font should produce the normative text. But orthographies change, and also there are philologists, linguists, minority users who may want to specifically write texts that are spelled in a non-normative way (even just to illustrate spelling mistakes).

Fonts should allow this. Combining diacritical marks were added to Unicode so that they can be placed over various letters. So if a user (for whatever reason) enters a letter followed by a combining mark, the font should place the mark reasonably — just like a scribe would do if they needed to put some mark over some letter.

If a font contains combining marks and letters, then those marks should get visually pleasing positioning over the letters, via mark attachment.
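As a rough illustration of what mark attachment involves, the sketch below computes the offset that makes a mark's anchor coincide with a base letter's anchor. The `Anchor` type, the `mark_offset` function, and all coordinates are hypothetical and are not taken from the Noto sources; real shaping engines also account for advance widths, writing direction, and mark classes.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    """An anchor point in font units, relative to the glyph origin."""
    x: int
    y: int


def mark_offset(base_anchor: Anchor, mark_anchor: Anchor) -> Anchor:
    """Offset that moves the mark glyph so its anchor lands on the base anchor."""
    return Anchor(base_anchor.x - mark_anchor.x, base_anchor.y - mark_anchor.y)


# Hypothetical coordinates, chosen only for illustration.
base_top = Anchor(300, 700)    # "top" anchor on an Old Hungarian letter
acute_attach = Anchor(0, 550)  # attachment anchor on the combining acute

print(mark_offset(base_top, acute_attach))
```

Without such anchors on both the letters and the marks, a renderer has no information about where the mark belongs and typically falls back to the mark's default position.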

twardoch commented 3 years ago

If there are competing groups who are trying to reform Old Hungarian, there is scientific debate on the topic. Generally, fonts should allow such scientific debate. If texts are made by some community that use some combinations, then this is valid use. Whether some other community disagrees with the particular spelling — that should not preclude the fonts from displaying those combinations.

ghost commented 3 years ago

If there are competing groups who are trying to reform Old Hungarian, there is scientific debate on the topic. Generally, fonts should allow such scientific debate. If texts are made by some community that use some combinations, then this is valid use. Whether some other community disagrees with the particular spelling — that should not preclude the fonts from displaying those combinations.

The document linked by David is one of the ad hoc submissions that were never adopted by UNICODE. UNICODE archives all documents that are sent to the organisation. This submission isn't consistent with the homepage of the people who submitted it. They wrote about 39 letters, but the document also contains letters that are borrowed from the original standard. It isn't clear what they wanted: which letters need an acute above and which letters need dots above.

ghost commented 3 years ago

@dscorbett @google-admin This issue must be closed, because it is built on a submission that was never adopted. The submission does not only request the use of the acute above and the others. It also describes how, in their opinion, the Unicode Old Hungarian code page should be re-encoded. For example, the request wants to remove the letter ech and replace it with another letter. It conflicts with the existing Old Hungarian code page, which was introduced in UNICODE standard 8.0.

ghost commented 3 years ago

@google-admin! David Corbett (@dscorbett) refers to the document 11242r-n4110r-oldhungarian-adhoc.pdf, which has code point conflicts with the existing Old Hungarian code points already introduced in UNICODE standard 8.0 (U10C80.pdf).

ghost commented 3 years ago

If there are competing groups who are trying to reform Old Hungarian, there is scientific debate on the topic. Generally, fonts should allow such scientific debate. If texts are made by some community that use some combinations, then this is valid use. Whether some other community disagrees with the particular spelling — that should not preclude the fonts from displaying those combinations.

The debate was already closed when the Old Hungarian glyphs were introduced in UNICODE standard 8.0. The owners of the page http://nyelvmuveles.hu try to reform everything according to their own ideas. The letter ë is not a part of the Latin-based Hungarian spelling system. The document referred to by David, 11242r-n4110r-oldhungarian-adhoc.pdf, was submitted but never adopted; it has code point conflicts with the UNICODE standard Old Hungarian script. See the document U10C80.pdf.

dscorbett commented 3 years ago

People have used diacritics with Old Hungarian. I have no opinion on whether this is a good idea; I seek only to document what is attested. It is immaterial what website the evidence is hosted on or whether the other sections of a certain proposal document contradict the established Unicode encoding. The fact that Old Hungarian was encoded in Unicode 8.0 is irrelevant: that does not preclude the use of diacritics.

Figure 2-12 of L2/11-087 shows a diacritic that L2/11-242R identifies as U+1DC4 COMBINING MACRON-ACUTE.

Section 11 of L2/09-059R claims that Petrovay János used a horizontal overline to denote long vowels, which figure 3 of “Egy személyes történet a nemzedékeken át megőrzött ősi örökségről” corroborates. Scripts in Unicode nearly always use either their own script-specific diacritics or the common-script diacritics, not a mix of both. Therefore, if the duplicating mark is the common-script U+1DC4, this diacritic is the common-script U+0304 COMBINING MACRON.

Section 13 of L2/09-059R proposes a dot above U+10C8B OLD HUNGARIAN CAPITAL LETTER EE to change it from /eː/ to /e/. This would be U+0307 COMBINING DOT ABOVE. This is a new invention by Bakonyi Gábor; I don’t know if it has been used outside that proposal.

Page 36 of Az egységes rovás includes a scan of a page of Ráduly János’s Tanuljunk könnyen rovásírni which uses U+0301 COMBINING ACUTE ACCENT to mark long vowels.

ghost commented 3 years ago

"People have used diacritics with Old Hungarian. I have no opinion on whether this is a good idea; I seek only to document what is attested. It is immaterial what website the evidence is hosted on or whether the other sections of a certain proposal document contradict the established Unicode encoding. The fact that Old Hungarian was encoded in Unicode 8.0 is irrelevant: that does not preclude the use of diacritics.

"Figure 2-12 of L2/11-087 shows a diacritic that L2/11-242R identifies as U+1DC4 COMBINING MACRON-ACUTE.

"Section 11 of L2/09-059R claims that Petrovay János used a horizontal overline to denote long vowels, which figure 3 of “Egy személyes történet a nemzedékeken át megőrzött ősi örökségről” corroborates. Scripts in Unicode nearly always use either their own script-specific diacritics or the common-script diacritics, not a mix of both. Therefore, if the duplicating mark is the common-script U+1DC4, this diacritic is the common-script U+0304 COMBINING MACRON.

"Section 13 of L2/09-059R proposes a dot above U+10C8B OLD HUNGARIAN CAPITAL LETTER EE to change it from /eː/ to /e/. This would be U+0307 COMBINING DOT ABOVE. This is a new invention by Bakonyi Gábor; I don’t know if it has been used outside that proposal.

"Page 36 of Az egységes rovás includes a scan of a page of Ráduly János’s Tanuljunk könnyen rovásírni which uses U+0301 COMBINING ACUTE ACCENT to mark long vowels."

The documents that you refer to weren't adopted as a standard, and they override the existing Old Hungarian UNICODE standard code points. The adopted character set and its code points can be downloaded from http://unicode.org/charts. Because the documents you refer to override the code points, they are impossible to use within the existing NotoSansOldHungarian font. Please follow the standard code points from the document that can be downloaded from http://unicode.org/charts.

ghost commented 3 years ago

@dscorbett One of the documents you refer to assigns the Old Hungarian letter "a" to the code point U+0860. The UNICODE code point U+0860 is one of the Syriac code points.
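The assignment of a given code point can be checked with Python's standard `unicodedata` module. This is a small sketch; `describe` is an illustrative helper name, and the result depends on the Unicode version bundled with the Python interpreter (the Syriac Supplement block at U+0860 was added in Unicode 10.0, Old Hungarian in Unicode 8.0).

```python
import unicodedata


def describe(cp):
    """Return 'U+XXXX NAME', or a placeholder if this Unicode version has no name."""
    name = unicodedata.name(chr(cp), "<unassigned in this Unicode version>")
    return f"U+{cp:04X} {name}"


# U+0860 is the code point mentioned above; U+10C80 is the encoded Old Hungarian A.
for cp in (0x0860, 0x10C80):
    print(describe(cp))

print("Unicode data version:", unicodedata.unidata_version)
```

A code point used in an old proposal draft has no bearing on the encoded standard; only the published assignment, as reported here, matters to a font's character map.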

ghost commented 3 years ago

"People have used diacritics with Old Hungarian. I have no opinion on whether this is a good idea; I seek only to document what is attested. It is immaterial what website the evidence is hosted on or whether the other sections of a certain proposal document contradict the established Unicode encoding. The fact that Old Hungarian was encoded in Unicode 8.0 is irrelevant: that does not preclude the use of diacritics.

"Figure 2-12 of L2/11-087 shows a diacritic that L2/11-242R identifies as U+1DC4 COMBINING MACRON-ACUTE.

"Section 11 of L2/09-059R claims that Petrovay János used a horizontal overline to denote long vowels, which figure 3 of “Egy személyes történet a nemzedékeken át megőrzött ősi örökségről” corroborates. Scripts in Unicode nearly always use either their own script-specific diacritics or the common-script diacritics, not a mix of both. Therefore, if the duplicating mark is the common-script U+1DC4, this diacritic is the common-script U+0304 COMBINING MACRON.

"Section 13 of L2/09-059R proposes a dot above U+10C8B OLD HUNGARIAN CAPITAL LETTER EE to change it from /eː/ to /e/. This would be U+0307 COMBINING DOT ABOVE. This is a new invention by Bakonyi Gábor; I don’t know if it has been used outside that proposal.

"Page 36 of Az egységes rovás includes a scan of a page of Ráduly János’s Tanuljunk könnyen rovásírni which uses U+0301 COMBINING ACUTE ACCENT to mark long vowels."

Yes, page 36 of Az egységes rovás really does have a copy of Ráduly János's Old Hungarian script plan (his name is written in the Hungarian name order; the international form is János Ráduly). But, as you can read, the document is dated 2012, and at the end, as you can read (though it seems you don't speak Hungarian), it was decided that it would not be the basis of the UNICODE standard Old Hungarian script. The Old Hungarian letters adopted by UNICODE can be downloaded from http://unicode.org/charts! I think Noto's goal is to make and publish all characters of the UNICODE standard, not to make all the variant letter forms, with their code points, that were published in different documents related to the UNICODE standardisation process and debate (some of those code points override the UNICODE standard, not only the Old Hungarian code points; some of them are assigned to Syriac letters). From UNICODE's point of view, the debate is already closed. If you still do not understand why this issue must be closed, it's not my mistake.

notofonts / noto-fonts