Open foriequal0 opened 1 year ago
I am not sure if this is a bug given the context. According to the guidelines here, whether or not to add spaces around quotes is disputed.
If this does get implemented, how does quoted CJK followed by or preceded by unquoted CJK get handled? For example, something like 한글"한글"?
Disclaimer: I'm Korean and not familiar with Chinese and Japanese, so this is mainly about Korean. Each country has different rules and preferences. And sorry for the long comment.
About spaces inside of quotes: We have the 'National Institute of Korean Language', and they publish the official Korean spelling rule. (and they even have a QnA channel)
https://kornorms.korean.go.kr/m/m_regltn.do?#a764
부록 > 8. 큰따옴표 (Appendix 8. Double quotes)
큰따옴표의 띄어쓰기: 여는 큰따옴표는 뒷말에 붙여 쓰고, 닫는 큰따옴표는 앞말에 붙여 쓴다.
부록 > 9. 작은따옴표 (Appendix 9. Single quotes)
작은따옴표의 띄어쓰기: 여는 작은따옴표는 뒷말에 붙여 쓰고, 닫는 작은따옴표는 앞말에 붙여 쓴다.
According to the published rule, the opening quote is attached to the following word, and the closing quote is attached to the preceding word.
And personally, I don't think rules/preferences are that different among CJK.
Let me clarify: I just wanted to say that spaces inside of quotes are weird regardless of the rule. Everything else, such as spaces outside of quotes and spaces between Korean and English words or numbers, is fine.
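The attach-to-the-word rule quoted above is easiest to apply when the text uses curly double quotes, because the opening and closing marks are distinct characters. The following is only a minimal sketch under that assumption; `trimInsideCurlyQuotes` is a hypothetical helper, not something that exists in obsidian-linter. Curly single quotes are left out on purpose, since the closing mark also serves as an apostrophe.

```typescript
// Hypothetical helper (not part of obsidian-linter).
// Removes spaces and tabs just inside curly double quotes and leaves
// the spacing outside the quotes untouched.
function trimInsideCurlyQuotes(text: string): string {
  return text
    // Opening quote (U+201C) followed by spaces: attach it to the next word.
    .replace(/\u201C[ \t]+/g, "\u201C")
    // Spaces followed by closing quote (U+201D): attach it to the previous word.
    .replace(/[ \t]+\u201D/g, "\u201D");
}

const fixed = trimInsideCurlyQuotes("한글 \u201C 한글 \u201D 한글");
console.log(fixed);
```

Straight quotes (") cannot be handled this way, because the same character opens and closes a quotation.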
The following sections are speculative contexts around the CJK experience from a Korean viewpoint.
I am not sure if this is a bug given the context. According to the guidelines here, whether or not to add spaces around quotes is disputed.
I think the guideline is for practical purposes, not a definitive, authoritative rule, and it was mostly made for Chinese. Koreans adopted a spacing rule long ago. Besides its critical grammatical role, spacing also acts as a visual cue and relieves strain when reading. I haven't heard that Japanese and Chinese have such spacing rules, but I've heard that Japanese uses Hiragana after Kanji as a visual cue. So Chinese needed visual cues, and they used those boundaries since that is easier than devising new spacing rules for their writing system. I guess that's why Chinese and Japanese prefer spacious 'full width' punctuation. Korean prefers smaller, compact 'half width' punctuation nowadays because we already have plenty of space.
Also, I assume those who prefer spaces around links are thinking of them as similar to other types of quotations, or non-native-text such as English words, technical terms, or numbers.
Although it is mostly made for Chinese, it is useful for Korean and Japanese too. Due to the long history of Latin-alphabet-centric typographical biases in technology, Koreans (and perhaps Chinese and Japanese speakers as well) have learned our own word-spacing rules for the digital world regardless of what the authorities say. Many Latin-centric algorithms based on word boundaries break when they face non-Latin characters, such as auto-link or selection expansion algorithms.
Auto-link features usually cannot properly recognize where a URL ends, so spaces are used as a common trick. Sometimes the linker omits trailing Korean words in a URL: "https://google.com/search?q=한글" (GitHub handles this case well). Sometimes it overruns past the end of a URL: "https://github.com/platers/obsidian-linter/issues/724를 예시로" (under the Korean spacing rule, 은/는/을/를/이/가 are not spaced from the preceding word when they are used as postpositions, but GitHub failed to recognize this).
Spaces are used as a break marker for double click in Desktop, or magnifying selection in Android/iOS.
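To make the auto-link problem above concrete, here is a small sketch with two hypothetical URL patterns. Neither is claimed to be what GitHub or any other linker actually uses; they only illustrate why each of the two example URLs breaks one simple strategy, and why writers fall back on inserting spaces.

```typescript
// Assumption: two simplified patterns, written only for illustration.
// Naive pattern: a URL runs until the next whitespace character.
const naiveUrl = /https?:\/\/\S+/;
// Stricter pattern: a URL stops at the first non-ASCII character.
const asciiUrl = /https?:\/\/[\x21-\x7E]+/;

function firstMatch(pattern: RegExp, text: string): string | null {
  const m = pattern.exec(text);
  return m ? m[0] : null;
}

const overrun = "https://github.com/platers/obsidian-linter/issues/724를 예시로";
const query = "https://google.com/search?q=한글";

// The naive pattern swallows the postposition 를 attached to the URL.
console.log(firstMatch(naiveUrl, overrun));
// The ASCII-only pattern stops before 를, but it also drops the Korean query value.
console.log(firstMatch(asciiUrl, overrun));
console.log(firstMatch(asciiUrl, query));
```

Without a space, no purely character-based rule can tell a Korean query value from an attached postposition.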
Strictly speaking, these practices are not following the official rule. I need to explain some related Korean spacing rules before properly answering your last question.
Korean spacing rule (띄어쓰기) is notorious for its complexity due to super context-sensitive unstable ad-hoc exceptions. (Other rules are confusing too, but they are separate stories)
The general principle is simple: https://kornorms.korean.go.kr/m/m_regltn.do?#a178
제2항 문장의 각 단어는 띄어 씀을 원칙으로 한다. Paragraph 2. Each word in a sentence is spaced by default.
But right after that, they tell us that some words can't be spaced since they aren't separable. This is explained in a later chapter that is entirely devoted to exceptions to the spacing rules, along with further exceptions. https://kornorms.korean.go.kr/m/m_regltn.do?#a182
Let me highlight some exceptions related to this issue.
Lv 1. numbers, https://kornorms.korean.go.kr/m/m_regltn.do?#a263
제43항 단위를 나타내는 명사는 띄어 쓴다. Paragraph 43. The unit noun is spaced.
다만, 순서를 나타내는 경우나 숫자와 어울리어 쓰이는 경우에는 붙여 쓸 수 있다. However, you can omit the space if it represents a sequence or is used with Arabic numerals.
They explain that you can omit it if you find it more readable when it is omitted.
Lv 10. Simple terms consisting of English and Korean. https://www.korean.go.kr/front/onlineQna/onlineQnaView.do?mn_id=216&qna_seq=229790 Someone asked whether medical terms such as 'CT 촬영' (CT scan) should be spaced or not. According to the answer, there is no rule on mixing English words and Korean words. If we apply the spacing rule as if they were Korean words, they should be spaced. However, since they are registered in the open dictionary of Korean (https://opendict.korean.go.kr/main) as special terms, the space can be omitted.
related rule: https://kornorms.korean.go.kr/m/m_regltn.do?#a199
제50항 전문 용어는 단어별로 띄어 씀을 원칙으로 하되, 붙여 쓸 수 있다.(ㄱ을 원칙으로 하고, ㄴ을 허용함.) Paragraph 50. Special terms are spaced word by word by default, but you can omit the spaces. (ㄱ is the principle, but ㄴ is also allowed.)
Lv 100. Your question.
If this does get implemented, how does quoted CJK followed by or preceded by unquoted CJK get handled? For example, something like 한글"한글"?
https://korean.go.kr/front/onlineQna/onlineQnaView.do?mn_id=216&qna_seq=276876
따옴표의 뒤에 쓰인 표현이 앞말에 붙여 쓰는 표현이라면 따옴표와도 붙여 쓰고, 앞말과 띄어 쓰는 표현이라면 따옴표와도 띄어 씁니다. If the expression after the quote was meant to be attached to the preceding word, you should attach it to the quote too. If not, you should add a whitespace after the quote.
https://korean.go.kr/front/onlineQna/onlineQnaView.do?mn_id=216&qna_seq=229967 This QnA has more complex examples, but they suggest the same rule: It depends on whether the following expressions can be attached when there were no quotes.
Although it is about the whitespace after the closing quote, I think the rule is applied to the space before the opening quote similarly. But I personally think we usually open a quote starting with a separable word.
Lv ???. How do I know whether they should be attached or not?
It heavily depends on the context.
'띄어쓰기' itself is one of the famous exceptions in the spacing rule. They should be spaced since they are basically '띄어'(spaced) + '쓰기'(writing), right? https://ko.dict.naver.com/#/correct/korean/info?seq=4415 No. It is a single word since it is a special term. e.g. '띄어쓰기는 어렵다' (the spacing rule is hard) However, if it is used as another form (such as verbal noun form?), it should be spaced. e.g. '바르게 띄어 쓰기는 어렵다' (spaced writing is hard to do correctly)
It even depends on context that is not written. We translate 'my country' as '우리'(our) + '나라'(country). Should they be spaced? '우리나라' is used only if the narrator is Korean and they mean 'Korea, my country'. Otherwise, you can't use it. It's always grammatically incorrect for an American to say '우리나라'. They can only say '우리 나라' for the US, and '한국'(Korea), '대한민국'(Republic of Korea), or '남한'(South Korea) for Korea.
But the national authority on Korean doesn't have any enforcement power. I often choose practicality over following the strict rule when I write plain text for digital media where complex markup is not readily available. So I think we shouldn't dare to implement a full-blown Korean spacing rule engine. It'll never be perfect, and no one will want it. We need to keep it simple, and anyone who finds that this option doesn't work for them can disable it anytime.
I don't have a clear, simple, one-size-fits-all solution for now, but at least I can try to reframe the problem and lay out some random thoughts.
I skimmed the examples and I found that they can be understood as "spacing around a foreign segment in a text". https://github.com/platers/obsidian-linter/blob/c2431dcca9b94b77b227f19278b1c569ad2ea606/src/rules/space-between-chinese-japanese-or-korean-and-english-or-numbers.ts
Text segments might be categorized as one of:
Inline items are easy to recognize. Markdown styling markers and HTML tags are also easy to parse. However, the remaining ones seem tricky.
We can't actually draw a hard line between languages, but grouping CJK text as a single cultural segment, as currently implemented, is not that bad in practice. (CJK shares some Hanzi/Hanja/Kanji code points in Unicode. Japanese mixes Kanji and Hiragana, and Koreans mix Hanja and Korean in some cases.)
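As a rough illustration of treating CJK as one segment type, the sketch below splits a string into runs of "cjk" and "other" characters. It is not the linter's implementation; `splitByCjk` and the `Segment` type are hypothetical names, and the character class covers only a handful of common Unicode blocks (Hangul, Hiragana, Katakana, and the main Han blocks), so rarer characters would fall into "other".

```typescript
// Assumption: a small set of common blocks stands in for "CJK".
// Hangul Jamo, Hiragana + Katakana, Hangul Compatibility Jamo,
// CJK Extension A, CJK Unified Ideographs, Hangul Syllables.
const cjkChar = /[\u1100-\u11FF\u3040-\u30FF\u3130-\u318F\u3400-\u4DBF\u4E00-\u9FFF\uAC00-\uD7AF]/;

type SegmentKind = "cjk" | "other";
interface Segment {
  kind: SegmentKind;
  text: string;
}

// Hypothetical helper: groups consecutive characters of the same kind.
function splitByCjk(text: string): Segment[] {
  const segments: Segment[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const kind: SegmentKind = cjkChar.test(ch) ? "cjk" : "other";
    const last = segments[segments.length - 1];
    if (last && last.kind === kind) {
      last.text += ch; // extend the current run
    } else {
      segments.push({ kind, text: ch }); // start a new run
    }
  }
  return segments;
}

console.log(splitByCjk("한글English漢字かな"));
```

Mixed Kanji and Hiragana, or Hanja and Hangul, stay in one "cjk" run, which matches the grouping described above.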
How do we detect whether the text is inside of quotes, or even parentheses, or not?
English sometimes mixes apostrophe and single quote.
Sometimes these quotes or apostrophes are used as unit markers, such as 2'1", and some countries even use ' as a digit grouping marker.
Text emoji make it more complicated :'(.
Also I think we need some kind of recovery point. Otherwise, it'll wiggle the entire document.
How does MS Word auto-fix apostrophes and quotes?
How do we handle punctuation around these segments?
For example: What about "this"? It is subtle (at least for me).
I think punctuation can be attached to the foreign segment and treated as an extension of it,
like SyntaxTrivia in the C# Roslyn compiler (https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/Syntax/SyntaxTrivia.cs)
(SyntaxTrivia covers things such as leading indentation, trailing commas, leading/trailing comments, etc.)
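The quote-detection questions above can be explored with a small heuristic. The sketch below pairs straight double quotes line by line, treats a " that directly follows a digit as a unit marker when no quote is open, and uses the end of each line as a recovery point so that one stray quote cannot shift the pairing for the rest of the document. All names are hypothetical, and the heuristics are assumptions for discussion rather than a proposal for the linter's actual behavior. Single quotes are not handled, since apostrophes make them harder still.

```typescript
interface QuotePair {
  line: number;  // zero-based line index
  open: number;  // column of the opening quote
  close: number; // column of the closing quote
}

// Hypothetical helper: finds pairs of straight double quotes per line.
function pairStraightQuotes(text: string): QuotePair[] {
  const pairs: QuotePair[] = [];
  const lines = text.split("\n");
  for (let n = 0; n < lines.length; n++) {
    const line = lines[n];
    let open = -1; // column of a pending opening quote, or -1 if none
    for (let i = 0; i < line.length; i++) {
      if (line.charAt(i) !== '"') continue;
      const afterDigit = i > 0 && /[0-9]/.test(line.charAt(i - 1));
      // With no quote open, 1" is more likely a unit marker than a quote.
      if (open === -1 && afterDigit) continue;
      if (open === -1) {
        open = i;
      } else {
        pairs.push({ line: n, open, close: i });
        open = -1;
      }
    }
    // Recovery point: an unmatched opening quote is dropped at the line end.
  }
  return pairs;
}

const sample = '2\'1" tall\nsay "한글" and "oops\n"ok"';
console.log(pairStraightQuotes(sample));
```

In the sample, the inch mark on the first line is skipped, the unmatched quote before `oops` is discarded at the end of its line, and the third line still pairs correctly.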
How about reducing the option's scope to target easier things first, such as inline items, italic/bold, and HTML tags, and handling quotes, parentheses, and language boundaries later with a separate option?
Or, I can disable the option for now, and revisit later when I have a good solution.
I would go ahead and disable this rule for now. I am not sure of a good solution at this time since determining an opening and closing quote is hard when not using smart quotes.
I'm from China and I use Simplified Chinese. I found this issue after using this plugin and running into the same problem. In my locale, there are no spaces inside quotes. For example: "净资产", not " 净资产 ". I tried the plugin's "remove spaces before and after characters" option, but after adding the quote characters to it, it didn't work.
Putting spaces inside the quotes confuses me because when a sentence has multiple quotes, it's not easy for me to read and tell which part is inside the quotes. For example, this is what the plugin automatically produces: 办理股权转让申报在网上完成 " 纳税人实名注册 "、" 被投资企业事先报告 " 和 " 扣缴/纳税申报信息填写 " The right way: 办理股权转让申报在网上完成 "纳税人实名注册"、"被投资企业事先报告" 和 "扣缴/纳税申报信息填写"
I think the second is easier to distinguish, so I'm giving you this feedback.
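The example above can be reproduced with a short sketch. It assumes that straight double quotes on a line are balanced and alternate between opening and closing, which is a strong assumption and does not hold for text containing inch marks or stray quotes. `trimInsideStraightQuotes` is a hypothetical name, not an existing linter option.

```typescript
// Hypothetical helper. Assumes quotes on a line come in balanced pairs.
// Matches an opening quote, its content (no quotes or newlines), and the
// closing quote, then drops spaces and tabs just inside the pair.
function trimInsideStraightQuotes(text: string): string {
  return text.replace(/"[ \t]*([^"\n]*?)[ \t]*"/g, '"$1"');
}

const produced = '办理股权转让申报在网上完成 " 纳税人实名注册 "、" 被投资企业事先报告 " 和 " 扣缴/纳税申报信息填写 "';
console.log(trimInsideStraightQuotes(produced));
```

Because the global match resumes after each closing quote, the spaces between quotations (around 和) are left alone.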
The only 2 solutions I can think of for this are as follows:
Do those 2 scenarios make sense? The reason I state them is because if I cannot assume either of the 2 scenarios is the case, I cannot properly determine if spaces are needed before or after a quote.
Describe the Bug
The rule "Space Between Chinese Japanese Or Korean And English or numbers" adds spaces in awkward positions for quoted CJK. "한글" is changed to " 한글 ", which is awkward, while "English" is not changed.
How to Reproduce
Steps to reproduce the behavior:
"한글"
Expected Behavior
"한글" shouldn't be changed.
Device