remarkjs / remark

markdown processor powered by plugins part of the @unifiedjs collective
https://remark.js.org
MIT License

In Japanese, Emphasis and Strong have a bug #1107

Closed Neos21 closed 1 year ago

Neos21 commented 1 year ago

Affected packages and versions

remark-parse@7.0.1, remark-rehype@8.0.0

Link to runnable example

No response

Steps to reproduce

I use Node.js v18.12.1 and npm v8.19.2. I wrote a vanilla Node.js script.

{
  "rehype-stringify": "9.0.3",
  "remark-parse": "10.0.1",
  "remark-rehype": "10.1.0",
  "unified": "10.1.2"
}

import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeStringify from 'rehype-stringify';

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(rehypeStringify);
const result = processor.processSync(`
1. 日本語*強調*、日本語
2. 日本語*強調*。日本語
3. 日本語*強調、*日本語
4. 日本語*強調。*日本語
5. 日本語**強調**、日本語
6. 日本語**強調**。日本語
7. 日本語**強調、**日本語
8. 日本語**強調。**日本語
`);
console.log(result.value);

Expected behavior

<ol>
<li>日本語<em>強調</em>、日本語</li>
<li>日本語<em>強調</em>。日本語</li>
<li>日本語<em>強調、</em>日本語</li>
<li>日本語<em>強調。</em>日本語</li>
<li>日本語<strong>強調</strong>、日本語</li>
<li>日本語<strong>強調</strong>。日本語</li>
<li>日本語<strong>強調、</strong>日本語</li>
<li>日本語<strong>強調。</strong>日本語</li>
</ol>

Actual behavior

<ol>
<li>日本語<em>強調</em>、日本語</li>
<li>日本語<em>強調</em>。日本語</li>
<li>日本語*強調、*日本語</li>
<li>日本語*強調。*日本語</li>
<li>日本語<strong>強調</strong>、日本語</li>
<li>日本語<strong>強調</strong>。日本語</li>
<li>日本語**強調、**日本語</li>
<li>日本語**強調。**日本語</li>
</ol>

Runtime

Node v17, Other (please specify in steps to reproduce)

Package manager

npm 8

OS

Windows

Build and bundle tools

Other (please specify in steps to reproduce)

Neos21 commented 1 year ago

I found another bug with the characters and . Maybe more characters have the same bug.

ChristianMurphy commented 1 year ago

Welcome @Neos21! 👋 Sorry you ran into a spot of trouble.

Some background: remark implements CommonMark (https://commonmark.org/), or GFM (https://github.github.com/gfm/) when used with remark-gfm. The output you are currently seeing from remark is expected; it is how emphasis works in CommonMark and GFM.

example of rendering in the commonmark reference implementation: https://spec.commonmark.org/dingus/?text=1.%20%E6%97%A5%E6%9C%AC%E8%AA%9E*%E5%BC%B7%E8%AA%BF*%E3%80%81%E6%97%A5%E6%9C%AC%E8%AA%9E%0A2.%20%E6%97%A5%E6%9C%AC%E8%AA%9E*%E5%BC%B7%E8%AA%BF*%E3%80%82%E6%97%A5%E6%9C%AC%E8%AA%9E%0A3.%20%E6%97%A5%E6%9C%AC%E8%AA%9E*%E5%BC%B7%E8%AA%BF%E3%80%81*%E6%97%A5%E6%9C%AC%E8%AA%9E%0A4.%20%E6%97%A5%E6%9C%AC%E8%AA%9E*%E5%BC%B7%E8%AA%BF%E3%80%82*%E6%97%A5%E6%9C%AC%E8%AA%9E%0A5.%20%E6%97%A5%E6%9C%AC%E8%AA%9E**%E5%BC%B7%E8%AA%BF**%E3%80%81%E6%97%A5%E6%9C%AC%E8%AA%9E%0A6.%20%E6%97%A5%E6%9C%AC%E8%AA%9E**%E5%BC%B7%E8%AA%BF**%E3%80%82%E6%97%A5%E6%9C%AC%E8%AA%9E%0A7.%20%E6%97%A5%E6%9C%AC%E8%AA%9E**%E5%BC%B7%E8%AA%BF%E3%80%81**%E6%97%A5%E6%9C%AC%E8%AA%9E%0A8.%20%E6%97%A5%E6%9C%AC%E8%AA%9E**%E5%BC%B7%E8%AA%BF%E3%80%82**%E6%97%A5%E6%9C%AC%E8%AA%9E

And an example of GFM rendering the same content here on GitHub itself:


  1. 日本語強調、日本語
  2. 日本語強調。日本語
  3. 日本語強調、日本語
  4. 日本語強調。日本語
  5. 日本語強調、日本語
  6. 日本語強調。日本語
  7. 日本語強調、日本語
  8. 日本語強調。日本語

remark implements CommonMark to spec. The implementation as is matches the specified behavior; differing from it would be a bug.
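
To make the rule concrete, here is a rough sketch of the CommonMark "flanking" check for a `*` delimiter run. It is not remark's or micromark's actual code: the helper names (`isWhitespace`, `isPunctuation`, `flanking`) are made up for illustration, and the punctuation test is an approximation of the spec's definition based on Unicode property escapes. For `*`, a run can open emphasis only if it is left-flanking and can close emphasis only if it is right-flanking.

```javascript
// Hypothetical helpers, not part of remark or micromark.
// The start and end of a line count as whitespace, modeled here as `undefined`.
const isWhitespace = (ch) => ch === undefined || /\s/u.test(ch);

// Approximation of "Unicode punctuation": general categories P* and S*.
const isPunctuation = (ch) => ch !== undefined && /[\p{P}\p{S}]/u.test(ch);

// `before` and `after` are the characters on either side of the `*` run.
function flanking(before, after) {
  const leftFlanking =
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
  const rightFlanking =
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
  // For `*` (unlike `_`), opening and closing follow directly from flanking.
  return { canOpen: leftFlanking, canClose: rightFlanking };
}

// Item 1, closing run in `強調*、`: preceded by a letter, so it can close.
console.log(flanking('調', '、')); // { canOpen: false, canClose: true }

// Item 3, closing run in `強調、*日本語`: preceded by punctuation and
// followed by a letter, so it can open but cannot close.
console.log(flanking('、', '日')); // { canOpen: true, canClose: false }
```

Under these assumptions, the second `*` in items 3, 4, 7, and 8 is never a valid closer, which is why the asterisks are left as literal text.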

The way to make a change would be to have the CommonMark standard change. If you'd like to nudge CommonMark towards better supporting more languages, feel free to reach out in the specification discussion forum (https://talk.commonmark.org/), in particular in this thread: https://talk.commonmark.org/t/emphasis-and-east-asian-text/2491

github-actions[bot] commented 1 year ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.