remarkjs / remark-lint

plugins to check (lint) markdown code style
MIT License

Many packages do not distinguish halfwidth and fullwidth characters #310

Open simnalamburt opened 4 months ago

simnalamburt commented 4 months ago

Initial checklist

Affected packages and versions


Link to runnable example

No response

Steps to reproduce

Open the following in a browser:

<!doctype html>
<script type="module">
import {reporter} from ''
import {remark} from ''
import remarkGfm from ''
import remarkLint from ''
import remarkLintTablePipeAlignment from ''

const file = await remark()
  .use(remarkGfm)
  .use(remarkLint)
  .use(remarkLintTablePipeAlignment)
  .process(`
# False positive

Below should be OK but linter will report it as error.

| Alpha | 한글 |
| ----- | ---- |
| a     | b    |

# False negative

Below should be error but linter will report it as OK.

| Alpha | 한글 |
| ----- | -- |
| a     | b  |
`)

document.body.innerHTML = `<pre>${reporter(file)}</pre>`
</script>

Expected behavior

The first one should be fine and the second one should be an error.

16:16-16:17 warning Misaligned table fence table-pipe-alignment remark-lint

⚠ 1 warning

Actual behavior

However, the first one gives an error and the second one does not.

8:16-8:17 warning Misaligned table fence table-pipe-alignment remark-lint

⚠ 1 warning


Runtime

Node v17, Node v16, Node v14, Node v12, Deno, Electron, Other (please specify in steps to reproduce)

Package manager

npm 8, npm 7, npm 6, yarn 2, yarn 1, pnpm, Other (please specify in steps to reproduce)


OS

Windows, Linux, macOS, Other (please specify in steps to reproduce)

Build and bundle tools

Webpack, Rollup, esbuild, Parcel, Create React App, Gatsby, Next.js, Remix, Docusaurus, Snowpack, Vite, Other (please specify in steps to reproduce)

wooorm commented 4 months ago


This rule measures characters. The size of displayed characters is different in different places. Characters will be different in a terminal, different here on GitHub, different in your editor, different in my editor.

There isn’t a good solution. I recommend turning this rule off instead. See for more info

simnalamburt commented 4 months ago

The majority of programmers code with a fixed-width font, and the width of CJK characters in that setting is well defined and documented here:

We at least need to create a new option that respects the size of fullwidth characters. Something like a "fixed width font mode".

wooorm commented 4 months ago

Have you tested your ideas in different code editors?

simnalamburt commented 4 months ago

I'm Korean. Yes, I have used CJK letters for more than 20 years in many editors, including vi, nvim, Vim, nano, micro, ed, terminal, emacs, VSCode, atom, zed, notepad.exe, notepad++, Visual Studio, and sublime. In these editors, CJK letters are displayed as fullwidth characters (which are twice as wide as halfwidth characters) unless you explicitly change your font to something other than a fixed-width font.

simnalamburt commented 4 months ago

Just in case you are not familiar with CJK:


wooorm commented 4 months ago

I’m looking at your code example here on GH and they’re both displayed “misaligned”

simnalamburt commented 4 months ago

I don't think it's a good idea to ignore this issue to satisfy GitHub's misrepresentation of CJK characters.

simnalamburt commented 4 months ago

I at least want to solve this problem by creating a linter option that respects the size of full-width characters. Since it's an option, it shouldn't affect existing behavior. I strongly believe that this should be the default behavior, not an option to opt-in, and this is what any CJK developer would have done. But if you disagree, I'd be happy to see it fixed as an option. What do you think?

wooorm commented 4 months ago

GH is using the web. CSS. A monospaced font. Many editors use the web/css/monospace.

My point is that all tools are going to display it differently. What is the point of adding an option to this rule that will look fine for some users and broken for others?

That's what the rule currently does: broken for some, good for some. So I suggest not using this rule.

You say that all tools behave one way. I show GH doing it differently. I know editors display emoji differently. Can you provide screenshots for different tools displaying your examples correctly?

simnalamburt commented 4 months ago

This is it


Emoji-containing version, if you need it:

simnalamburt commented 4 months ago

"remark-lint-table-pipe-alignment" has existed for a long time because it is useful to some users, even though it works for some non-CJK users and is broken for CJK users.

The new fullwidth-respecting option will be the same. It will be quite useful for CJK users.

wooorm commented 4 months ago

Everyone uses emoji. Not just Koreans.

> different tools

Can you please show different editors? And label which editors you show?

simnalamburt commented 4 months ago









I'll upload Windows examples tomorrow, since I cannot access my Windows devices right now.

xnuk commented 4 months ago

> GH is using the web. CSS. A monospaced font. Many editors use the web/css/monospace.

And many editors try to vertically align one CJK character with two Latin letters, regardless of font configuration. Terminal-based editors already do this: every Unicode-aware terminal emulator treats a CJK character as two Latin characters wide by default.

Ace, a web-based editor (which is used by Rust Playground), intentionally tries to align vertically. They measure the font size and manually set twice that as the width, like: <span style="width: 14.3984px;" class="ace_cjk">한</span>.


GitHub's alignment isn't right because, on my computer, they don't use a single monospaced font but two fonts: 'Liberation Mono' and 'Noto Sans Mono CJK KR' (because Liberation Mono doesn't have any CJK characters). Using only 'Noto Sans Mono CJK KR' mitigates the problem:


Using popular alternative fonts like Neo둥근모 or D2Coding just fits right:

[Screenshots: Neo둥근모, D2Coding]

xnuk commented 4 months ago

Yes, character width can differ between editors and between fonts. But the point is, you should be Unicode-aware if you "align" something, and treating a CJK character as being as wide as two Latin letters is a good and sensible practice. wcwidth-like functions can help with this.
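
To make "wcwidth-like" concrete, here is a minimal sketch. It assumes a simplified, incomplete list of East Asian Wide and Fullwidth ranges; the names `WIDE_RANGES`, `isWide`, and `displayWidth` are illustrative and not part of remark-lint. Emoji, combining marks, and ambiguous-width characters are deliberately ignored.

```javascript
// Assumption: a simplified subset of wide ranges, not the full Unicode table.
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo initial consonants
  [0x2e80, 0x303e], // CJK radicals, symbols, and punctuation
  [0x3041, 0x33ff], // Hiragana, Katakana, Bopomofo, enclosed CJK
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xa000, 0xa4cf], // Yi
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xf900, 0xfaff], // CJK Compatibility Ideographs
  [0xfe30, 0xfe4f], // CJK Compatibility Forms
  [0xff00, 0xff60], // Fullwidth forms
  [0xffe0, 0xffe6], // Fullwidth signs
  [0x20000, 0x3fffd] // Supplementary ideographic planes
]

// True when the code point falls in one of the ranges above.
function isWide(codePoint) {
  return WIDE_RANGES.some(([start, end]) => codePoint >= start && codePoint <= end)
}

// Counts 2 columns for wide characters and 1 column for everything else.
function displayWidth(text) {
  let width = 0
  for (const char of text) {
    // `for...of` iterates by code point, so surrogate pairs stay together.
    width += isWide(char.codePointAt(0)) ? 2 : 1
  }
  return width
}

const header = '| Alpha | 한글 |'
const row = '| a     | b    |'

console.log(header.length, row.length) // 14 16
console.log(displayWidth(header), displayWidth(row)) // 16 16
```

Measured in UTF-16 code units, the two rows from the "false positive" example differ; measured in display columns under this assumption, they match.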

simnalamburt commented 4 months ago

I've created a patch that addresses this issue as follows:

What do you think? (I didn't know how to make it an opt-in option, so I included it in the default behavior for now)

simnalamburt commented 4 months ago

A similar issue exists in 'remark-lint-table-cell-padding'. Although the table below is already compact, remark-lint-table-cell-padding reports linter errors.



|Enum Value  |Description   |
|------------|--------------|
|ORDER       |주문 정산     |
|ORDER_CANCEL|주문 취소 정산|
|MANUAL      |수동 정산     |
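
For illustration, the padding in the table above is what a width-aware formatter would produce. The sketch below is hypothetical (`displayWidth`, `padToDisplayWidth`, and `formatRows` are not remark-lint APIs), uses a rough BMP-only approximation of the wide ranges, and omits the delimiter row.

```javascript
// Assumption: rough BMP-only approximation of East Asian Wide/Fullwidth ranges.
const WIDE = /[\u1100-\u115F\u2E80-\u303E\u3041-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE30-\uFE4F\uFF00-\uFF60\uFFE0-\uFFE6]/

// Counts 2 columns for characters matched by WIDE, 1 otherwise.
function displayWidth(text) {
  let width = 0
  for (const char of text) width += WIDE.test(char) ? 2 : 1
  return width
}

// Pads with spaces until the text occupies `target` display columns.
function padToDisplayWidth(text, target) {
  return text + ' '.repeat(Math.max(0, target - displayWidth(text)))
}

// Pads every cell to the widest cell of its column, with no extra padding.
function formatRows(rows) {
  const columnWidths = rows[0].map((_, column) =>
    Math.max(...rows.map((row) => displayWidth(row[column])))
  )
  return rows.map((row) => {
    const cells = row.map((cell, column) => padToDisplayWidth(cell, columnWidths[column]))
    return '|' + cells.join('|') + '|'
  })
}

const rows = [
  ['Enum Value', 'Description'],
  ['ORDER', '주문 정산'],
  ['ORDER_CANCEL', '주문 취소 정산'],
  ['MANUAL', '수동 정산']
]

for (const line of formatRows(rows)) {
  // Prints each line with its code-unit length and its display width.
  console.log(line, line.length, displayWidth(line))
}
```

Under these assumptions every line is 29 columns wide, while the `.length` values are 29, 25, 23, and 25, which is the mismatch a rule that counts code units would report.
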
simnalamburt commented 4 months ago

A similar issue exists in 'remark-lint-maximum-line-length'. Although the second line below obviously crosses the 80-column limit, remark-lint-maximum-line-length does not report an error.



Mercury mercury mercury mercury mercury mercury mercury mercury mercury mercury

한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~
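
The difference between the two lines above can be checked with a small sketch. It assumes the same rough approximation of wide ranges as a wcwidth-like function would use; `displayWidth` and `tooLong` are hypothetical helpers, not the rule's implementation.

```javascript
// Assumption: rough BMP-only approximation of East Asian Wide/Fullwidth ranges.
const WIDE = /[\u1100-\u115F\u2E80-\u303E\u3041-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE30-\uFE4F\uFF00-\uFF60\uFFE0-\uFFE6]/

// Counts 2 columns for characters matched by WIDE, 1 otherwise.
function displayWidth(text) {
  let width = 0
  for (const char of text) width += WIDE.test(char) ? 2 : 1
  return width
}

// Measures a line by UTF-16 code units, like `String.prototype.length`.
const byCodeUnits = (line) => line.length

// True when the line exceeds the limit under the given measure.
function tooLong(line, limit, measure) {
  return measure(line) > limit
}

// The two sample lines: 10 Latin words and 16 Korean words.
const latin = 'Mercury ' + Array(9).fill('mercury').join(' ')
const korean = Array(16).fill('한국어~').join(' ')

console.log(latin.length, korean.length) // 79 79
console.log(displayWidth(latin), displayWidth(korean)) // 79 127
console.log(tooLong(korean, 80, byCodeUnits)) // false
console.log(tooLong(korean, 80, displayWidth)) // true
```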

Most formatters like Prettier, rustfmt, ... properly handle fullwidth characters in these situations:

wooorm commented 3 months ago

> GitHub's alignment isn't right because, on my computer, they don't use a single monospaced font but two fonts: 'Liberation Mono' and 'Noto Sans Mono CJK KR' (because Liberation Mono doesn't have any CJK characters). Using only 'Noto Sans Mono CJK KR' mitigates the problem:

Every tool (that is reasonably good) does that, your editor too. There are many Unicode characters. Fonts don’t have glyphs for all Unicode characters. So they use a “font stack”. A preferred font renders a glyph, if not found, then the next font is used, and so forth.
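
As a rough sketch of that fallback, the code below walks a font stack and picks the first font that has a glyph for each character. The coverage predicates are made-up assumptions for the demo (real coverage comes from each font's tables), and `resolveFont` and `fontsForText` are hypothetical names.

```javascript
// Assumption: toy coverage rules, only meant to mimic "no CJK glyphs" vs "has CJK glyphs".
const fontStack = [
  {name: 'Liberation Mono', hasGlyph: (codePoint) => codePoint <= 0x024f},
  {name: 'Noto Sans Mono CJK KR', hasGlyph: () => true}
]

// Returns the name of the first font in the stack that covers the character.
function resolveFont(char, stack) {
  const codePoint = char.codePointAt(0)
  const match = stack.find((font) => font.hasGlyph(codePoint))
  return match ? match.name : undefined
}

// Maps every character of the text to the font that would render it.
function fontsForText(text, stack) {
  return Array.from(text, (char) => [char, resolveFont(char, stack)])
}

console.log(fontsForText('a한', fontStack))
// [ [ 'a', 'Liberation Mono' ], [ '한', 'Noto Sans Mono CJK KR' ] ]
```

Because the two glyphs come from different fonts, their advance widths need not be in a 1:2 ratio, which is one way a table can look misaligned even in a "monospaced" stack.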

> Using popular alternative fonts like Neo둥근모 or D2Coding just fits right:

Right, so this comes back to my original point. How things show, depends on the final user’s computer, and what that user chooses. Of course, CJK users have configured their editor to display CJK well. Then, some other user comes in, who uses GitHub, and there the alignment is different.

Different users will see things differently. It is not known to remark-lint how things will display.

> But the point is, you should be Unicode-aware if you "align" something, and treating a CJK character as being as wide as two Latin letters is a good and sensible practice. wcwidth-like functions can help with this.

That is a strong statement while the topic remains vague.

What is “Unicode-aware”? Please explain or link to a specification or algorithm. What about Emoji? What about ANSI colors? Control characters? What do you want to do about ambiguous characters? Which Unicode version do you want to go with?
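
To illustrate why the unit of measurement matters, here is a small sketch that counts the same string three ways. It assumes a runtime with `Intl.Segmenter` (recent browsers and Node.js); `measure` is a hypothetical helper, and none of these counts is a display width.

```javascript
// Counts a string in UTF-16 code units, Unicode code points, and grapheme clusters.
function measure(text) {
  const segmenter = new Intl.Segmenter('en', {granularity: 'grapheme'})
  return {
    codeUnits: text.length,
    codePoints: Array.from(text).length,
    graphemes: Array.from(segmenter.segment(text)).length
  }
}

// Family emoji: man + ZWJ + woman + ZWJ + girl.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'

console.log(measure(family)) // { codeUnits: 8, codePoints: 5, graphemes: 1 }
console.log(measure('e\u0301')) // { codeUnits: 2, codePoints: 2, graphemes: 1 }
console.log(measure('한글')) // { codeUnits: 2, codePoints: 2, graphemes: 2 }
```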

Why should remark-lint align differently than GitHub?

wcwidth is indeed interesting. There is also, which is a JavaScript version, but not maintained. I have also used string-width in the past.

We do support configurable functions in remark-gfm: I am open to such functionality of course. But it needs to work well.

And, could there be a good default that works for every user?