Open simnalamburt opened 4 months ago
Hey!
This rule measures characters. The size of displayed characters is different in different places. Characters will be different in a terminal, different here on GitHub, different in your editor, different in my editor.
There isn’t a good solution. I recommend turning this rule off instead. See https://github.com/remarkjs/remark-lint/tree/main/packages/remark-lint-table-pipe-alignment#fix for more info
Majority of programmers codes with fixed sized font and width of CJK characters in such situation are well defined and documented in here: https://unicode.org/Public/UNIDATA/EastAsianWidth.txt
We at least need to create a new option to respect the size of full width character. Something like "fixed width font mode"
Have you tested your ideas in different code editors?
I'm Korean. Yes I have been used CJK letters more than 20 years in many editors including vi, nvim, Vim, nano, micro, ed, terminal, emacs, VSCode, atom, zed, notepad.exe, notepad++, Visual Studio, sublime. In these editors, CJK letters will be displayed as fullwidth character (which have double width of halfwidth character) unless you explicitly change your font into something other than fixed width font.
Just in case you might not familiar with CJK: https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms
I’m looking at your code example here on GH and they’re both displayed “misaligned”
I don't think it's a good idea to ignore this issue to satisfy GitHub's misrepresentation of CJK characters.
I at least want to solve this problem by creating a linter option that respects the size of full-width characters. Since it's an option, it shouldn't affect existing behavior. I strongly believe that this should be the default behavior, not an option to opt-in, and this is what any CJK developer would have done. But if you disagree, I'd be happy to see it fixed as an option. What do you think?
GH is using the web. CSS. A monospaced font. Many editors use the web/css/monospace.
My point is that all tools are going to display it differently. What is the point in having an option to this rule which will look fine for some users and broken for other users.
That's what the rule currently does: broken for some, good for some. So I suggest not using this rule.
You say that all tools behave one way. I show GH doing it differently. I know editors display emoji differently. Can you provide screenshots for different tools displaying your examples correctly?
This it it
Emoji containing version if you need:
"remark-lint-table-pipe-alignment" have been existed for a long time because it is useful to some users even if it's fine for some non-CJK users and broken for CJK users.
And so the new fullwidth-respecting option will be the same. It will be pretty much useful for CJK users.
Everyone uses emoji. Not just Koreans.
different tools
Can you please show different editors? And label which editors you show?
neovim:
iTerm2:
VSCode:
zed:
I'll upload windows examples tomorrow since I cannot access my windows devices for right now.
GH is using the web. CSS. A monospaced font. Many editors use the web/css/monospace.
And many editors try to vertically align a CJK character with 2 latin alphabets, regardless of font configs. Terminal-based editors already do this - every Unicode-aware terminal emulators try to treat a CJK character width as 2 latin chars by default.
Ace, a web-based editor (which is used by Rust Playground), tries to vertically align intentionally. They measures font size and manually put twice size of it, like: <span style="width: 14.3984px;" class="ace_cjk">한</span>
.
GitHub's align isn't right because, in my computer, they don't use a monospaced font, but they use two fonts: 'Liberation Mono' and 'Noto Sans Mono CJK KR' (because Liberation Mono doesn't have any CJK chars). Using only 'Noto Sans Mono CJK KR' mitigates the problem:
Using popular alternative fonts like Neo둥근모 or D2Coding just fits right:
Neo둥근모 | D2Coding |
---|---|
Yes, character width can be different by editors, and by fonts. But the point is, you should Unicode-aware if you do "align" something, and 'treat CJK character width as same as two latin alphabets' is a good practice and sensible one. wcwidth
-like functions can help this.
I've created a patch that addresses this issue as follows: https://github.com/remarkjs/remark-lint/commit/1e659c27c530c345f2a2c592b572ee15dd5ba62c
What do you think? (I didn't know how to make it an opt-in option, so I included it in the default behavior for now)
Similar issue found at 'remark-lint-table-cell-padding'. While code below is compact enough, remark-lint-table-cell-padding reports linter errors.
Plaintext:
|Enum Value |Description |
|------------|--------------|
|ORDER |주문 정산 |
|ORDER_CANCEL|주문 취소 정산|
|MANUAL |수동 정산 |
Similar issue found at 'remark-lint-maximum-line-length'. While code below is obviously crossed 80-width column, remark-lint-maximum-line-length does not report error.
Plaintext:
Mercury mercury mercury mercury mercury mercury mercury mercury mercury mercury
mercury.
한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~
Most formatters like Prettier, rustfmt, ... properly handles fullwidth characters in this situations:
GitHub's align isn't right because, in my computer, they don't use a monospaced font, but they use two fonts: 'Liberation Mono' and 'Noto Sans Mono CJK KR' (because Liberation Mono doesn't have any CJK chars). Using only 'Noto Sans Mono CJK KR' mitigates the problem:
Every tool (that is reasonably good) does that, your editor too. There are many Unicode characters. Fonts don’t have glyphs for all Unicode characters. So they use a “font stack”. A preferred font renders a glyph, if not found, then the next font is used, and so forth.
Using popular alternative fonts like Neo둥근모 or D2Coding just fits right:
Right, so this comes back to my original point. How things show, depends on the final user’s computer, and what that user chooses. Of course, CJK users have configured their editor to display CJK well. Then, some other user comes in, who uses GitHub, and there the alignment is different.
Different users will see things differently. It is not known to remark-lint
how things will display.
But the point is, you should Unicode-aware if you do "align" something, and 'treat CJK character width as same as two latin alphabets' is a good practice and sensible one. wcwidth-like functions can help this.
That is a strong statements while the topic remains vague.
What is “Unicode-aware”? Please explain or link to a specification or algorithm. What about Emoji? What about ANSI colors? Control characters? What do you want to do about ambiguous characters? Which Unicode version do you want to go with?
Why should remark-lint
align differently than GitHub?
wcwidth
is indeed interesting. There is also https://www.npmjs.com/package/wcwidth, which is a JavaScript version, but not maintained. I have also used string-width
in the past.
We do support configurable functions in remark-gfm
: https://github.com/remarkjs/remark-gfm#example-stringlength.
I am open to such functionality of course.
But it needs to work well.
And, could there be a good default that works for every user?
Initial checklist
Affected packages and versions
remark-lint-table-pipe-alignment@3.1.3
Link to runnable example
No response
Steps to reproduce
Open below with browser:
Expected behavior
The first one should be fine and the second one should be an error.
Actual behavior
However, the first one gives an error and the second one does not.
Runtime
Node v17, Node v16, Node v14, Node v12, Deno, Electron, Other (please specify in steps to reproduce)
Package manager
npm 8, npm 7, npm 6, yarn 2, yarn 1, pnpm, Other (please specify in steps to reproduce)
OS
Windows, Linux, macOS, Other (please specify in steps to reproduce)
Build and bundle tools
Webpack, Rollup, esbuild, Parcel, Create React App, Gatsby, Next.js, Remix, Docusaurus, Snowpack, Vite, Other (please specify in steps to reproduce)