rstcheck / rstcheck-core

Core library behind rstcheck.
http://rstcheck-core.rtfd.io/
MIT License

Title overline too short with accented characters (windows only) #6

Open lgarcin opened 4 years ago

lgarcin commented 4 years ago

Environment

Expected behavior

The following code should not trigger a warning.

====================
Chaînes de caractère
====================

Actual behavior

The linter triggers "Title overline too short" warning.

When I add one extra equals sign per accented character, the warning is no longer triggered.

======================
Chaînes de caractère
======================

myint commented 4 years ago

I'm guessing this would be an upstream issue in docutils. Can you check? Thanks!

lcnittl commented 4 years ago

This is actually a problem with all kinds of Unicode/non-ASCII characters, e.g.:

A∞B
===

Infinity sign in heading.

A—B
---

em-dash in heading (en-dash also yields warning)

The files do, however, parse without problems in Sphinx. (Same with tables and special chars – no problem for Sphinx.)

Is it possible that this gets transformed to bytes at some point and is then longer on special chars that take 2 bytes or more?
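The hypothesis above can be illustrated with a small Python sketch. It compares the number of characters in a title with the number of UTF-8 bytes, and with the number of characters obtained when those bytes are wrongly decoded with a legacy single-byte code page. The helper name `lengths` is hypothetical, and cp1252 is only an assumption about the Windows default encoding; nothing in this thread confirms which code page was actually in use.

```python
# Sketch of the "bytes vs. characters" hypothesis.
# Assumption: the file is UTF-8 on disk but gets decoded as cp1252
# (a common Windows default); this is not confirmed in the thread.
def lengths(title: str) -> dict:
    raw = title.encode("utf-8")  # bytes as they would be stored on disk
    return {
        "characters": len(title),                           # what the author sees
        "utf8_bytes": len(raw),                             # size of the encoded title
        "misdecoded_as_cp1252": len(raw.decode("cp1252")),  # one character per byte
    }
```

For "Chaînes de caractère" this gives 20 characters but 22 bytes and 22 mis-decoded characters, which would match the observation that one extra "=" per accented character silences the warning.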

lhalda-pcv commented 3 years ago

I have the same problem as @lgarcin: a "(WARNING/2) Title overline too short." linter warning in VS Code (coming from the lextudio.restructuredtext extension, which uses rstcheck for linting) even though the overline is correct and the file parses fine. For me it is the en-dash that breaks it; like @lcnittl, I suspect that the continuation bytes of the UTF-8 encoding are wrongly parsed as separate characters. I would appreciate it if this could be fixed so that non-ASCII characters are counted properly.

Also on Windows 10, with Python 3.8.10 64bit and rstcheck 3.3.1

Cielquan commented 2 years ago

I cannot reproduce this issue with any of the above examples with rstcheck 4.0.

But I am on Linux. Tested with Python 3.8.10.

Can you confirm the issue is gone?

lcnittl commented 2 years ago

Tested with v4.0.0: Can confirm that it is no problem on Linux (tested on Ubuntu 20.04) - but remains a problem on Windows (Windows 11), unfortunately.

Cielquan commented 2 years ago

I strongly suspect that this is an upstream issue with docutils. As far as I understand, rstcheck only passes the source to docutils, which parses it and returns the errors.

I think the issue lies here: docutils.utils.__init__:column_width

Cielquan commented 2 years ago

Thanks for the confirmation.

Cielquan commented 2 years ago

After some more looking and thinking, I am sure this is an upstream issue with docutils that we can do nothing about.

Because I do not have a Windows system at hand to test this, I can only assume that the issue lies at the aforementioned place:

docutils.utils.__init__:column_width

I think this simple change could prove my assumption:

def column_width(text):
    """Return the column width of text.

    Correct ``len(text)`` for wide East Asian and combining Unicode chars.
    """
+    return len(text)  # test only: bypass the width correction below
    if isinstance(text, str) and sys.version_info < (3, 0):
        return len(text)
    width = sum([east_asian_widths[unicodedata.east_asian_width(c)]
                 for c in text])
    # correction for combining chars:
    width -= len(find_combining_chars(text))
    return width
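To see what a width calculation of this kind returns for the titles in this thread, here is a minimal standalone sketch using only the standard library. It is modelled on the idea in the function quoted above and is not the docutils implementation: the name `approx_column_width` and the width table are assumptions made for illustration.

```python
import unicodedata

# Assumed width table (illustrative, not copied from docutils):
# wide and fullwidth characters take two columns, everything else one.
EAST_ASIAN_WIDTHS = {"W": 2, "F": 2, "Na": 1, "H": 1, "N": 1, "A": 1}


def approx_column_width(text: str) -> int:
    """Approximate display width of an already decoded string."""
    width = sum(EAST_ASIAN_WIDTHS[unicodedata.east_asian_width(c)] for c in text)
    # Combining characters are drawn on top of the previous character,
    # so they should not add a column.
    width -= sum(1 for c in text if unicodedata.combining(c))
    return width
```

With a correctly decoded string, "Chaînes de caractère" comes out as 20 columns, the same as `len(text)`. Under the assumption that the UTF-8 bytes were decoded as cp1252 before reaching such a function, the same title comes out as 22, which would suggest the problem is in how the file is decoded, not in the width arithmetic itself.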

Therefore I am closing this issue. If you think there is something to be done here please reopen this issue.

If there is a bug report over at docutils it would be nice if someone could link it here then.

Cielquan commented 2 years ago

I am reopening this issue for tracking.

lhalda-pcv commented 1 year ago

Two years later, this issue was still bugging me. After a bit more googling, I found that adding the environment variable PYTHONUTF8=1 to my Windows environment variables fixed it (Windows 11 22H2, Python 3.8.10).

Sources:
https://github.com/adrienverge/yamllint/issues/530
https://dev.to/methane/python-use-utf-8-mode-on-windows-212i
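For anyone wanting to check whether the variable took effect, the following sketch reports which encodings the running interpreter uses. `PYTHONUTF8=1` enables Python's UTF-8 Mode (available since Python 3.7), which makes UTF-8 the default text encoding regardless of the system code page. The function name `encoding_report` is hypothetical, and whether a non-UTF-8 default is really the cause of the warning is an assumption based on the fix above, not something verified here.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encoding settings relevant to reading text files."""
    return {
        # True when Python runs in UTF-8 Mode (PYTHONUTF8=1 or -X utf8)
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Default encoding used by open() when none is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names
        "filesystem_encoding": sys.getfilesystemencoding(),
    }
```

Running it once without the variable and once with it set shows whether the preferred encoding changes from a legacy code page to UTF-8 on the machine in question.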