spyder-ide / spyder-docs

Documentation for Spyder, the Scientific Python Development Environment
https://docs.spyder-ide.org
MIT License

Fix: remove a ZERO WIDTH NO-BREAK SPACE in front of an inline literal #332

JulienPalard closed this 1 year ago

JulienPalard commented 1 year ago

This is literally the smallest PR I've ever done.

It removes a zero width no-break space.

But this char was breaking the inline literal next to it. As you can see on this page, ``is_dark_font_color`` should have been interpreted by Sphinx and rendered in red:

[Screenshot from 2022-10-05 22-27-06]

The removed character is obviously not rendered in GitHub's "Files changed" interface. Nor in git diff, nor in git show --color-words. Not in your editor, and not in your terminal... The character is a space. And a space with no width!!!

If you really want to see it, git show | cat -A can be helpful; you'll see something like:

-in the ``mainwindow.py`` file we import the M-oM-;M-?``is_dark_font_color``
+in the ``mainwindow.py`` file we import the ``is_dark_font_color``

But the paragraph is way longer than that, so it's a bit hard to spot.

For the curious: the M-... notation denotes bytes in the range [128; 255]. The first 32 of this range are treated as if they were in the range [0; 31] and displayed using the ^ notation, so \x80 is M-^@; the others just have 128 subtracted, so \xa0 is "M- " (yes, M- followed by a space).

So M-o is \x6f + 128 = \xef (\x6f being the value of o in the ASCII table). Likewise, M-; is \xbb and M-? is \xbf, which gives us the sequence \xef\xbb\xbf.
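The mapping is mechanical enough to undo in a few lines. Here is a minimal Python sketch; decode_cat_v is a hypothetical helper, and it assumes pure ASCII cat -v output where every "M-" and "^" is an escape prefix (the notation itself is ambiguous if the original text contained a literal "M-" or "^A") and where the "$" end-of-line markers that cat -A adds have already been stripped.

```python
# '^@' .. '^_' stand for control bytes 0..31; '^?' stands for DEL (127).
CONTROL = {chr(c + 64): c for c in range(32)}
CONTROL["?"] = 127


def decode_cat_v(text: str) -> bytes:
    """Turn cat -v / cat -A notation (M-x, ^x) back into the raw bytes (sketch)."""
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        if text.startswith("M-", i):
            high = 128  # the M- prefix means the byte had its high bit set
            i += 2
            if i >= len(text):
                raise ValueError("dangling M- prefix")
        if text[i] == "^" and text[i + 1 : i + 2] in CONTROL:
            low = CONTROL[text[i + 1]]
            i += 2
        else:
            low = ord(text[i])  # a printable ASCII char stands for itself
            i += 1
        out.append(high + low)
    return bytes(out)


print(decode_cat_v("M-oM-;M-?"))  # b'\xef\xbb\xbf'
```

Decoding that result with .decode("utf-8") yields the single character "\ufeff".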

Still curious? The file is encoded using UTF-8, so to decode this UTF-8 sequence we need to extract the relevant bits from it. In binary it looks like:

11101111 10111011 10111111

The leading 1110 means "there are 3 bytes in this char" (count the ones: three ones → three bytes; the zero is just a delimiter). The two trailing bytes start with "10", meaning "we're trailing bytes".

If we drop those markers (the 1110 and the 10 at the front of each byte) and keep the remaining bits, we're left with 1111111011111111, which evaluates to 65279, or 0xfeff in hexadecimal. Yes, you recognize it: it's a BOM (byte order mark). Because yes, a BOM is just a ZERO WIDTH NO-BREAK SPACE. Isn't it beautiful?
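The same bit manipulation, written out as a Python sketch. decode_utf8_3byte is a hypothetical helper that only handles one 3-byte sequence, and it deliberately skips the overlong-encoding and surrogate checks a real UTF-8 decoder performs.

```python
def decode_utf8_3byte(seq: bytes) -> int:
    """Decode a single 3-byte UTF-8 sequence by hand (illustrative sketch)."""
    if len(seq) != 3:
        raise ValueError("expected exactly 3 bytes")
    b0, b1, b2 = seq
    # The lead byte must look like 1110xxxx, the trailing bytes like 10xxxxxx.
    if b0 >> 4 != 0b1110 or b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Drop the markers: keep 4 bits of the lead byte, 6 bits of each trailing byte.
    return ((b0 & 0b1111) << 12) | ((b1 & 0b111111) << 6) | (b2 & 0b111111)


cp = decode_utf8_3byte(b"\xef\xbb\xbf")
print(cp, hex(cp))  # 65279 0xfeff
# Cross-check against Python's own UTF-8 decoder.
assert chr(cp) == b"\xef\xbb\xbf".decode("utf-8")
```

Incidentally, Python's utf-8-sig codec strips a BOM, but only at the very start of the data; a U+FEFF in the middle of a paragraph, as in this file, survives decoding.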

Do we really have to do the bit manipulation to discover what this character was? Of course not: just use Emacs' M-x describe-char on it:

             position: 4646 of 14699 (32%), column: 380
            character:  (displayed as ) (codepoint 65279, #o177377, #xfeff)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0xFEFF
               script: arabic
               syntax: w    which means: word
             to input: type "C-x 8 RET feff" or "C-x 8 RET ZERO WIDTH NO-BREAK SPACE"
          buffer code: #xEF #xBB #xBF
            file code: #xEF #xBB #xBF (encoded by coding system utf-8-unix)
              display: by this font (glyph code):
    ftcrhb:-GOOG-Noto Naskh Arabic UI-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x5D5)

Character code properties: customize what to show
  name: ZERO WIDTH NO-BREAK SPACE
  old-name: BYTE ORDER MARK
  general-category: Cf (Other, Format)
  decomposition: (65279) ('')
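Outside Emacs, Python's standard unicodedata module exposes several of the same fields. A minimal sketch; describe_char is a hypothetical helper, and it reports only a subset of what describe-char shows (no script, syntax, old name, or font information).

```python
import unicodedata


def describe_char(ch: str) -> dict:
    """Collect a few describe-char-like fields using only the stdlib (sketch)."""
    return {
        "codepoint": ord(ch),
        "hex": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "utf8_bytes": ch.encode("utf-8").hex(" "),  # bytes.hex(sep) needs Python 3.8+
    }


print(describe_char("\ufeff"))
```

For U+FEFF this reports the name ZERO WIDTH NO-BREAK SPACE, category Cf, and the bytes ef bb bf, matching the Emacs output.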

And this is literally the longest PR description I've written.

aprofeit commented 1 year ago

Is this a one-off? Or could there be others? Look how many PR words you had to write to fix this one!

JulienPalard commented 1 year ago

> Is this a one-off? Or could there be others? Look how many PR words you had to write to fix this one!

I discovered it while working on a new check for sphinx-lint, and according to sphinx-lint this is the only one.

Just for good measure, I also tried git grep $'\xef\xbb\xbf' and didn't find any others.
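To check a whole docs tree without knowing the exact bytes in advance, a small script can look for every Unicode "format" character (general category Cf, which includes U+FEFF). This is a minimal sketch, assuming UTF-8 .rst sources; find_invisible_chars and scan are hypothetical helpers, not part of sphinx-lint or git. It may also flag legitimate uses, such as U+200D ZERO WIDTH JOINER inside emoji sequences or U+00AD SOFT HYPHEN.

```python
import unicodedata
from pathlib import Path


def find_invisible_chars(text: str):
    """Yield (line, column, char) for every format (Cf) character, e.g. U+FEFF."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                yield lineno, col, ch


def scan(root: Path, pattern: str = "*.rst") -> int:
    """Print each hit as path:line:col and return the total number of hits."""
    hits = 0
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        for lineno, col, ch in find_invisible_chars(text):
            hits += 1
            print(f"{path}:{lineno}:{col}: U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
    return hits
```

For example, scan(Path(".")) run from the repository root would print one line such as "...:12: U+FEFF ZERO WIDTH NO-BREAK SPACE" per hit; a non-zero return value could be used to fail a CI job.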

OJFord commented 1 year ago

> The removed character is obviously not rendered in GitHub's "Files changed" interface.

For whatever idle curiosity it's worth, it is for me, in the GitHub Android app:

[Screenshot from the GitHub Android app: Screenshot_20221005-232106~2.png]

I.e., obviously not as a source character or anything with width, but it does similarly affect GitHub's own rendering.

MFogleman commented 1 year ago

Any idea what introduced the zero width whitespace?

CAM-Gerlach commented 1 year ago

> Any idea what introduced the zero width whitespace?

It was commit 23a4f28 in PR #81. The text was copied from a Google Doc, but I've verified the character wasn't in the Google Doc script text, and there wasn't any weird formatting anywhere near that location. While I'm really not sure what caused this, I believe the most likely scenario involves the non-US keyboard layout I believe the author had, one on which ` is not present on the keyboard and must instead be typed via a special key sequence. The character may have been entered by accident while trying to type a ` (the backtick markup was added after the text was copied from the Google Doc script).

Also, I couldn't find any other non-ASCII characters used throughout the docs, except for those that were intended, so this was indeed a one-off.

krmbzds commented 1 year ago

PSA for Mac Users

⌥ Option + Space = non-breaking space

Mitigation

  1. Install Karabiner
  2. Import the "Disable alt+spacebar (nonbreaking space)" rule

Long-term Solution

P.S.

Excuse me for dropping this here. Any soul we can save might potentially benefit humanity (or prevent some catastrophe) in the future.

JulienPalard commented 1 year ago

Beware: a non-breaking space is not a zero width non-breaking space.

(And non-breaking space is useful, in French at least, because we put it before ?, ! and so on: we want them to be spaced, but not to cause a line break. It's ugly to have the ? at the start of a line, far away from the last word of the question.)
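The difference is visible in the Unicode data itself. A small Python sketch, with NBSP and ZWNBSP as illustrative variable names: the NO-BREAK SPACE is a real space (category Zs) that Python even treats as whitespace, while the ZERO WIDTH NO-BREAK SPACE is an invisible format character (category Cf) that it does not.

```python
import unicodedata

NBSP = "\u00a0"    # NO-BREAK SPACE: has width, the one used before "?" in French
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE: no width, doubles as the BOM

for ch in (NBSP, ZWNBSP):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isspace())
# U+00A0 NO-BREAK SPACE Zs True
# U+FEFF ZERO WIDTH NO-BREAK SPACE Cf False

# Consequence: str.split() breaks on a NBSP but not on a ZWNBSP.
print("a\u00a0b".split())  # ['a', 'b']
print("a\ufeffb".split())  # ['a\ufeffb']
```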

krmbzds commented 1 year ago

> Beware: a non-breaking space is not a zero width non-breaking space.
>
> (And non-breaking space is useful, in French at least, because we put it before ?, ! and so on: we want them to be spaced, but not to cause a line break. It's ugly to have the ? at the start of a line, far away from the last word of the question.)

@JulienPalard Thanks! I did not know what non-breaking spaces were useful for (and I initially confused ZWNBSP with NBSP).

Disembaudio commented 1 year ago

Hopefully this works here, but I have this little invisible guy. >>   <<

CAM-Gerlach commented 1 year ago

It's not quite invisible... especially in a monospace font :)

Disembaudio commented 1 year ago

> It's not quite invisible... especially in a monospace font :)

It's imperceptible. Ha. I think it's part of a flag emoji. 🇪🇲 All I know is a few of these will render my Gboard invisible – [ ٹ  ̣   ̴̴ ]