sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

MultiByte filename handling #703

Closed shimizukawa closed 9 years ago

shimizukawa commented 9 years ago

The current version of Sphinx handles filenames as Unicode, but a few lines of the implementation break this and raise UnicodeEncodeError / UnicodeDecodeError exceptions.

(forked repository's changeset: https://bitbucket.org/shimizukawa/sphinx/changeset/40c9407c28e0 )


shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2011-06-30 00:43:15+00:00

fix again: new patch for multibyte filename handling.

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2011-06-30 02:50:03+00:00

fix again again..: please check only sphinx-1.0.7-multibyte-filename3.diff.

shimizukawa commented 9 years ago

From Georg Brandl on 2011-09-23 08:18:32+00:00

Hmm, why doesn't docutils like unicode as its source? Can you investigate?

In any case, to get rid of all (?) of these ugly Unicode issues with path names, best use Sphinx 1.1 on Python 3.

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2012-11-05 12:16:38+00:00

Issue #1016 was marked as a duplicate of this issue.

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2012-11-05 12:18:51+00:00

I updated the implementation; it is here: PR61.

shimizukawa commented 9 years ago

From Toshio Kuratomi on 2013-04-12 21:48:54+00:00

Just tried to build sphinx-1.2b1 where PR61 was merged. This patch causes breakage when the filenames are undecodable (or unencodable) in the current locale.

For a simple method to detect this on a Linux system:

$ cd Sphinx-1.2b1/tests
$ LC_ALL=C ./run.py -x

Running Sphinx test suite...

....................E

ERROR: test_build.test_multibyte_path

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(_self.arg)
  File "/srv/git/python-sphinx/Sphinx-1.2b1/tests/util.py", line 199, in deco
    func(app, args2, _kwargs2)
  File "/srv/git/python-sphinx/Sphinx-1.2b1/tests/test_build.py", line 79, in test_multibyte_path
    (srcdir / mb_name).makedirs()
  File "/srv/git/python-sphinx/Sphinx-1.2b1/tests/path.py", line 196, in makedirs
    b_filename = self.encode(FILESYSTEMENCODING)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-26: ordinal not in range(128)


Ran 21 tests in 11.094s

FAILED (errors=1)
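To illustrate why the result depends on the locale, here is a minimal Python 3 sketch (the report above is from Python 2.7). The helper `can_encode` and the sample value of `mb_name` are hypothetical and are not taken from the Sphinx test suite; the only thing taken from the traceback is that the codec in use was 'ascii'.

```python
def can_encode(name, encoding):
    """Return True if `name` can be encoded with `encoding`.

    Hypothetical helper for illustration; not part of Sphinx.
    """
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative non-ASCII directory name; the real test defines its own mb_name.
mb_name = "\u65e5\u672c\u8a9e"

ascii_ok = can_encode(mb_name, "ascii")  # the codec named in the traceback
utf8_ok = can_encode(mb_name, "utf-8")   # what a UTF-8 locale would use
print(ascii_ok, utf8_ok)
```

The same name is fine under a UTF-8 encoding and fails under ASCII, which matches the test passing in a normal environment and failing with LC_ALL=C.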

Do you want a new issue or shall we just continue in this one?

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2013-04-13 05:17:46+00:00

Toshio Kuratomi Thanks for reporting.

Do you want a new issue or shall we just continue in this one?

Let's continue here!

I think Sphinx needs a new conf.py variable for specifying the file system encoding.

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2013-09-17 15:17:59+00:00

Because Sphinx needs the file system encoding in order to load conf.py, an encoding variable in conf.py is not useful.

In my current opinion, if file names are undecodable in the current locale, Sphinx should display the error message "Sphinx need LC_CTYPE environment variable to recognize multibyte filenames" and stop building.
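A rough Python 3 sketch of this "fail fast" idea, assuming the file names arrive as raw bytes (as `os.listdir` returns them when given a bytes path). `FilenameEncodingError` and `check_filenames` are hypothetical names for illustration, not Sphinx APIs, and the message is the one proposed above.

```python
import sys


class FilenameEncodingError(Exception):
    """Hypothetical error type; not part of Sphinx."""


def check_filenames(raw_names, encoding=None):
    """Raise if any raw (bytes) file name cannot be decoded.

    `encoding` defaults to the interpreter's file system encoding,
    which is derived from the locale on POSIX systems.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    for raw in raw_names:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            # Stop the build with the message proposed in this comment.
            raise FilenameEncodingError(
                "Sphinx need LC_CTYPE environment variable "
                "to recognize multibyte filenames"
            ) from None


# Passes silently: every name is plain ASCII.
check_filenames([b"index.rst", b"conf.py"], encoding="ascii")
```

The check runs before anything else touches the names, so no conf.py setting is needed to decide which encoding to use.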

shimizukawa commented 9 years ago

From Nozomu Kaneko on 2013-09-17 16:55:13+00:00

+1

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2013-09-28 12:54:32+00:00

When Sphinx cannot decode a multibyte filename, it now reports the UnicodeError and continues if possible instead of raising an exception. Closes #703

→ <<cset 58be2f19b3aeeabf4d9a3d10be84824508c0fda2>>
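The changeset itself is not reproduced in this thread, so the following is only a sketch of the general "report and continue" pattern in Python 3, not the actual Sphinx code. The function name `decode_names` is hypothetical.

```python
import warnings


def decode_names(raw_names, encoding):
    """Decode raw (bytes) file names, skipping the ones that fail.

    Returns a (decoded, skipped) pair. A warning is emitted for every
    skipped name so the problem is visible without aborting the build.
    """
    decoded, skipped = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn("cannot decode filename %r; skipping it" % (raw,))
            skipped.append(raw)
    return decoded, skipped


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    good, bad = decode_names([b"index.rst", b"\xe6\x97\xa5.rst"], "ascii")
print(good, bad)
```

Compared with stopping the build, this keeps the documents with decodable names usable and leaves the decision about the rest to the user.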

shimizukawa commented 9 years ago

From Arfrever Frehtes Taifersar Arahesis on 2013-09-28 15:50:25+00:00

"multibyte filename did not support on this filesystem encoding" etc. is grammatically wrong.

At least please use present tense and passive voice (instead of past tense and active voice):

"multibyte filenames are not supported on this filesystem encoding"

"multibyte console input is not supported on this encoding"

Also, "abc" is technically a multibyte filename (3 bytes).

Decoding a filename with a single undecodable byte also raises an exception:

$ python2.7 -c 'from os import path; path.join(u"abc", "\x80")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 1: ordinal not in range(128)

Maybe "filenames with non-ASCII code points are not supported on this filesystem encoding" etc. would be better.
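A small Python 3 sketch of the distinction, with a hypothetical helper `is_non_ascii` (not a Sphinx function): byte length says nothing about whether a name is problematic, while the presence of a code point above 127 does.

```python
def is_non_ascii(name):
    """Return True if `name` contains any code point outside ASCII."""
    return any(ord(ch) > 127 for ch in name)


# "abc" occupies three bytes, yet it is pure ASCII.
abc_bytes = len("abc".encode("ascii"))

# A single byte outside the ASCII range cannot be decoded as ASCII,
# which is the conversion Python 2 attempted implicitly in path.join.
try:
    b"\x80".decode("ascii")
    single_byte_failed = False
except UnicodeDecodeError:
    single_byte_failed = True

print(abc_bytes, is_non_ascii("abc"), is_non_ascii("\u00e9"), single_byte_failed)
```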

shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2013-10-03 06:46:22+00:00

grammar fix: 'multibyte filename' is ambiguous. It is replaced with 'non-ASCII filename'. refs #703


shimizukawa commented 9 years ago

From Takayuki Shimizukawa on 2013-10-03 06:47:00+00:00

Arfrever Frehtes Taifersar Arahesis Thanks for reviewing!