python / cpython


Remove sys.setfilesystemencoding() #53841

Closed vstinner closed 14 years ago

vstinner commented 14 years ago
BPO 9632
Nosy @malemburg, @pitrou, @vstinner, @merwok
Files
  • remove_sys_setfilesystemencoding-2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-feature', 'library', 'expert-unicode']
    title = 'Remove sys.setfilesystemencoding()'
    updated_at =
    user = 'https://github.com/vstinner'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'lemburg'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'vstinner'
    components = ['Library (Lib)', 'Unicode']
    creation =
    creator = 'vstinner'
    dependencies = []
    files = ['18576']
    hgrepos = []
    issue_num = 9632
    keywords = ['patch']
    message_count = 14.0
    messages = ['114211', '114342', '114409', '114855', '114856', '115024', '115105', '115127', '115547', '115821', '115822', '115854', '116047', '116089']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'pitrou', 'vstinner', 'eric.araujo', 'Arfrever']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9632'
    versions = ['Python 3.2']
    ```

    vstinner commented 14 years ago

    The sys.setfilesystemencoding() function is dangerous because it introduces many inconsistencies: it cannot re-encode all filenames held in all objects (e.g. Python cannot find filenames stored in user objects or third-party libraries). For example, if you change the filesystem encoding from UTF-8 to ASCII, existing non-ASCII (Unicode) filenames can no longer be used: they will raise UnicodeEncodeError.

    Like sys.setdefaultencoding() in Python 2, I think that sys.setfilesystemencoding() is the root of all evil :-) The PYTHONFSENCODING environment variable (issue bpo-8622) is the right way to set the filesystem encoding.

    Attached patch removes sys.setfilesystemencoding().
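A minimal sketch of the failure mode described in the first message, in plain Python. It does not call sys.setfilesystemencoding(); it simulates an encoding switch by decoding the on-disk bytes with the old encoding and re-encoding the resulting str with the new one. The helper name `reencode_filename` and the Latin-1 case are illustrative additions, not part of the patch.

```python
def reencode_filename(raw: bytes, old_encoding: str, new_encoding: str) -> bytes:
    """Simulate a filename str created under one filesystem encoding
    being encoded again after the encoding was changed."""
    # The str object kept alive by user code or a third-party library
    name = raw.decode(old_encoding)
    # Encoding it with the new filesystem encoding for a system call
    return name.encode(new_encoding)


raw = "caf\u00e9.txt".encode("utf-8")  # bytes actually stored on disk

# UTF-8 -> ASCII: the existing non-ASCII name cannot be encoded at all
try:
    reencode_filename(raw, "utf-8", "ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)  # ordinal not in range(128)

# UTF-8 -> Latin-1: no error, but the bytes no longer match the file on disk
print(reencode_filename(raw, "utf-8", "latin-1") == raw)  # False
```

The second case is arguably worse than the first: nothing fails loudly, the program simply looks for a file that does not exist.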

    vstinner commented 14 years ago

    New version of the patch: also removes _Py_SetFileSystemEncoding().

    malemburg commented 14 years ago

    While you're right that adjusting the FS encoding long after Python has already started is probably not such a good idea, I do think that we need to provide a way to set the FS encoding from within Python without having to rely on external settings.

    Think of e.g. embedded Python interpreters or py2exe-style applications running on Linux or other systems that don't use Unicode APIs for FS-interaction or have fixed FS-encodings.

    vstinner commented 14 years ago

    > Think of e.g. embedded Python interpreters or py2exe-style applications running on Linux or other systems that don't use Unicode APIs for FS-interaction or have fixed FS-encodings.

    What is the problem here? Python already guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?

    Sorry, but I do not understand.
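To illustrate the "mojibake, but it still works" argument: when filenames are decoded with a wrong but consistent encoding, the displayed str is garbled, yet encoding it back with that same encoding yields the original bytes. The sketch assumes UTF-8 bytes on disk and a Latin-1 guess; both choices are arbitrary.

```python
raw = "caf\u00e9.txt".encode("utf-8")  # bytes on disk: b'caf\xc3\xa9.txt'

guessed = "latin-1"           # a "wrong" filesystem encoding guess
shown = raw.decode(guessed)   # what the user would see
assert shown == "caf\u00c3\u00a9.txt"  # i.e. "cafÃ©.txt": mojibake

# If Python's filesystem encoding really were Latin-1, encoding `shown`
# for a system call would reproduce the exact on-disk bytes, so open()
# or os.stat() would still reach the right file.
assert shown.encode(guessed) == raw
```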

    vstinner commented 14 years ago

    About the patch: it should also update the "Filenames and unicode" section of Doc/whatsnew/3.2.rst to explain that sys.setfilesystemencoding() is replaced by the PYTHONFSENCODING environment variable.

    malemburg commented 14 years ago

    STINNER Victor wrote:

    > STINNER Victor <victor.stinner@haypocalc.com> added the comment:
    >
    >> Think of e.g. embedded Python interpreters or py2exe-style applications
    >> running on Linux or other systems that don't use Unicode APIs
    >> for FS-interaction or have fixed FS-encodings.
    >
    > What is the problem here? Python already guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?
    >
    > Sorry, but I do not understand.

    In such environments you cannot expect the user to configure the system properly (i.e. set an environment variable). Instead, the application has to provide an educated guess to the Python interpreter in some way, hence the idea to use a configuration file or perhaps provide a C API that can be used to set the variable before initializing the interpreter.

    pitrou commented 14 years ago

    >>> Think of e.g. embedded Python interpreters or py2exe-style applications
    >>> running on Linux or other systems that don't use Unicode APIs
    >>> for FS-interaction or have fixed FS-encodings.
    >>
    >> What is the problem here? Python already guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?
    >>
    >> Sorry, but I do not understand.
    >
    > In such environments you cannot expect the user to configure the system properly (i.e. set an environment variable). Instead, the application has to provide an educated guess to the Python interpreter in some way, hence the idea to use a configuration file or perhaps provide a C API that can be used to set the variable before initializing the interpreter.

    Why wouldn't the embedding application just set the environment var before initializing the Python interpreter?
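The environment-variable approach can be sketched from Python itself by starting a child interpreter with a modified environment, so the variable is in place before that interpreter initializes. PYTHONFSENCODING, the variable discussed in this thread, is not read by current CPython releases, so this sketch assumes Python 3.7+ and uses PYTHONUTF8=1 (UTF-8 Mode) as a stand-in. The helper name `child_fs_encoding` is hypothetical.

```python
import os
import subprocess
import sys


def child_fs_encoding(extra_env: dict) -> str:
    """Start a fresh interpreter with extra environment variables and
    return the filesystem encoding it reports."""
    # Copy the parent environment and add/override variables; they only
    # take effect because they exist *before* the child initializes.
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# UTF-8 Mode forces the child's filesystem encoding to UTF-8.
print(child_fs_encoding({"PYTHONUTF8": "1"}))  # utf-8
```

Changing os.environ inside an interpreter that is already running does not alter that interpreter's own filesystem encoding, which is why the variable has to be set by whoever launches it.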

    malemburg commented 14 years ago
    Antoine Pitrou wrote:
    > 
    > Antoine Pitrou <pitrou@free.fr> added the comment:
    > 
    >>>> Think of e.g. embedded Python interpreters or py2exe-style applications
    >>>> running on Linux or other systems that don't use Unicode APIs 
    >>>> for FS-interaction or have fixed FS-encodings.
    >>>
    >>> What is the problem here? Python already guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?
    >>>
    >>> Sorry, but I do not understand.
    >>
    >> In such environments you cannot expect the user to configure the
    >> system properly (i.e. set an environment variable). Instead, the
    >> application has to provide an educated guess to the Python
    >> interpreter in some way, hence the idea to use a configuration
    >> file or perhaps provide a C API that can be used to set the
    >> variable before initializing the interpreter.
    > 
    > Why wouldn't the embedding application just set the environment var
    > before initializing the Python interpreter?

    Because that's not easy to do in a platform-independent way. OTOH, it's very easy to do via a C API function in Python, and since this env var is essential for the operation of Python, adding such an API is warranted.

    vstinner commented 14 years ago

    > In such environments you cannot expect the user to configure the system properly (i.e. set an environment variable).

    Why would it be different for embedded Python?

    > Instead, the application has to provide an educated guess to the Python interpreter in some way, ...

    How can the application guess the encoding better than Python? If the user doesn't configure their environment correctly, I don't see how the application can obtain the real (correct) environment configuration.

    If Python is unable to start because of the filesystem encoding, that is a bug (see bpo-8611). If Python starts but displays filenames incorrectly, it is the user's fault: the user has to set up their environment.

    vstinner commented 14 years ago

    About "embedded Python interpreters or py2exe-style applications": do you mean that the application calls a C function to set the encoding before starting the interpreter? Or do you mean the Python function, sys.setfilesystemencoding()?

    I would like to remove the Python function simply because it doesn't work (it doesn't re-encode the filenames held by all Python objects). But we might keep the C function if you really want to :-)

    vstinner commented 14 years ago

    "keep the C function"

    Hmm, currently Python 3 only has a *private* function, _Py_SetFileSystemEncoding(), which can only be called after _Py_InitializeEx() (because it relies on the codecs API). If you think there is a real use case, we should create a function to set the filesystem encoding, one that should (must?) be called before Py_InitializeEx().

    I still think that Python knows better than the application how to set the encoding (when, how to choose it, etc.).
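The remark that _Py_SetFileSystemEncoding() "relies on the codecs API" can be illustrated at the Python level: validating and normalizing an encoding name goes through the codec registry, which is set up during interpreter initialization. This does not reproduce the C code; the helper `normalize_fs_encoding` is hypothetical and only shows the kind of lookup such a setter depends on.

```python
import codecs


def normalize_fs_encoding(name: str) -> str:
    """Validate an encoding name and return its canonical codec name.

    Raises LookupError if no codec is registered under that name.
    """
    # codecs.lookup() consults the codec registry (and the ``encodings``
    # package search function), both of which exist only once the
    # interpreter has been initialized.
    return codecs.lookup(name).name


print(normalize_fs_encoding("UTF8"))    # utf-8
print(normalize_fs_encoding("latin1"))  # iso8859-1
```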

    malemburg commented 14 years ago

    STINNER Victor wrote:

    > STINNER Victor <victor.stinner@haypocalc.com> added the comment:
    >
    > "keep the C function"
    >
    > Hmm, currently Python 3 only has a *private* function, _Py_SetFileSystemEncoding(), which can only be called after _Py_InitializeEx() (because it relies on the codecs API). If you think there is a real use case, we should create a function to set the filesystem encoding, one that should (must?) be called before Py_InitializeEx().
    >
    > I still think that Python knows better than the application how to set the encoding (when, how to choose it, etc.).

    If you embed Python into another application, say as a scripting language for that application, that other application may have completely different requirements for the user setup than Python expects. For example, for a Windows GUI application it's not feasible to ask the user to change the environment variables via the registry in order for Python to pick up the right encoding information.

    What we'd need is a way for the embedding application to provide this information without requiring the environment to be set up in some special way. The application will likely have its own way of configuring things like file system or I/O stream encodings. Think of GTK or Qt applications, for example.

    The Py_InitializeEx() function seems like a good place to pass such important extra parameters to Python. It could take arguments for setting the file system encoding as well as the I/O encoding, and these arguments would then override the environment variable settings.

    So you can remove the function, but you have to keep a backdoor open for use cases like the one I described above.

    The Py_InitializeEx() function approach would also avoid all the issues that you have with calling _Py_SetFileSystemEncoding() after the interpreter has been initialized.
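The override order proposed above (a value passed explicitly by the embedding application wins over the environment variable, which wins over Python's own locale-based guess) can be sketched as a small resolution function. This is purely illustrative: `resolve_fs_encoding` and its `explicit` parameter are hypothetical, the proposal itself concerned a C-level initialization function, and CPython's real startup logic is more involved than the final fallback shown here.

```python
import locale
import os
from typing import Mapping, Optional


def resolve_fs_encoding(explicit: Optional[str] = None,
                        environ: Mapping[str, str] = os.environ) -> str:
    """Pick a filesystem encoding using the precedence proposed in the thread."""
    # 1. A value supplied by the embedding application overrides everything.
    if explicit:
        return explicit
    # 2. Otherwise use the environment variable discussed in this issue.
    from_env = environ.get("PYTHONFSENCODING")
    if from_env:
        return from_env
    # 3. Finally fall back to a locale-based guess (a simplification).
    return locale.getpreferredencoding(False)


# The embedding application's choice beats the environment:
print(resolve_fs_encoding("latin-1", {"PYTHONFSENCODING": "utf-8"}))  # latin-1
# Without an explicit value, the environment variable is used:
print(resolve_fs_encoding(None, {"PYTHONFSENCODING": "utf-8"}))       # utf-8
```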

    vstinner commented 14 years ago

    I didn't propose adding a new parameter to Py_InitializeEx() (which would mean creating a new function so as not to break the API); I just wrote that _Py_SetFileSystemEncoding() doesn't work for your use case.

    > If you embed Python into another application, say as a scripting language for that application, that other application may have completely different requirements for the user setup than Python expects. For example, for a Windows GUI application it's not feasible to ask the user to change the environment variables via the registry in order for Python to pick up the right encoding information.

    Is this use case really realistic? Apart from you, nobody has asked for this feature.

    > The application will likely have its own way of configuring things like file system or I/O stream encodings. Think of GTK or Qt applications, for example.

    Qt uses the Unicode API on Windows: nativeOpen() calls CreateFile() in wide-character mode (see src/corelib/io/qfsengine_win.cpp).

    Gtk+ (glib) also uses the Unicode API on Windows: g_fopen() calls _wfopen() (see glib/gstdio.c).

    Python 3 doesn't currently support your use case (it doesn't work). If you consider it important, please open a new issue.

    --

    I committed my patch to 3.2 (r84687).

    malemburg commented 14 years ago

    STINNER Victor wrote:

    > STINNER Victor <victor.stinner@haypocalc.com> added the comment:
    >
    > I didn't propose adding a new parameter to Py_InitializeEx() (which would mean creating a new function so as not to break the API); I just wrote that _Py_SetFileSystemEncoding() doesn't work for your use case.

    Yes, it would be a new function. I was under the impression that you wanted to use this approach to resolve the problem of not being able to set the encoding before any file objects get opened in Python.

    >> If you embed Python into another application, say as a scripting language for that application, that other application may have completely different requirements for the user setup than Python expects. For example, for a Windows GUI application it's not feasible to ask the user to change the environment variables via the registry in order for Python to pick up the right encoding information.
    >
    > Is this use case really realistic? Apart from you, nobody has asked for this feature.

    That's more likely due to the fact that no one is embedding Python 3.x into their apps yet...

    >> The application will likely have its own way of configuring things like file system or I/O stream encodings. Think of GTK or Qt applications, for example.
    >
    > Qt uses the Unicode API on Windows: nativeOpen() calls CreateFile() in wide-character mode (see src/corelib/io/qfsengine_win.cpp).
    >
    > Gtk+ (glib) also uses the Unicode API on Windows: g_fopen() calls _wfopen() (see glib/gstdio.c).

    That's not the point: the applications will have their own way of configuring themselves, and in GUI apps you most likely do not use environment variables to set up your application. As a result, the application has to tell the embedded Python how it was configured, in a way that overrides Python's encoding-detection magic.

    With your patch, the only way to do this is by having the embedding application change the OS environment. That's not exactly a very Pythonic way of interfacing.

    > Python 3 doesn't currently support your use case (it doesn't work). If you consider it important, please open a new issue.
    >
    > I committed my patch to 3.2 (r84687).

    Since you are removing a function that has been around since 3.0, please make sure that you add proper warnings to 3.1.
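As a rough illustration of "proper warnings" for a function slated for removal, a deprecated function can emit a DeprecationWarning before doing its work. The sketch below is Python with a hypothetical wrapper, `deprecated_setfilesystemencoding`, and made-up warning text; an actual change to the sys module would be written in C.

```python
import warnings


def deprecated_setfilesystemencoding(encoding: str) -> str:
    """Hypothetical stand-in for a setter that is being deprecated."""
    warnings.warn(
        "sys.setfilesystemencoding() is deprecated and will be removed; "
        "configure the filesystem encoding before Python starts instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    # A real implementation would change the encoding here; this sketch
    # only returns the requested name.
    return encoding


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning may be hidden by default
    deprecated_setfilesystemencoding("utf-8")

print(caught[0].category.__name__)  # DeprecationWarning
```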