python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.24k forks source link

bdist_wininst depends on MBCS codec, unavailable on non-Windows #55154

Closed merwok closed 3 years ago

merwok commented 13 years ago
BPO 10945
Nosy @loewis, @amauryfa, @vstinner, @tarekziade, @merwok, @jelmer, @hroncok, @miss-islington
PRs
  • python/cpython#14506
  • python/cpython#14509
  • python/cpython#14510
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/merwok' closed_at = created_at = labels = ['type-bug', 'library'] title = 'bdist_wininst depends on MBCS codec, unavailable on non-Windows' updated_at = user = 'https://github.com/merwok' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'eric.araujo' closed = True closed_date = closer = 'vstinner' components = ['Distutils', 'Distutils2'] creation = creator = 'eric.araujo' dependencies = [] files = [] hgrepos = [] issue_num = 10945 keywords = ['patch'] message_count = 27.0 messages = ['126528', '126531', '126533', '126536', '126539', '135892', '135894', '139968', '139974', '145213', '145509', '145540', '145548', '145627', '145630', '145680', '155591', '155603', '346998', '346999', '347001', '347009', '347011', '347013', '347015', '347132', '384814'] nosy_count = 12.0 nosy_names = ['loewis', 'amaury.forgeotdarc', 'vstinner', 'techtonik', 'tarek', 'eric.araujo', 'Arfrever', 'jelmer', 'runtux', 'hroncok', 'Jelmer Vernooij', 'miss-islington'] pr_nums = ['14506', '14509', '14510'] priority = 'normal' resolution = 'wont fix' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue10945' versions = ['3rd party', 'Python 2.7', 'Python 3.2', 'Python 3.3'] ```

    merwok commented 13 years ago

    If distutils.commands.bdist_wininst.bdist_wininst.get_inidata returns a unicode string (which is always the case in 3.x and can happen in 2.x if if you pass a unicode object as one setup argument), bdist_wininst will try to use the MBCS codec, which is not available on non-Windows OSes (see http://docs.python.org/dev/library/codecs#module-encodings.mbcs).

    If someone wants to make a patch, here are some guidelines: http://wiki.python.org/moin/Distutils/FixingBugs

    9432cd65-9f88-4bc6-8ad6-72e5ab1c730d commented 13 years ago

    I believe it is better to start and commit a testcase .

    merwok commented 13 years ago

    We don’t commit failing tests :) If you mean you want to write a test case, please go ahead and attach a patch or changeset URI.

    9432cd65-9f88-4bc6-8ad6-72e5ab1c730d commented 13 years ago

    Please don't ask me about patches. Core devs won't accept them.

    merwok commented 13 years ago

    Please don't ask me about patches. Core devs won't accept them.

    One of your patches for diff has been accepted; some distutils bugs have patches from you (for which we are grateful) that are in various review stages. The only thing preventing acceptance where I’m concerned is time, nothing else.

    045f8f0d-9bfc-4a14-9319-6b326ddadb21 commented 13 years ago

    I've just been bitten by this when trying to do a new release of roundup, why not make the mbcs codec available on non-windows platforms as has been proposed (and rejected) in bpo-1251921 -- any non-technical reasons for not including this codec on other platforms?

    amauryfa commented 13 years ago

    The mbcs codec depends on the Windows installation. On most Western Windows it is similar to cp1252, Japanese Windows will use cp932, and so on. If we were to provide mbcs on non-windows platform, it should be an alias to ascii.

    vstinner commented 13 years ago

    Can't you only work with Unicode and avoid the MBCS encoding?

    045f8f0d-9bfc-4a14-9319-6b326ddadb21 commented 13 years ago

    On Thu, Jul 07, 2011 at 10:52:51AM +0000, STINNER Victor wrote:

    Can't you only work with Unicode and avoid the MBCS encoding?

    I'm trying to build a windows binary package on Linux. This usually works fine -- as long as the package description doesn't contain unicode. If it *contains* unicode it will somehow internally use MBCS encoding. So if someone fixes windows binary package building on non-windows platforms to not use MBCS encoding this is fine with me. But maybe the easier way out here is to include that codec on all platforms.

    (and don't tell me I have to install windows for building a windows binary package :-)

    Thanks, Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: http://www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com osAlliance member email: rsc@osalliance.com

    merwok commented 13 years ago

    haypo:

    Can't you only work with Unicode and avoid the MBCS encoding? It is not code under the users’ control (i.e. setup.py) that uses MBCS, but the bdist_wininst command itself. It used to be runnable from linux, which is great to provide binary installers for Windows users who typically don’t have a compiler. This is what the bug is about.

    vstinner commented 13 years ago

    It is not code under the users’ control (i.e. setup.py) that uses MBCS, but the bdist_wininst command itself.

    bdist_command append configuration data to a wininst-xxx.exe binary. Where does this file come from? Can we modify wininst-xxx.exe binaries?

    If we can modify the binaries, we can change the format to store the configuration data as UTF-8 instead of the ANSI code page.

    It's surprising and "unsafe" (not portable) to use the ANSI code page for an installer: if you build your installer on a french setup (ANSI=cp1252), the configuration was be interpreted incorrectly on a japanese setup (ANSI=cp932). Example:

    >>> 'Hé ho'.encode('cp1252').decode('cp932')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'cp932' codec can't decode bytes in position 1-2: illegal multibyte sequence

    So if the configuration data (package metadata) contains non-ASCII characters, you will not be able to use your installer on a computer using a different ANSI code page than the code page of the computer used to build the installer... In the best case, you will just get mojibake.

    If we cannot modify wininst-xx.exe, an alternative to be able to generate installers on non-Windows platforms is to use the most common ANSI code page (cp1252?), or maybe ASCII.

    Use the ASCII encoding is the safest solution because you will be able to use your installer on all Windows setup (all ANSI code pages are compatible with ASCII), but you will not be able to generate an installer if the package metadata contains at least one non-ASCII character...

    merwok commented 13 years ago

    > It is not code under the users’ control (i.e. setup.py) > that uses MBCS, but the bdist_wininst command itself. bdist_command append configuration data to a wininst-xxx.exe binary. Are you sure? The string that’s encoded with mbcs comes from the get_inidata method in bdist_wininst.py; get_inidata only works with strings (either string literals or str objects (in 3.x) given as attributes of the command object or in the DistributionMetadata object).

    Where does this file come from? Can we modify wininst-xxx.exe binaries? If needed, we can: the code lives in PC/bdist_wininst.

    Use the ASCII encoding is the safest solution I don’t think so. Distutils supports author='Éric', so bdist_wininst should too :)

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 13 years ago

    The problem is that the config file is parsed using GetPrivateProfileString, and the result is then passed to TextOut, SetDlgItemText, CreateWindow, etc. all of which are defined to accept MBCS strings. I agree that this can't work correctly in the general case.

    Changing the GUI functions to operate on Unicode strings is certainly feasible and a good idea. The main challenge then is the format of the INI file. IIUC, GetPrivateProfileStringW is willing to process UTF-16 (with BOM) encoded INI files, but I never tested whether it actually does. If it does, I recommend to have the INI file encoded in UTF-16 (with BOM, using LE for safety).

    In porting wininst.exe, it seems tempting to use the TEXT family of APIs. I'd advise against that, and recommend to explicitly use the *W functions.

    vstinner commented 13 years ago

    The problem is that the config file is parsed using GetPrivateProfileString, and the result is then passed to TextOut, SetDlgItemText, CreateWindow, etc. all of which are defined to accept MBCS strings. I agree that this can't work correctly in the general case.

    These functions are available with an Unicode API:

    GetPrivateProfileStringW TextOutW SetDlgItemTextW CreateWindowW ...

    Changing the GUI functions to operate on Unicode strings is certainly feasible and a good idea. The main challenge then is the format of the INI file. IIUC, GetPrivateProfileStringW is willing to process UTF-16 (with BOM) encoded INI files, but I never tested whether it actually does.

    bdist_wininst creates a long Unicode string for the INI data, and then encode it explicitly:

            if isinstance(cfgdata, str):
                cfgdata = cfgdata.encode("mbcs")

    So I suppose that replacing "mbcs" by "UTF-16-LE" and add the BOM should be enough.

    In porting wininst.exe, it seems tempting to use the TEXT family of APIs. I'd advise against that, and recommend to explicitly use the *W functions.

    Do we need to keep backward compatibility if we change the format of the config data? Or wininst-xx.exe are only usable with trailing config data in the .exe file?

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 13 years ago

    Do we need to keep backward compatibility if we change the format of the config data? Or wininst-xx.exe are only usable with trailing config data in the .exe file?

    The wininst.exe belongs to the version of distutils it is released with. So if we change both consistently, no backwards-compatibility problems arise.

    merwok commented 13 years ago

    Would the proposed change mean that a bdist_wininst built with 3.2.0 won’t work with a patched 3.2.3?

    merwok commented 12 years ago

    The proposition of using other C functions and changing the bdist_wininst code looks risky to me, especially as I don’t know how compatibility would be affected (see my previous message). We are free to improve the wininst code in distutils2, or discuss a replacement (Jeremy Kloth was working on something with all the features of MSI and wininst), but for distutils I would very much prefer the simplest fix that could possibly works.

    bdist_msi decodes data read from setup.py with MBCS on Windows; on other OSes, couldn’t the locale preferred encoding be used?

    vstinner commented 12 years ago

    Would the proposed change mean that a bdist_wininst built with 3.2.0 won’t work with a patched 3.2.3?

    The installer doesn't use distutils to read its configuration, so such binary runs with any installed Python version.

    bdist_msi decodes data read from setup.py with MBCS on Windows; on other OSes, couldn’t the locale preferred encoding be used?

    It would be worse: Linux doesn't use Windows code page. Most modern OSes are now using UTF-8 locale encoding, whereas Windows never use UTF-8 as the ANSI code page.

    bc75918c-a209-4fa3-b6cf-28cfb7317f76 commented 5 years ago

    I've opened a PR thet removes the support for bdist_wininst on non-Windows. Apparently, it was broken since the beginning of Py3k anyway. The support can be reintroduced once it is actually fixed (or never).

    https://github.com/python/cpython/pull/14506

    vstinner commented 5 years ago

    I've opened a PR thet removes the support for bdist_wininst on non-Windows. Apparently, it was broken since the beginning of Py3k anyway. The support can be reintroduced once it is actually fixed (or never).

    It would be better to use the Unicode (wide character) flavor of the Windows API to avoid completely any explicitly encoding, and only pass Unicode strings. But This issue is open for 8 years and it seems like nobody cares enough to invest time to implement such change.

    So I'm fine with trivial PR 14506. It doesn't change the status quo: bdist_win was always broken on non-Windows platforms since Python 3.0, but the PR makes the status quo more obvious which is a good thing.

    vstinner commented 5 years ago

    I started a discussion on the Packaging list:

    "Deprecate bdist_wininst" https://discuss.python.org/t/deprecate-bdist-wininst/1929

    vstinner commented 5 years ago

    New changeset 72cd653c4ed7a4f8f8fb06ac364b08a97085a2b5 by Victor Stinner (Miro Hrončok) in branch 'master': bpo-10945: Drop support for bdist_wininst on non-Windows systems (GH-14506) https://github.com/python/cpython/commit/72cd653c4ed7a4f8f8fb06ac364b08a97085a2b5

    miss-islington commented 5 years ago

    New changeset 45c10da40912e04c0d0de02af4b23438ed0de49b by Miss Islington (bot) in branch '3.7': bpo-10945: Drop support for bdist_wininst on non-Windows systems (GH-14506) https://github.com/python/cpython/commit/45c10da40912e04c0d0de02af4b23438ed0de49b

    miss-islington commented 5 years ago

    New changeset be5bb52f5f2d4da4b9d6f42399f7275ab47910f3 by Miss Islington (bot) in branch '3.8': bpo-10945: Drop support for bdist_wininst on non-Windows systems (GH-14506) https://github.com/python/cpython/commit/be5bb52f5f2d4da4b9d6f42399f7275ab47910f3

    vstinner commented 5 years ago

    Follow-up issue: bpo-37468 "Don't install wininst*.exe on non-Windows platforms".

    vstinner commented 5 years ago

    Lib/distutils/tests/test_bdist_wininst.py contains an interesting comment linking to bpo-5731:

            # bpo-5731: command was broken on non-windows platforms
            # this test makes sure it works now for every platform
            # let's create a command

    Related commit:

    commit ad95826c33ea378db2807a268363ccfbf0ea8273 Author: Tarek Ziadé \ziade.tarek@gmail.com\ Date: Thu Apr 9 21:36:44 2009 +0000

    Fixed bpo-5731: Distutils bdist_wininst no longer worked on non-Windows platforms

    But there is also bpo-8954 follow-up which is still open: "wininst regression: errors when building on linux".

    vstinner commented 3 years ago

    The distutils bdist_wininst command has been removed in Python 3.10: see bpo-42802.