pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

No support for non-latin1 characters in credentials #5801

Open papparotzi opened 5 years ago

papparotzi commented 5 years ago

Environment

Description

When attempting to install or even update anything with pip, I get the old UnicodeEncodeError.

Expected behavior

To work.

How to Reproduce

  1. Run any pip install/update command
  2. An error occurs.

Output

D:\Applications\Python36\Scripts>pip install pyodbc --log "C:\Users\kfrazi1\Downloads\log.txt" --trusted-host pypi.org --trusted-host files.pythonhosted.org
Collecting pyodbc
Exception:
Traceback (most recent call last):
  File "d:\applications\python36\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "d:\applications\python36\lib\site-packages\pip\commands\install.py", line 324, in run
    requirement_set.prepare_files(finder)
  File "d:\applications\python36\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "d:\applications\python36\lib\site-packages\pip\req\req_set.py", line 554, in _prepare_file
    require_hashes
  File "d:\applications\python36\lib\site-packages\pip\req\req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "d:\applications\python36\lib\site-packages\pip\index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "d:\applications\python36\lib\site-packages\pip\index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "d:\applications\python36\lib\site-packages\pip\index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "d:\applications\python36\lib\site-packages\pip\index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "d:\applications\python36\lib\site-packages\pip\index.py", line 792, in get_page
    "Cache-Control": "max-age=600",
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "d:\applications\python36\lib\site-packages\pip\download.py", line 386, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\cachecontrol\adapter.py", line 47, in send
    resp = super(CacheControlAdapter, self).send(request, **kw)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\adapters.py", line 405, in send
    conn = self.get_connection(request.url, proxies)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\adapters.py", line 303, in get_connection
    proxy_manager = self.proxy_manager_for(proxy)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\adapters.py", line 190, in proxy_manager_for
    proxy_headers = self.proxy_headers(proxy)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\adapters.py", line 384, in proxy_headers
    password)
  File "d:\applications\python36\lib\site-packages\pip\_vendor\requests\auth.py", line 63, in _basic_auth_str
    password = password.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 2: ordinal not in range(256)

My password does have a special character in it, which is a requirement of the business. I have environment variables set for the http/https proxy. I have also tried installing while passing the proxy in the command itself and get the same output. I have tried changing the charset in my command window, which did not work. There are no spaces in my top-level Python folder, as one can see from the output. Everything I am finding on the net is about charsets or outdated software versions. Any help you can give would be appreciated. Thank you.

pfmoore commented 5 years ago

It looks like requests is insisting on encoding the password in Latin-1. I don't know if Latin-1 is needed as part of the relevant protocol, but whether it is or not, this looks like a requests issue. I suggest you raise it with them - it might help them diagnose the issue if you can reproduce the problem with "plain" requests rather than via pip, but if not, there should still be enough in this traceback for them.
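For anyone trying to narrow this down, the failing step can be reproduced without pip, a proxy, or requests. This is a minimal sketch using only the standard library; `encode_latin1` is a hypothetical helper that mimics the `password.encode('latin1')` line at the bottom of the traceback, and the sample passwords are made up.

```python
def encode_latin1(password: str) -> bytes:
    """Mimic the step from the traceback: encode a str password as Latin-1."""
    return password.encode("latin1")


# Characters inside Latin-1 (code points 0-255) encode without error.
ok = encode_latin1("p\u00e4ssword")  # "ä" is U+00E4, which Latin-1 can represent

# U+FFFD, the character named in the traceback, is outside Latin-1.
error = None
try:
    encode_latin1("pa\ufffdssword")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

U+FFFD is the Unicode replacement character, so it may be worth checking whether the original special character was already lost before the password reached this step.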

papparotzi commented 5 years ago

Thanks for your response, I will create an account and raise an issue with them.

cjerdonek commented 5 years ago

It looks like requests might require a byte string to be passed in this case, instead of Unicode: https://github.com/requests/requests/issues/3662

chrahunt commented 5 years ago

It is probably safe to assume that the encoding of credential fields should be UTF-8 as:

  1. Browsers use it (source)
  2. When servers want to request a specific charset, their only option is UTF-8. (source)

I think it would be OK for us to always extract and encode these fields before passing them to requests.

CarliJoy commented 4 years ago

I tested the change with our corporate proxy and it now works with UTF-8 passwords. Please accept the PR.

CarliJoy commented 4 years ago

This is a vendor issue and depends on https://github.com/psf/requests/issues/4564. Only after that issue is fixed and released can the pip issue be closed, by updating the vendored requests library (see https://pip.pypa.io/en/latest/development/vendoring-policy/).

chrahunt commented 4 years ago

Hi @CarliJoy, we should be able to do this in pip itself as long as we do the encoding before passing the username and password to requests. This is the workaround described in psf/requests#4564.
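A rough sketch of the idea, assuming UTF-8 is the right encoding for the credentials: encode the username and password to bytes first, so that nothing downstream has to pick an encoding for a text string. This is not pip's actual code. `basic_auth_header` is a hypothetical helper that uses only the standard library to show what the resulting Basic header value would look like.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build a Basic auth header value from UTF-8 encoded credentials.

    Hypothetical helper for illustration only; not part of pip or requests.
    """
    # Encode explicitly as UTF-8 instead of relying on a Latin-1 default.
    user_bytes = username.encode("utf-8")
    pass_bytes = password.encode("utf-8")
    token = base64.b64encode(user_bytes + b":" + pass_bytes).decode("ascii")
    return "Basic " + token


# A password containing a character outside Latin-1 (the euro sign, U+20AC).
header = basic_auth_header("user", "p\u20acssword")
print(header)
```

Whether a given proxy or server decodes the credentials as UTF-8 is a separate question, so this would still need testing against the proxy in use.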