pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.52k stars 3.02k forks source link

pip install fails with `ValueError: too many values to unpack` #6165

Closed rcramblit closed 5 years ago

rcramblit commented 5 years ago

Environment

Description

In some situations after installing packages from a requirements file this error is thrown:

Exception:
Traceback (most recent call last):
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 176, in main
    status = self.run(options, args)
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/req/__init__.py", line 57, in install_given_reqs
    **kwargs
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 919, in install
    use_user_site=use_user_site, pycompile=pycompile,
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 445, in move_wheel_files
    warn_script_location=warn_script_location,
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/wheel.py", line 583, in move_wheel_files
    outrows = get_csv_rows_for_installed(reader)
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/wheel.py", line 565, in get_csv_rows_for_installed
    for fpath, digest, length in old_csv_rows:
ValueError: too many values to unpack

Expected behavior

Install works How to Reproduce

I am not completely sure yet... I am installing from a requirements file with pre-built packages like: sudo /home/hadoop/venv2/bin/python -m pip install -r /home/hadoop/emr.requirements.txt --find-links=/home/hadoop/packages Pre-built packages are in /home/hadoop/packages

Output

...
...
Installing collected packages: alabaster, SQLAlchemy, MarkupSafe, Mako, python-editor, six, python-dateutil, alembic, chardet, idna, certifi, urllib3, requests, algoliasearch, wrapt, lazy-object-proxy, astroid, avro, colorama, jmespath, docutils, botocore, futures, s3transfer, pyasn1, rsa, PyYAML, awscli, pytz, Babel, backports.functools-lru-cache, beautifulsoup4, Jinja2, numpy, singledispatch, backports-abc, tornado, bokeh, boto3, brewer2mpl, bz2file, cgroupspy, click, configparser, confluent-kafka, coverage, pycparser, cffi, enum34, asn1crypto, ipaddress, cryptography, cycler, dask, scipy, scikit-learn, multipledispatch, cloudpickle, dask-glm, pandas, toolz, dask-searchcv, dask-ml, distance, text-unidecode, Faker, factory-boy, future, fasttext, kiwisolver, subprocess32, pyparsing, matplotlib, fbprophet, pycodestyle, mccabe, pyflakes, flake8, fuzzywuzzy, Geohash, boto, smart-open, gensim, geographiclib, geotext, patsy, statsmodels, ggplot, psutil, gnupg, graphviz, hiredis, html2text, webencodings, html5lib, imagesize, inflection, isort, jellyfish, kafka-python, langdetect, langid, ipython-genutils, decorator, traitlets, simplegeneric, backports.shutil-get-terminal-size, scandir, pathlib2, pickleshare, Pygments, ptyprocess, pexpect, wcwidth, prompt-toolkit, IPython, line-profiler, lxml, mechanize, memory-profiler, funcsigs, pbr, mock, more-itertools, xmltodict, ecdsa, pycryptodome, python-jose, pyaml, jsonpickle, aws-xray-sdk, werkzeug, backports.weakref, backports.tempfile, cookies, responses, backports.ssl-match-hostname, docker-pycreds, websocket-client, docker, jsondiff, moto, msgpack, MySQL-python, nameparser, neo4j-driver, ner, networkx, nltk, ordereddict, olefile, Pillow, planout, plotly, prettytable, psycopg2, py, pyarrow, sasl, thrift, thrift-sasl, PyHive, pylint, bcrypt, pynacl, paramiko, pysftp, pyspark-dist-explore, PyStemmer, pytest, python-dotenv, python-Levenshtein, redis, regex, LEPL, rfc6266-parser, rpy2, s3fs, seaborn, slackclient, snowballstemmer, msgpack-numpy, murmurhash, cymem, preshed, cytoolz, plac, tqdm, dill, pathlib, thinc, ujson, spacy, Sphinx, sqlparse, Tempita, sqlalchemy-migrate, sqlalchemy-redshift, tabulate, textblob, textstat, timeout-decorator, PyJWT, twilio, unicodecsv, ua-parser, user-agents, virtualenv, virtualenv-clone, stevedore, virtualenvwrapper, wget, wordcloud, xlrd, xgboost
  Found existing installation: numpy 1.16.0
    Uninstalling numpy-1.16.0:
      Successfully uninstalled numpy-1.16.0
Exception:
Traceback (most recent call last):
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 176, in main
    status = self.run(options, args)
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/req/__init__.py", line 57, in install_given_reqs
    **kwargs
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 919, in install
    use_user_site=use_user_site, pycompile=pycompile,
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 445, in move_wheel_files
    warn_script_location=warn_script_location,
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/wheel.py", line 583, in move_wheel_files
    outrows = get_csv_rows_for_installed(reader)
  File "/home/hadoop/venv2/local/lib/python2.7/site-packages/pip/_internal/wheel.py", line 565, in get_csv_rows_for_installed
    for fpath, digest, length in old_csv_rows:
ValueError: too many values to unpack
viggyfresh commented 5 years ago

Same issue when installing (when invoking using tox):

(venv) viggy-mbp:kb viggy$ tox
...
py27 installdeps: -e.[dev], -rrequirements.txt
...
Exception:
Traceback (most recent call last):
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 176, in main
    status = self.run(options, args)
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/req/__init__.py", line 57, in install_given_reqs
    **kwargs
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 919, in install
    use_user_site=use_user_site, pycompile=pycompile,
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 445, in move_wheel_files
    warn_script_location=warn_script_location,
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/wheel.py", line 583, in move_wheel_files
    outrows = get_csv_rows_for_installed(reader)
  File "/Users/viggy/GitHub/curai/python/libs/kb/.tox/py27/lib/python2.7/site-packages/pip/_internal/wheel.py", line 565, in get_csv_rows_for_installed
    for fpath, digest, length in old_csv_rows:
ValueError: too many values to unpack

one more update: only happens when trying to install with a requirements file with local FS paths

as others have verified, this chokes on files that have commas in their name.

cjerdonek commented 5 years ago

It looks like this code change was introduced in this merge commit https://github.com/pypa/pip/commit/7696e7e5303b23d233097cf26749bc1e4a5e0645 (specifically https://github.com/pypa/pip/pull/6067/commits/b1c33bdee43024b887f31f95e3e99a7767d5651d ) as part of PR #6067. It looks like there were code changes in the commit in addition adding type annotations.

/cc @mkurnikov

rcramblit commented 5 years ago

I grabbed an example of some of the supposed 3-tuples this code expects: Correct ones:

['statsmodels/datasets/strikes/__init__.py', 'sha256=zCyaiU_R6WB1U8RWu74iAZTG2C6vdfMNHvRJtUZutPg', '20']
['statsmodels/datasets/strikes/data.py', 'sha256=OxuiuxzuSo9LcXLozu17u5fe2CrPnZGkMrxRlGgFd0o', '1951']
['statsmodels/datasets/strikes/strikes.csv', 'sha256=uuwOMRln8dydCCqacBJoAkSolLJXD3vRg38y6TFZaZQ', '719']

Less-than correct:

['statsmodels/datasets/tests/raw.github.com', 'vincentarelbundock', 'Rdatasets', 'master', 'csv', 'car', 'Duncan.csv.zip', 'sha256=igrYVFyIhyIbxUEuvNdq1sdiRZHsvNBrkf7l6JNmekA', '621']
['statsmodels/datasets/tests/raw.github.com', 'vincentarelbundock', 'Rdatasets', 'master', 'datasets.csv.zip', 'sha256=TlfJBE_WdTwnd3vms7ckO4rcKX2sMXLsiCkPvCEwlRc', '16578']
['statsmodels/datasets/tests/raw.github.com', 'vincentarelbundock', 'Rdatasets', 'master', 'doc', 'car', 'rst', 'Duncan.rst.zip', 'sha256=sbmyDj-ve-Mv1bC4oMtZB0AEoW0rS6R0xKc8YbkycyA', '633']
cjerdonek commented 5 years ago

@rcramblit Can you determine who or what generated the files causing the problem?

rcramblit commented 5 years ago

@cjerdonek Looks like these particular ones are coming from: https://github.com/statsmodels/statsmodels/tree/v0.6.1/statsmodels/datasets/tests

Filenames with commas in them..

cjerdonek commented 5 years ago

And how about who / what generated the csv (RECORD) file referencing those files? PEP 376 says here that commas should be escaped: https://www.python.org/dev/peps/pep-0376/#record

rcramblit commented 5 years ago

Not sure I get what you're asking - Aren't these created by pip when a package is installed? When I look at the RECORD file for statsmodels (./lib64/python2.7/site-packages/statsmodels-0.9.0.dist-info/RECORD) these rows seem properly formatted:

"statsmodels/datasets/tests/raw.github.com,vincentarelbundock,Rdatasets,master,csv,car,Duncan.csv.zip",sha256=igrYVFyIhyIbxUEuvNdq1sdiRZHsvNBrkf7l6JNmekA,621
"statsmodels/datasets/tests/raw.github.com,vincentarelbundock,Rdatasets,master,datasets.csv.zip",sha256=TlfJBE_WdTwnd3vms7ckO4rcKX2sMXLsiCkPvCEwlRc,16578
"statsmodels/datasets/tests/raw.github.com,vincentarelbundock,Rdatasets,master,doc,car,rst,Duncan.rst.zip",sha256=sbmyDj-ve-Mv1bC4oMtZB0AEoW0rS6R0xKc8YbkycyA,633

Edit - That is from an installation using pip==18.1

cjerdonek commented 5 years ago

I'm not very knowledgeable on this aspect, but for example, here's a similar issue involving commas in the filename where the issue seems to be with an original RECORD file, and the file wasn't generated by pip but rather by the wheel project: https://github.com/pypa/pip/issues/6103

So one possibility of what might be happening is that pip reads an existing RECORD file (possibly externally generated) and generates a new one, and what you've included above is the new one rather than the original one. The change seems to be that previously, pip was "loose" in what RECORD files it could read (which is why 18.1 was able to install the project you mention), whereas now it became more strict, so you're seeing a problem with a RECORD file that previously wasn't complying. I was trying to trace where the original RECORD file came from. You mentioned "pre-built" packages in your original post.

rcramblit commented 5 years ago

Ah I see. I think you hit it on the nose. I see in the build logs Building wheels for ..., including this package. When I create the wheel myself with python setup.py bdist_wheel I can see the malformed RECORD file.

liyiliuxingyu commented 5 years ago

This looks like a duplicate of Issue #6166

pradyunsg commented 5 years ago

It looks like there were code changes in the commit in addition adding type annotations.

That's on me. ISTM that I didn't review that PR properly.

I'll look into this and file a bugfix PR.

mgedmin commented 5 years ago

I'm hitting this bug when I try to pip install certbot-apache into a clean virtualenv (using Python 3.6, in case that matters).

pradyunsg commented 5 years ago

Okay yeah, this is a problem on the side of wheels being built and pip 19.0 caring about getting more strict as @cjerdonek noted; not the code change pertaining to the type annotations PR. (yay! I did review the code properly!)

pip 18.1 was a lot kinder, but would generate invalid RECORD files and fail to properly handle uninstalls.

https://github.com/pypa/pip/blob/6af9de97bbd2427f82942e476c590a2db22ea1ff/src/pip/_internal/wheel.py#L504-L508

So, this isn't really a bug introduced in pip 19.0, so I'll drop the "bug".

That said, @pypa/pip-committers do we want to be kind and just print a warning or be strict and refuse to install? We can (and should) add an error message here regardless; instead of a bad traceback. :)

wheel issue: https://github.com/pypa/wheel/issues/280

cjerdonek commented 5 years ago

My inclination at this point would be to return to the previous (non-strict) behavior and discuss doing something more or different in a slower time frame. Some of us had discussed being strict with regard to RECORD files in a different, related issue, and we decided against it for the time being IIRC (see #5868 and #5913, for example). If there's an easy way to log a warning like by doing a length check, perhaps that would be okay. (I'm also not sure that the RECORD winds up invalid because I think pip generates a new, correct one. The commenter said above that the resulting file looked okay.)

By the way, you might have misunderstood my comment about the type annotations. What I was saying is that the code change that caused this (refactoring out a get_csv_rows_for_installed() function) was done in PR #6067, which was called "Add type annotations..." But the PR didn't just add type annotations. It also mixed in some code refactorings that I think merited separate review and discussion.

pradyunsg commented 5 years ago

the PR didn't just add type annotations. It also mixed in some code refactorings that I think merited separate review and discussion.

Ah right, that makes more sense. :)

pradyunsg commented 5 years ago

I'm also not sure that the RECORD winds up invalid because I think pip generates a new, correct one.

IIUC, wheels generated with python setup.py bdist_wheel will have that. Wheel built through pip (via pip wheel xyz or pip install <sdist>) will use pip's wheel RECORD generation.

My inclination at this point would be to return to the previous (non-strict) behavior and discuss doing something more or different in a slower time frame.

Fair enough. Coming from https://github.com/pypa/pip/issues/5913#issuecomment-432986021 -> I think this falls in the second category.

That said, a bodge I have in mind is ",".joining all but the last 2 entries prior to sorting and writing the RECORD. It's kinda like modifying pip to cover up an issue in wheel but I think we can remove that in the future and put in better error messaging instead, if needed. It's dirty but it'll work and is better than writing a broken RECORD.

pradyunsg commented 5 years ago

use pip's wheel RECORD generation.

Ok, I spoke wrong -- it's when pip's installing with that it rewrites RECORD.

ohemorange commented 5 years ago

My inclination at this point would be to return to the previous (non-strict) behavior and discuss doing something more or different in a slower time frame.

Certbot would really, really appreciate this, as the change breaks certbot for all of our certbot-auto users.

Can we get the change reverted now, and fixed fully later? Otherwise we'll have to patch and release a point version of Certbot ASAP, before people's certs start expiring.

Some of our test files have commas in filenames. We plan to address this in a future release, but it seems safer all around to revert the breaking change and then fix things carefully in tandem.

cjerdonek commented 5 years ago

FYI, I added a PR for this here: https://github.com/pypa/pip/pull/6191

sbidoul commented 5 years ago

@pradyunsg will there be a 19.0.2 including the fix for this?

cjerdonek commented 5 years ago

@sbidoul Yes, see here: https://github.com/pypa/pip/issues/6106#issuecomment-458491164

emtwo commented 5 years ago

I had this issue and updating my statsmodels dependency to the latest fixed it...

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.