scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License

Add absolute_import needed for python2 #214

Closed benkrikler closed 4 years ago

benkrikler commented 4 years ago

I'm not sure what the plan for Python 2 support is, but when using awkward array in Python 2, the awkward.pandas module cannot be loaded because its module name clashes with pandas itself. The internal import of pandas.api.extensions then fails:

$ python --version
Python 2.7.15

$ python -c "import awkward.pandas as mod;print(mod)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ben/projects/awkward-array/awkward/pandas.py", line 6, in <module>
    import pandas.api.extensions
ImportError: No module named api.extensions

This is addressed by insisting on absolute imports, which are the default in Python 3 and need importing from __future__ in Python 2.
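To make the mechanism concrete, here is a small self-contained sketch of the shadowing problem and the one-line fix (the `mypkg`/`json` names are illustrative stand-ins for `awkward.pandas`/`pandas`, not the PR's actual code; on Python 3 the `__future__` line is a no-op, on Python 2 it is what makes the import resolve correctly):

```python
import os
import sys
import tempfile
import textwrap

# Build a tiny package with a module named "json", mirroring how
# awkward/pandas.py shadows the real pandas package on Python 2.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mypkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg, "json.py"), "w") as f:
    f.write("value = 'shadow'\n")
with open(os.path.join(pkg, "uses_json.py"), "w") as f:
    f.write(textwrap.dedent("""\
        from __future__ import absolute_import  # the one-line fix from this PR

        import json  # always the top-level (stdlib) json, never mypkg/json.py
    """))

sys.path.insert(0, tmp)
import mypkg.uses_json

print(mypkg.uses_json.json.__name__)  # the stdlib module, not mypkg.json
```

Without absolute imports, Python 2's relative-import semantics would resolve `import json` inside `uses_json.py` to the sibling `mypkg/json.py`, which is exactly how `import pandas.api.extensions` inside `awkward/pandas.py` broke.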

benkrikler commented 4 years ago

I've added a unit test to catch this issue, although it will be skipped since no testing environment currently includes pandas. Would you be OK with making pandas a dependency of the testing suite, @jpivarski?

jpivarski commented 4 years ago

Yes, Pandas can be added to the test_requires. You'd have to link the Pandas version to the Python version to get 2.7, though. I think the last Pandas version that worked in 2.7 was Pandas 0.23, released at the end of last year.

Also, I approve this PR and will merge it as soon as you give me the go-ahead (and tests pass).

benkrikler commented 4 years ago

Hmm, I'd understand if you don't want to add an explicit Python 2 setup at this stage, given how close Python 2's end-of-life is. I could leave the pandas version unpinned in test_requires, which might be better than no pandas testing at all (the test I've added literally just imports the module). Or if you'd rather leave it out to avoid complicating things now, that's reasonable to me, in which case this PR would be good to go.

eduardo-rodrigues commented 4 years ago

Allow me to chip in: didn't we agree within the org to (try and) provide a last major version of each package that is Python 2.7 compatible? Judging from the metadata in https://github.com/scikit-hep/awkward-array/blob/master/setup.py this is currently ensured, so it should be no problem to maintain 2.7 compatibility for a bit longer, right?

jpivarski commented 4 years ago

I'm willing to support Python 2.7. All we need is for the .travis.yml to put a constraint on the Pandas version when it loads, similar to this:

https://github.com/scikit-hep/uproot/blob/03072b07d22af4b6247ed99ead957d238daffaae/.travis.yml#L63

(Although in awkward's travis.yml, the variable is named $TRAVIS_PYTHON_VERSION.)

Leaving the pytest.importorskip in is a good idea. We do Apache Arrow the same way: it's not a dependency, but tested if it's possible to install, using a version that manages to be installed on the Python version in question.
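For reference, `pytest.importorskip` behaves as follows (a minimal illustration using stdlib `json` as a stand-in for pandas, not the PR's actual test):

```python
import pytest

# When the named package is importable, importorskip returns the module...
mod = pytest.importorskip("json")  # stdlib stand-in for pandas here
assert mod.loads("[1, 2]") == [1, 2]

# ...and when it is not, it raises pytest's Skipped outcome, which pytest
# reports as a skipped test rather than a failure. Outside a test run we
# can catch it directly (Skipped derives from BaseException, not Exception,
# precisely so ordinary except-Exception handlers don't swallow it):
try:
    pytest.importorskip("package_that_is_not_installed_anywhere")
    outcome = "imported"
except BaseException as exc:
    outcome = type(exc).__name__

print(outcome)  # Skipped
```

This is why pandas can stay out of the hard requirements: environments without it simply report the test as skipped.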

jpivarski commented 4 years ago

Python 2.6 is where I draw the line: scikit-hep/uproot#402.

eduardo-rodrigues commented 4 years ago

Makes sense. @benkrikler, will you then complete this PR yourself, as per the suggestions above? Seems the easiest.

benkrikler commented 4 years ago

This has turned into a bigger thread than I expected!

I see a 2x2 matrix of solutions:

- Dimension 1: declare pandas in setup.py's test_requires, or install it directly in .travis.yml.
- Dimension 2: pin the pandas version, or leave it unpinned.

For dimension 2: leaving pandas unpinned carries the "risk" that our tests break on a future version of pandas. For the test I've added here, I'm confident that's practically impossible, since the test only checks importability, and any pip- or conda-installed version of pandas should be importable. If the test is extended in the future you could revisit the pinning, although by then Python 2 support might be dropped anyway.

For dimension 1: installing from setup.py via test_requires seems nicer and more general (e.g. if you switch to Azure CI in the future). The downsides: 1) everyone running the tests needs to install pandas, although the counter-argument is that that's quick and easy, and probably already the case for most users who actually run tests; 2) if we want to pick the pandas version as a function of the Python version (dimension 2 above), doing that in setup.py would be messy, requiring environment variables and/or sys checks.

Anyway, without milking this further: can I add pandas, with no version pinning, to test_requires in setup.py?

benkrikler commented 4 years ago

Also, the last version of pandas to officially support Python 2 was 0.24.2, from the 0.24 release line: https://github.com/pandas-dev/pandas/releases/tag/v0.24.0 — so if I had to pin a version for Python 2, that's what I'd use.

jpivarski commented 4 years ago

@benkrikler I've been pinning versions of NumPy and Numba in my Python 2.7 tests (and some old versions of Python 3, too). Since a final version is now known, I don't see any danger in it: Pandas 0.24.2 will always and forever be the best Pandas for Python 2.7.

I consider a user who runs the unit tests to be an expert user who can ensure that they have the right dependency versions. I think you can only run tests if you get the repo from GitHub—it's dropped from the PyPI/conda packages. Therefore, I worry a lot more about what goes into the requires than the test_requires. Even though the packages used in tests are included in the test_requires, each one of them is explicitly installed in .travis.yml because we want to be so careful about versions.

So don't worry about getting the version constraint right in test_requires; just say test_requires = [..., "pandas"], which would do the right thing for a sufficiently recent version of Python (or don't include it in the test_requires at all, since the test has an importorskip). But in the .travis.yml, run pip install "pandas<0.25" for Python 2.7 and pip install pandas for all other versions of Python (and maybe no Pandas at all for PyPy if it gives you trouble). Don't worry about AppVeyor.
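The per-version logic described above would live in .travis.yml as shell, but the decision it encodes can be sketched in Python (illustrative only; the requirement strings follow the thread's conclusion that pandas 0.24.2 was the last Python 2.7 release):

```python
import sys

def pandas_requirement(version_info=sys.version_info):
    """Pick the pip requirement string for pandas given a Python version.

    Python 2 jobs pin below 0.25 (0.24.2 was the final release supporting
    Python 2.7); everything else installs the latest pandas.
    """
    if version_info[0] == 2:
        return "pandas<0.25"
    return "pandas"

print(pandas_requirement((2, 7)))  # pandas<0.25
print(pandas_requirement((3, 7)))  # pandas
```

The equivalent .travis.yml step would branch on $TRAVIS_PYTHON_VERSION and run `pip install` with the selected string.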

benkrikler commented 4 years ago

Ok, I think it should all be good to go now for you, @jpivarski.

jpivarski commented 4 years ago

Thanks!