oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
MIT License
616 stars 39 forks source link

Migrate code from Python 2.7.x to Python 3 #51

Closed ibnesayeed closed 5 years ago

ibnesayeed commented 7 years ago

@machawk1, is there anything that is stopping us from moving this to to Python 3.x?

machawk1 commented 7 years ago

@ibnesayeed I have not checked compatibility of all of the included libraries w/ Py3. What advantage would we gain in doing so? There are more fundamental problems w/ the project to be tackled than updating the code to the latest version of the language.

ibnesayeed commented 7 years ago

At this point it is not a priority, but for the longevity of the project it should not use the version of the language that is nearing the extended EOL date.

machawk1 commented 7 years ago

Ok, it will be considered in the future when more specific rationales are put forth.

ibnesayeed commented 7 years ago
$ git clone https://github.com/oduwsdl/ipwb
$ cd ipwb/
$ 2to3 -W .
$ pip install -r requirements.txt
$ pip install ./
$ ipwb
Traceback (most recent call last):
  File "/usr/local/bin/ipwb", line 9, in <module>
    load_entry_point('ipwb==0.2016.12.9.412', 'console_scripts', 'ipwb')()
  File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.5/site-packages/ipwb/__main__.py", line 6, in <module>
    from . import replay
  File "/usr/local/lib/python3.5/site-packages/ipwb/replay.py", line 19, in <module>
    from . import util as ipwbConfig
  File "/usr/local/lib/python3.5/site-packages/ipwb/util.py", line 7, in <module>
    import exceptions
ImportError: No module named 'exceptions'
machawk1 commented 7 years ago

Same commands from virtualenv:

$ ipwb
Traceback (most recent call last):
  File "/usr/local/bin/ipwb", line 11, in <module>
    load_entry_point('ipwb==0.2016.12.10.1711', 'console_scripts', 'ipwb')()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 561, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2627, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2287, in load
    return self.resolve()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2293, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python2.7/site-packages/ipwb/__main__.py", line 7, in <module>
    from . import indexer
  File "/usr/local/lib/python2.7/site-packages/ipwb/indexer.py", line 1, in <module>
    Non
NameError: name 'Non' is not defined
ibnesayeed commented 7 years ago

It looks like exceptions is now part of the standard core in Py3 so no need to import it. https://groups.google.com/forum/#!topic/django-users/hCqNuEQKwso

machawk1 commented 7 years ago

@ibnesayeed The standard approach appears to be to make the code compatible with both Py2 and Py3. Would that be a suitable goal for this ticket or is it important that it ONLY be compatible with Py3?

ibnesayeed commented 7 years ago

Supporting both Py2 and Py3 is a good idea for tools that have wider user base of which some might be using legacy systems or cant upgrade their Python environment. It is also a good idea for libraries/packages that might be used in different programs. Other than that I don't see a reason to support both which will increase our test burden. Additionally, it would encourage the use of a version of Python that is reaching it's extended end of life (which won't be a good service for the Python community that is struggling to get rid of the older version). People often use a compatibility library called six to facilitate this, but I don't really see a need of that in this repo.

machawk1 commented 7 years ago

@ibnesayeed So what's your vote? It still seems that a large majority of users have Py2 by default, so might be discouraged when installing ipwb and seeing that it immediately fails if we only cater to Py3. I'm all for using six and am admittedly way more familiar with Py2 myself. Despite the overhead in testing, we currently have very minimal test cases written, so adding the extra test cases to account for both Py variants might make the code more resilient.

Regarding this last point, we can also try to make the test code as Py version agnostic as possible to mitigate the potential redundancy of supporting both versions.

machawk1 commented 7 years ago

When installing in Python3 after fixing relative imports, running pip3 install -r requirements.txt, then running $ ipwb reports ImportError: No module named 'ipfsapi'. Strange given that ipfsapi is defined in the requirements file. This might be due to the previous issue of the ipfsapi package switching from ipfs-api to ipfsapi and some internal dynamics around version 0.2.x and what's on my system. Will dig further.

Update: pip3 freeze > out.txt reports ipfsapi==0.4.0, so that's not the issue. Further, ipfsapi is defined in the install_requires list in setup.py, so it's not a local vs. pypi issue.

Further update: ipfs-api and ipfsapi were both installed. Uninstalled ipfs-api but the complaint remains, even after reinstalling requirements and verifying that ipfs-api was not reinstalled.

ibnesayeed commented 7 years ago

On the other hand, I would go for Py3. Our tool is perhaps not going to be used by newbies. We expect our users to be advance who are familiar with the bleeding edge technology like IPFS.

When I said testing overhead, it was not just about test cases, but also performing tests on various environments.

machawk1 commented 7 years ago

Our tool is perhaps not going to be used by newbies. We expect our users to be advance who are familiar with the bleeding edge technology like IPFS.

Wrong. We want to make the tool more accessible and useful to anyone with WARCs or who are passed an index file (for now) and simply want to replay the collection. I think macOS still ships only w/ Py2 by default.

ibnesayeed commented 7 years ago

Well, I still think it is an exploratory effort with more stable and practical model to be developed later (#61). If you want to make it accessible to more people then perhaps the right approach would be to code with Py3 as the target platform, but provide backwards compatibility using six. This way it will be easier to get rid of Py2 support in future.

machawk1 commented 7 years ago

@ibnesayeed That seems like the right approach. I'll continue to document efforts in this ticket.

machawk1 commented 7 years ago

Preliminary work is being done in https://github.com/oduwsdl/ipwb/tree/issue-51 . One weird change was pkg_resources.resource_string returns a byte-like string in Py3 and a simple manipulable string in Py2. This required explicit conversion and needs testing. Beyond that, simple translation to print() and import fixes including a not-so-elegant Py2 conditional import via six.

machawk1 commented 7 years ago

Yet another reason, https://github.com/ipfs/py-ipfs-api says py2 is deprecated.

ibnesayeed commented 6 years ago

I am of the opinion to fully migrate it to Py3 and completely discontinue support for Py2 (not even opportunistic support). As Py3 is becoming more ubiquitous, we will see more people complaining about the software failing in their environment (of which some might not even bother to report and just move on). An attempt to support both versions is counter-productive that leaves larger failure and support surface.

machawk1 commented 6 years ago

Thanks for your 2ยข @ibnesayeed . ๐Ÿ˜ƒ

A few reasons for my original concerns and rationale for retaining Py2 support was:

That said, I agree that migration is or will be needed. The benefits at the moment, however, do not outweigh the work required to do so. I fear there's more to it than running Py2to3.

ibnesayeed commented 6 years ago

PyInstaller's documentation does not seem to suggest that their Py3 support is any less stable than Py2.

If you really think that there are substantial number of IPWB users who might have Py2 and not Py3, then it is OK to continue supporting that. However, Py3 support should be prioritized (which, I think, is broken at the moment).

machawk1 commented 6 years ago

I feel it will be best to first make ipwb cross-compatible then we can evaluate the overhead in maintaining the Py2 code bits to determine whether it will worthwhile to retain support for both versions.

machawk1 commented 6 years ago

http://python-future.org/futurize_cheatsheet.html

ibnesayeed commented 6 years ago

Looks like a good approach to give it a try.

ibnesayeed commented 6 years ago

Let's prioritize this. In the DWeb Summit some people gave me that "Still Py2?" look!

machawk1 commented 6 years ago

@ibnesayeed We have quite a dependency chain but I think giving this priority is correct.

One such soft dependency before the transition is to have more rigorous testing to ensure consistency of functionality for the plethora of edge cases that are not currently automatically tested. Having those in-place would be good for some quasi regression testing but I believe it will just cause us to drag our feet more in moving to Py3.

The various futzing and pattern of closing tickets we created the same day has also been a barrier in making progress on the big lingering ones (like #51).

machawk1 commented 5 years ago

https://python3statement.org/

https://pythonclock.org/

An updated TODO (check-)list of the remaining parts of the code that need to be converted to Py3 would be useful here and facilitation completion of this ticket.

machawk1 commented 5 years ago

As verbally discussed, one of the barriers in completing this is adapting the test "suite" to work in Py3. The code itself seems to work as expected.

When running the tests using the docker build process after changing the Dockerfile to use Python 3.7, the main complaint is the tests' inability to find our testUtil.py script, which provides some utility functions that are used from within a few tests:

============================= test session starts ==============================
platform linux -- Python 3.7.2, pytest-4.3.0, py-1.8.0, pluggy-0.9.0
rootdir: /ipwb, inifile: setup.cfg
plugins: flake8-1.0.4, cov-2.6.1
collected 35 items / 4 errors / 31 selected

==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_indexing.py ____________________
ImportError while importing test module '/ipwb/tests/test_indexing.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_indexing.py:4: in <module>
    import testUtil as ipwbTest
E   ModuleNotFoundError: No module named 'testUtil'
____________________ ERROR collecting tests/test_memento.py ____________________
ImportError while importing test module '/ipwb/tests/test_memento.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_memento.py:3: in <module>
    import testUtil as ipwbTest
E   ModuleNotFoundError: No module named 'testUtil'
________________ ERROR collecting tests/test_randomized_add.py _________________
ImportError while importing test module '/ipwb/tests/test_randomized_add.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_randomized_add.py:8: in <module>
    import testUtil as ipwbTest
E   ModuleNotFoundError: No module named 'testUtil'
____________________ ERROR collecting tests/test_replay.py _____________________
ImportError while importing test module '/ipwb/tests/test_replay.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_replay.py:3: in <module>
    import testUtil as ipwbTest
E   ModuleNotFoundError: No module named 'testUtil'

----------- coverage: platform linux, python 3.7.2-final-0 -----------
Name                           Stmts   Miss  Cover
--------------------------------------------------
ipwb/__init__.py                   1      0   100%
ipwb/__main__.py                  81     81     0%
ipwb/indexer.py                  251    204    19%
ipwb/replay.py                   681    681     0%
ipwb/util.py                     200    149    26%
setup.py                           6      6     0%
tests/__init__.py                  0      0   100%
tests/testUtil.py                 49     49     0%
tests/test_indexing.py            14     12    14%
tests/test_memento.py            224    222     1%
tests/test_nodeToNode.py           5      2    60%
tests/test_randomized_add.py      38     32    16%
tests/test_replay.py              71     69     3%
tests/test_util.py                 7      3    57%
--------------------------------------------------
TOTAL                           1628   1510     7%

!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!
=========================== 4 error in 1.03 seconds ============================
The command '/bin/sh -c $SKIPTEST || (ipfs daemon & while ! curl -s localhost:5001 > /dev/null; do sleep 1; done && py.test --cov=./)' returned a non-zero code: 2
machawk1 commented 5 years ago

Perhaps from within the tests, instead of using import testUtil as ipwbTest, try from . import testUtil as ipwbTest.

EDIT: This causes pytest to complain about other things (namely, urllib2), so perhaps progress:

==================================== ERRORS ====================================
____________________ ERROR collecting tests/test_memento.py ____________________
ImportError while importing test module '/ipwb/tests/test_memento.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_memento.py:10: in <module>
    import urllib2
E   ModuleNotFoundError: No module named 'urllib2'
____________________ ERROR collecting tests/test_replay.py _____________________
ImportError while importing test module '/ipwb/tests/test_replay.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_replay.py:9: in <module>
    import urllib2
E   ModuleNotFoundError: No module named 'urllib2'
=============================== warnings summary ===============================
/usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:16
/usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:16
/usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:16
  /usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:16: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Container, Iterable, MutableSet

/usr/local/lib/python3.7/site-packages/jinja2/utils.py:485
  /usr/local/lib/python3.7/site-packages/jinja2/utils.py:485: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import MutableMapping

/usr/local/lib/python3.7/site-packages/jinja2/runtime.py:318
  /usr/local/lib/python3.7/site-packages/jinja2/runtime.py:318: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Mapping

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform linux, python 3.7.2-final-0 -----------
Name                           Stmts   Miss  Cover
--------------------------------------------------
ipwb/__init__.py                   1      0   100%
ipwb/__main__.py                  81     81     0%
ipwb/indexer.py                  251    204    19%
ipwb/replay.py                   681    590    13%
ipwb/util.py                     200    149    26%
setup.py                           6      6     0%
tests/__init__.py                  0      0   100%
tests/testUtil.py                 49     33    33%
tests/test_indexing.py            14      8    43%
tests/test_memento.py            224    215     4%
tests/test_nodeToNode.py           5      2    60%
tests/test_randomized_add.py      38     26    32%
tests/test_replay.py              71     65     8%
tests/test_util.py                 7      3    57%
--------------------------------------------------
TOTAL                           1628   1382    15%

!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
===================== 5 warnings, 2 error in 1.39 seconds ======================
The command '/bin/sh -c $SKIPTEST || (ipfs daemon & while ! curl -s localhost:5001 > /dev/null; do sleep 1; done && py.test --cov=./)' returned a non-zero code: 2
machawk1 commented 5 years ago

Finally closed in #609. ๐Ÿ™Œ