opencultureconsulting / openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
GNU General Public License v3.0
83 stars 19 forks source link

port to python 3.7 and use requests #8

Open wolfv opened 4 years ago

wolfv commented 4 years ago

hey, I attempted to port the openrefine client to python3, and I have also swithced to use the requests library for communicating with the server (including uploading files).

wolfv commented 4 years ago

let me know if you are happy (in general) with this PR, then I'll fix the codacy issues. Would be cool to move this project over to python3 as the clock is ticking for python 2 :)

felixlohmeier commented 4 years ago

Hey @wolfv, Thank you very much for your contribution!

I did some quick tests based on https://github.com/opencultureconsulting/openrefine-client/blob/master/tests/cli_bash.ipynb and some basic functionalities seem to be broken.

[felix@tux Desktop]$ git clone -b py3_port https://github.com/wolfv/openrefine-client.git
Cloning into 'openrefine-client'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 1201 (delta 0), reused 3 (delta 0), pack-reused 1189
Receiving objects: 100% (1201/1201), 2.52 MiB | 1.55 MiB/s, done.
Resolving deltas: 100% (630/630), done.
[felix@tux Desktop]$ cd openrefine-client/
[felix@tux openrefine-client]$ python3 refine.py --download "https://git.io/fj5hF" --output=duplicates.csv
Traceback (most recent call last):
  File "refine.py", line 35, in <module>
    __main__.main()
  File "/home/felix/Desktop/openrefine-client/google/refine/__main__.py", line 239, in main
    cli.download(options.download, output_file=options.output)
  File "/home/felix/Desktop/openrefine-client/google/refine/cli.py", line 161, in download
    urllib.request.urlretrieve(url, output_file, context=context)
TypeError: urlretrieve() got an unexpected keyword argument 'context'
[felix@tux openrefine-client]$ python3 refine.py --create tests/data/cli/duplicates.csv
Traceback (most recent call last):
  File "refine.py", line 35, in <module>
    __main__.main()
  File "/home/felix/Desktop/openrefine-client/google/refine/__main__.py", line 251, in main
    cli.create(options.create, **kwargs)
  File "/home/felix/Desktop/openrefine-client/google/refine/cli.py", line 127, in create
    **kwargs)
  File "/home/felix/Desktop/openrefine-client/google/refine/refine.py", line 275, in new_project
    'create-project-from-upload', options, params, files=files
TypeError: urlopen() got an unexpected keyword argument 'files'

I am on parental leave and will probably not be able to take a closer look at it until March. @boerni667 is also working on a port to Python3 (https://github.com/boerni667/openrefine-client) and may want to check it out before then.

wolfv commented 4 years ago

Hey @felixlohmeier thanks for taking a look. Indeed, there were quite a bunch of issues left ... I attempted to fix them (the CLI at least seems to work fine with the latest changes). There is something going on with the mode thing though... maybe there was an API change in OpenRefine (I am testing with 3.2). I think it would also be fine / nice to drop support for older OpenRefine versions.

felixlohmeier commented 4 years ago

Great, I will take a closer look next month.

I had started to investigate this systematically a few months ago (https://github.com/paulmakepeace/refine-client-py/issues/19), but I lacked the time to complete it. Dropping support for older versions would be a pragmatic solution.

felixlohmeier commented 4 years ago

@boerni667 and @jnauber, maybe you would like to join this discussion?

jnauber commented 4 years ago

After comparing the development branches from @wolfv and @boerni667 [1] and discussing the matter with @felixlohmeier, i do think that the best way to continue the development is to merge this pull request (after successful testing through @felixlohmeier) and afterwards open a new pull request with the changes in @boerni667 fork.

What do you think about this procedure?

[1] https://github.com/wolfv/openrefine-client/compare/py3_port...boerni667:master

wolfv commented 4 years ago

sounds good to me :) I have ticked the checkbox so that other maintainers can push to my branch as well.

felixlohmeier commented 4 years ago

Sorry for the long delay. I will check and merge the port on Python3 by end of July.

wolfv commented 4 years ago

No worries, I am happy to see this move forward.

felixlohmeier commented 4 years ago

I'm working on it now. Great work! So far I've noticed only two deviations. The template option "suffixById=true" returns wrong file names and the connection error is so verbose that we better intercept it. I will take care of it.

felixlohmeier commented 4 years ago

Unfortunately code freezing with pyinstaller does not work well with python3 and requests. The executable file takes almost twice as long to start up:

[felix@tux virtualbox-share]$ time ./openrefine-client_py3
Usage: openrefine-client_py3 [--help | OPTIONS]

real    0m0,467s
user    0m0,420s
sys 0m0,040s

[felix@tux virtualbox-share]$ time ./openrefine-client_0-3-8_linux
Usage: openrefine-client_0-3-8_linux [--help | OPTIONS]

real    0m0,242s
user    0m0,190s
sys 0m0,030s

If I exclude requests, then the performance is much better. I will have a look if these parts can be implemented with urllib.

[felix@tux virtualbox-share]$ time ./openrefine-client_py3_no-requests
Usage: openrefine-client_py3_no-requests [--help | OPTIONS]

real    0m0,296s
user    0m0,250s
sys 0m0,041s
wolfv commented 4 years ago

Thanks @felixlohmeier I would additionally be interested in having openrefine and htis library on conda-forge. Maybe we can make that happen :)

felixlohmeier commented 4 years ago

I would additionally be interested in having openrefine and this library on conda-forge.

Yeah, basically a good idea. Actually, I was planning on just updating the CLI functionality. Do you also need the other parts of the library (the "upstream way")?

felixlohmeier commented 3 years ago

Hey @wolfv, it's been almost a year since you initiated the port to Python3. Thanks again for that!

In August I had written some tests to check the functionality of the command line interface. Thereby I noticed a few bugs. However, I had bigger problems with the creation of the one-file-executables in a Python3 environment. Here is a list:

Over the holidays I finally have time to work on the openrefine-client again. Actually I had hoped to finish the port to Python3 first and then make it compatible with OpenRefine 3.3+ (csrf token). But I can't do both and OpenRefine 3.3 was released almost a year ago. Therefore I want to focus on one-file-executables supporting OpenRefine 3.3+ and stay with Python2 for now.

Do you want to continue and publish the port on your own? That would be really great, because a lot of people had shown interest in it. I'm sorry I can't contribute more right now.

datatalking commented 1 year ago

Hello @felixlohmeier, @wolfv, @jnauber I'm arriving late to the repo but would like to help where I can to see if we can update this for Python2 and possibly migrate this to Python 3.10? with a few new resources. I will probably be asking a bunch of questions that seem obvious but with Python Rule #2 'Explicit is better than implicit'.

Sometimes I regularly have people ask for open source projects they could assist with and who doesn't like hacking API's.

felixlohmeier commented 1 year ago

Hi @datatalking, Help is very welcome here! I look forward to your questions!

I'm currently working on a new implementation in Bash (orcli) and was actually planning to discontinue this project (openrefine-client). See also: https://forum.openrefine.org/t/orcli-yet-another-client-for-openrefine-written-in-bash/753