Open wolfv opened 4 years ago
let me know if you are happy (in general) with this PR, then I'll fix the codacy issues. Would be cool to move this project over to python3 as the clock is ticking for python 2 :)
Hey @wolfv, Thank you very much for your contribution!
I did some quick tests based on https://github.com/opencultureconsulting/openrefine-client/blob/master/tests/cli_bash.ipynb and some basic functionalities seem to be broken.
[felix@tux Desktop]$ git clone -b py3_port https://github.com/wolfv/openrefine-client.git
Cloning into 'openrefine-client'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 1201 (delta 0), reused 3 (delta 0), pack-reused 1189
Receiving objects: 100% (1201/1201), 2.52 MiB | 1.55 MiB/s, done.
Resolving deltas: 100% (630/630), done.
[felix@tux Desktop]$ cd openrefine-client/
[felix@tux openrefine-client]$ python3 refine.py --download "https://git.io/fj5hF" --output=duplicates.csv
Traceback (most recent call last):
File "refine.py", line 35, in <module>
__main__.main()
File "/home/felix/Desktop/openrefine-client/google/refine/__main__.py", line 239, in main
cli.download(options.download, output_file=options.output)
File "/home/felix/Desktop/openrefine-client/google/refine/cli.py", line 161, in download
urllib.request.urlretrieve(url, output_file, context=context)
TypeError: urlretrieve() got an unexpected keyword argument 'context'
[felix@tux openrefine-client]$ python3 refine.py --create tests/data/cli/duplicates.csv
Traceback (most recent call last):
File "refine.py", line 35, in <module>
__main__.main()
File "/home/felix/Desktop/openrefine-client/google/refine/__main__.py", line 251, in main
cli.create(options.create, **kwargs)
File "/home/felix/Desktop/openrefine-client/google/refine/cli.py", line 127, in create
**kwargs)
File "/home/felix/Desktop/openrefine-client/google/refine/refine.py", line 275, in new_project
'create-project-from-upload', options, params, files=files
TypeError: urlopen() got an unexpected keyword argument 'files'
I am on parental leave and will probably not be able to take a closer look at it until March. @boerni667 is also working on a port to Python3 (https://github.com/boerni667/openrefine-client) and may want to check it out before then.
Hey @felixlohmeier thanks for taking a look. Indeed, there were quite a bunch of issues left ... I attempted to fix them (the CLI at least seems to work fine with the latest changes).
There is something going on with the mode
thing though... maybe there was an API change in OpenRefine (I am testing with 3.2).
I think it would also be fine / nice to drop support for older OpenRefine versions.
Great, I will take a closer look next month.
I had started to investigate this systematically a few months ago (https://github.com/paulmakepeace/refine-client-py/issues/19), but I lacked the time to complete it. Dropping support for older versions would be a pragmatic solution.
@boerni667 and @jnauber, maybe you would like to join this discussion?
After comparing the development branches from @wolfv and @boerni667 [1] and discussing the matter with @felixlohmeier, i do think that the best way to continue the development is to merge this pull request (after successful testing through @felixlohmeier) and afterwards open a new pull request with the changes in @boerni667 fork.
What do you think about this procedure?
[1] https://github.com/wolfv/openrefine-client/compare/py3_port...boerni667:master
sounds good to me :) I have ticked the checkbox so that other maintainers can push to my branch as well.
Sorry for the long delay. I will check and merge the port on Python3 by end of July.
No worries, I am happy to see this move forward.
I'm working on it now. Great work! So far I've noticed only two deviations. The template option "suffixById=true" returns wrong file names and the connection error is so verbose that we better intercept it. I will take care of it.
Unfortunately code freezing with pyinstaller does not work well with python3 and requests. The executable file takes almost twice as long to start up:
[felix@tux virtualbox-share]$ time ./openrefine-client_py3
Usage: openrefine-client_py3 [--help | OPTIONS]
real 0m0,467s
user 0m0,420s
sys 0m0,040s
[felix@tux virtualbox-share]$ time ./openrefine-client_0-3-8_linux
Usage: openrefine-client_0-3-8_linux [--help | OPTIONS]
real 0m0,242s
user 0m0,190s
sys 0m0,030s
If I exclude requests, then the performance is much better. I will have a look if these parts can be implemented with urllib.
[felix@tux virtualbox-share]$ time ./openrefine-client_py3_no-requests
Usage: openrefine-client_py3_no-requests [--help | OPTIONS]
real 0m0,296s
user 0m0,250s
sys 0m0,041s
Thanks @felixlohmeier I would additionally be interested in having openrefine and htis library on conda-forge. Maybe we can make that happen :)
I would additionally be interested in having openrefine and this library on conda-forge.
Yeah, basically a good idea. Actually, I was planning on just updating the CLI functionality. Do you also need the other parts of the library (the "upstream way")?
Hey @wolfv, it's been almost a year since you initiated the port to Python3. Thanks again for that!
In August I had written some tests to check the functionality of the command line interface. Thereby I noticed a few bugs. However, I had bigger problems with the creation of the one-file-executables in a Python3 environment. Here is a list:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 144: invalid start byte
ModuleNotFoundError: No module named 'google'
Over the holidays I finally have time to work on the openrefine-client again. Actually I had hoped to finish the port to Python3 first and then make it compatible with OpenRefine 3.3+ (csrf token). But I can't do both and OpenRefine 3.3 was released almost a year ago. Therefore I want to focus on one-file-executables supporting OpenRefine 3.3+ and stay with Python2 for now.
Do you want to continue and publish the port on your own? That would be really great, because a lot of people had shown interest in it. I'm sorry I can't contribute more right now.
Hello @felixlohmeier, @wolfv, @jnauber I'm arriving late to the repo but would like to help where I can to see if we can update this for Python2 and possibly migrate this to Python 3.10? with a few new resources. I will probably be asking a bunch of questions that seem obvious but with Python Rule #2 'Explicit is better than implicit'.
Sometimes I regularly have people ask for open source projects they could assist with and who doesn't like hacking API's.
Hi @datatalking, Help is very welcome here! I look forward to your questions!
I'm currently working on a new implementation in Bash (orcli) and was actually planning to discontinue this project (openrefine-client). See also: https://forum.openrefine.org/t/orcli-yet-another-client-for-openrefine-written-in-bash/753
hey, I attempted to port the openrefine client to python3, and I have also swithced to use the
requests
library for communicating with the server (including uploading files).