Open talumbau opened 10 years ago
At Spotify we're still on python2, so there is no real incentive to support python3. That said, I think we could support python3. I'm not sure what strategy to choose here; source code that is compatible with both 2 and 3 or have separate branches..
Ah I see. I haven't looked at all of the source, but my instinct is that you could have one source that is compatible with 2 and 3.
To start with it, I used Fig to create fresh development environment with python 3.4 on Ubuntu Trusty. Unfortunately everything borked at very beginning:
pip install -r requirements-dev.txt
To be more specific for protobuf package. Protobuf is not supported for python 3.x, for more details please look here: https://groups.google.com/forum/#!topic/protobuf/sb2pOm0SY6w
It may be hard to move forward with this feature ... closing(?)
Look at this :)
http://protobuf.googlecode.com/svn/trunk/CHANGES.txt mentions python 3 support for 2.6.0.
Maybe we don't close it?
wat!? protobuf 2.6.0 Supported protobuf for Hadoop is 2.5.0. I guess this ticket requires some investigation - how do you want to tackle this?
I just checked our Travis builds and they already pull in 2.6.0 because we depend on >2.4.1 :) Seems to work fine. There is one weird thing though is that protobuf 2.6.0 depends on python-dateutil>=1.4,<2. On my local install I had python-dateutil 2.2 installed and broke my upgrade to protobuf 2.6.0. With new installations this shouldn't be too big of a deal (as is the case with the Travis builds)
True if I change dependancy to >=2.6.0 on ubuntu with python3.4 I can install all the dev requirements.
setup.py test
doesn't work tho. Simple snakebite -v
either. Due to incompatibility py2 vs py3.
I'm trying to see how much effort it is. There seems to be an issue with google-apputils, one of the dependencies of protobuf: https://github.com/google/protobuf/issues/9
@ravwojdyla and myself did some more experimentation and it seems that google has to fix their protobuf for python3 before we can move on with this. We'll keep this issue open.
OK, Google has fixed it. I ran 2to3 on all the sources and replaced str("", "utf-8") with bytes("", "utf-8") in protobuf and was able to successfully import snakebite. Not sure if it works correct though.
Update: looks like protobuf defs should be updated, I am getting
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/snakebite/client.py", line 140, in ls
recurse=recurse):
File "/usr/local/lib/python3.4/dist-packages/snakebite/client.py", line 1082, in _find_items
fileinfo = self._get_file_info(path)
File "/usr/local/lib/python3.4/dist-packages/snakebite/client.py", line 1214, in _get_file_info
return self.service.getFileInfo(request)
File "/usr/local/lib/python3.4/dist-packages/snakebite/service.py", line 35, in <lambda>
rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
File "/usr/local/lib/python3.4/dist-packages/snakebite/service.py", line 41, in call
return method(self.service, controller, request)
File "/usr/local/lib/python3.4/dist-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
self._StubMethod(inst, method, rpc_controller, request, callback))
File "/usr/local/lib/python3.4/dist-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
method_descriptor.output_type._concrete_class, callback)
File "/usr/local/lib/python3.4/dist-packages/snakebite/channel.py", line 409, in CallMethod
self.get_connection(self.host, self.port)
File "/usr/local/lib/python3.4/dist-packages/snakebite/channel.py", line 219, in get_connection
rpc_header = self.create_rpc_request_header()
File "/usr/local/lib/python3.4/dist-packages/snakebite/channel.py", line 248, in create_rpc_request_header
rpcheader.clientId = self.client_id[0:16]
File "/usr/local/lib/python3.4/dist-packages/google/protobuf/internal/python_message.py", line 471, in field_setter
self._fields[field] = type_checker.CheckValue(new_value)
File "/usr/local/lib/python3.4/dist-packages/google/protobuf/internal/type_checkers.py", line 103, in CheckValue
raise TypeError(message)
TypeError: '35bc231f-07ae-41' has type <class 'str'>, but expected one of: (<class 'bytes'>,)
We will probably will need to compile the protobufs with version 3.0.0 before it to work. Unfortunately proto3 is still in alpha and he proto3 compiler isn't mainstream yet. Given that we need to recompile the protobufs, I'm not sure how to proceed, since this would make a python3/proto3 version incompatible with python2/proto2 version. @ravwojdyla any idea? Should we maintain two different versions? Or have a bunch of conditionals based on proto2/3, python2/3 and in the future probably hadoop versions in code?
No need to recompile. Just apply the fixing patch to 2.6.1: https://github.com/google/protobuf/pull/50
@wouterdebie any idea on how you plan to proceed? I started to port over spotify/luigi#747 and snakebite is currently a blocker. I had a fast look at snakebite code and I don't think supporting both python2/3 with one source code would be difficult. The main work would require to use unicode and bytes literal and encode and decode at the appropriate place (I can help for this part if you want). The tricky part would be the dependency on protobuf.
@gpoulin can't you solve this on luigi side? Core luigi shouldn't really depend on snakebite.
@gpoulin would be great to see some progress here. I would like to have one version that supports both py2/3. And yes, the protobuf stuff we'll have to solve, but I don't mind keeping two different compiled sets for that.
@Tarrasch I was under the impression that it was a soft requirement for Luigi, but hard requirement for "Hadoop Luigi", since all the Hadoop tests were failing on import snakebite
. However having a closer look it seems only to had more functionality if present. Anyway, having snakebite running on python 3 would still be great.
@gpoulin Exactly, it's basically that luigi will run faster with snakebite because it don't need to shell out to hadoopcli. :)
@talumbau @gpoulin @vmarkovtsev @wouterdebie. Do you guys have any updates on the status for the python 3 support?
@Tarrasch Even thou I'm currently working more in scala, I'm willing to help here. However, my understanding is that the version of protobuf that snakebite is currently using doesn't support python3. So if snakebite can support a protobuf version that support python3, I can help to make this project python3 compatible.
@gpoulin, I did try to scan the status for protobuf and python3 in general, but it doesn't seem to be well supported if at all.
Since I can not wait for a way to do hdfs fast in python3, I'll probably switch over to webhdfs.
For luigi users who are fine with using webhdfs, this could be relvant spotify/luigi#1438
@gpoulin protobuf compatible with python3 has been relased https://pypi.python.org/pypi/protobuf/3.0.0b2
Hey all, first of all, great work, this project is exactly what I need for a hadoop project I'm working on, and speed is definitely important (py3 more specifically).
Anyways, I'm going to take a stab at this, python-protobuf3 looks stable and my initial py3 compat work looks promising. I will update when I have more, thank you.
@kirklg awesome! Did you make any progress?
@anneschuth, yes, but its been slow moving, I have my fork, https://github.com/kirklg/snakebite/pull/2, that you can watch if you want to subscribe to progress :). I have a string of commits coming soon when I find the time, thanks.
Thanks @kirklg, looking forward to your commits!
@kirklg Any news on your PR? I am also looking forward to snakebite with Python 3 support. Thanks for making this possible.
Hey, are there any news on support for Python 3? I hope to heare any news, even if it is negative.
Given that at Spotify we don't use python3 there is no incentive to work on this. I hope someone else from the community takes a stab at this, but it's hard to say it that will happen.
On Feb 15, 2017 9:10 AM, "sumyeon" notifications@github.com wrote:
Hey, are there any news on support for Python 3? I hope to heare any news, even if it is negative.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/spotify/snakebite/issues/62#issuecomment-279943784, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKgBtIdHP1B6jCICDhOa286iaoYBrn-ks5rcrLdgaJpZM4CBlij .
Before I (very ambitiously) start from scratch on this, has anyone already done any analysis on it? or even had a crack?
@rtets Appears to have been attempted in the past, and subsequently obliterated e.g. https://github.com/spotify/snakebite/commit/094e801e88f17dcaaac14fe2d98d72ce83de1b7f#diff-d9827893b8ca9a0f86a685a5fd2327b5
there's still some evidence of Python 3 compatibility e.g.
if sys.version_info[0] == 3:
long = int
I fear that you'd make this effort, and then it wouldn't last long before it reverts to Python2 specific code again. Shame, given how easy it is to set up TOX et al to test multiple versions, and how easy 2to3 and six libraries make supporting both, and how much better Python 3.5+ is (everything is a generator / iterator, unicode actually works for global languages etc etc).
@davoscollective Those commit simply added compatibility for the minicluster which was needed for testing in Luigi. The actual client use protobuf 2 that doesn't support python3. So to upgrade to python 3 it will first require updating protobuf. Also this library do a lot of io on mixing byte and string which is the part that changed the most between python2 and python3 which make the compatibility not so trivial.
What a crap... Spotify, don't be lazy! Please make this library Python 3.6 compatible
@kamikaze Almost python 3.7 now https://docs.python.org/3.7/whatsnew/3.7.html
Then move away from protobuf and start using apache arrow. It's the __future__
As long as snakebite's kerberos support relies on python-krbV: https://github.com/spotify/snakebite/blob/6a456e6100b0c1be66cc1f7f9d7f50494f369da3/setup.py#L49 it's not going to work with Python 3 on Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1024834#c4
Given that Python 2.x will hit EOL in just a few years and that this ticket has also been open for just a few years maybe there's some chance some poor soul will submit a PR.
Personally, the excuse that because Spotify don't use Python 3 they can't be bothered to do the work is a wearing a little thin...
I left Spotify a while ago, but I know that Spotify doesn't run HDFS anymore. They've moved to Google Cloud Platform where everything is stored in GCS and doesn't require the hadoop client or snakebite anymore. I guess someone at Spotify should communicate this and deprecate snakebite. As the original author, I've moved on to other things as well and don't have an incentive to maintain the project either. Since I wrote this during work hours, Spotify is the owner of snakebite and it's up to them to decide what to do with it.
@wouterdebie thanks for highlighting the issue (and for the client too, of course). so, this repo is dead, and nobody (Spotify or whoever admins repo) gonna merge this pull request? looks like @cclauss did a great job, hard to believe, that it will be lost for the community.
I'm really sorry about this. I would love to have merged this, but I don't have write access anymore. If the community wants to continue snakebite it can always be forked. @ravwojdyla also left Spotify, but he might have an idea who to talk to so the project can be owned by the community again.
at internet archive, we use this library a lot, use python3 and have been using a fork of kirklg's version for some time, we finally put it on pypi recently: https://pypi.org/project/snakebite-py3/
If anyone would like to merge in more recent changes and help us to get the tests to pass, the work would be much appreciated.
thanks for all of your past work @kirklg
Hey @jkafader kudos! I can't take any credit here, thanks for moving the pythons to 3 :D.
My priorities shifted and I could no longer get paid to work on this.
@jkafader I just saw this thread from a month ago and thank you so much for doing this! I reached out to some folks at Spotify to do something with this repo, but I haven't heard back. Who knows..
Do you have plans to support both python 2 and 3?