spotify / snakebite

A pure python HDFS client
Apache License 2.0
856 stars 216 forks source link

python 3 support? #62

Open talumbau opened 10 years ago

talumbau commented 10 years ago

Do you have plans to support both python 2 and 3?

wouterdebie commented 10 years ago

At Spotify we're still on python2, so there is no real incentive to support python3. That said, I think we could support python3. I'm not sure what strategy to choose here; source code that is compatible with both 2 and 3 or have separate branches..

talumbau commented 10 years ago

Ah I see. I haven't looked at all of the source, but my instinct is that you could have one source that is compatible with 2 and 3.

ravwojdyla commented 10 years ago

To start with it, I used Fig to create fresh development environment with python 3.4 on Ubuntu Trusty. Unfortunately everything borked at very beginning:

pip install -r requirements-dev.txt

To be more specific for protobuf package. Protobuf is not supported for python 3.x, for more details please look here: https://groups.google.com/forum/#!topic/protobuf/sb2pOm0SY6w

It may be hard to move forward with this feature ... closing(?)

wouterdebie commented 10 years ago

Look at this :)

http://protobuf.googlecode.com/svn/trunk/CHANGES.txt mentions python 3 support for 2.6.0.

Maybe we don't close it?

ravwojdyla commented 10 years ago

wat!? protobuf 2.6.0 Supported protobuf for Hadoop is 2.5.0. I guess this ticket requires some investigation - how do you want to tackle this?

wouterdebie commented 10 years ago

I just checked our Travis builds and they already pull in 2.6.0 because we depend on >2.4.1 :) Seems to work fine. There is one weird thing though is that protobuf 2.6.0 depends on python-dateutil>=1.4,<2. On my local install I had python-dateutil 2.2 installed and broke my upgrade to protobuf 2.6.0. With new installations this shouldn't be too big of a deal (as is the case with the Travis builds)

ravwojdyla commented 10 years ago

True if I change dependancy to >=2.6.0 on ubuntu with python3.4 I can install all the dev requirements. setup.py test doesn't work tho. Simple snakebite -v either. Due to incompatibility py2 vs py3.

wouterdebie commented 10 years ago

I'm trying to see how much effort it is. There seems to be an issue with google-apputils, one of the dependencies of protobuf: https://github.com/google/protobuf/issues/9

wouterdebie commented 10 years ago

@ravwojdyla and myself did some more experimentation and it seems that google has to fix their protobuf for python3 before we can move on with this. We'll keep this issue open.

vmarkovtsev commented 9 years ago

OK, Google has fixed it. I ran 2to3 on all the sources and replaced str("", "utf-8") with bytes("", "utf-8") in protobuf and was able to successfully import snakebite. Not sure if it works correct though.

vmarkovtsev commented 9 years ago

Update: looks like protobuf defs should be updated, I am getting

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/snakebite/client.py", line 140, in ls
    recurse=recurse):
  File "/usr/local/lib/python3.4/dist-packages/snakebite/client.py", line 1082, in _find_items
    fileinfo = self._get_file_info(path)
  File "/usr/local/lib/python3.4/dist-packages/snakebite/client.py", line 1214, in _get_file_info
    return self.service.getFileInfo(request)
  File "/usr/local/lib/python3.4/dist-packages/snakebite/service.py", line 35, in <lambda>
    rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
  File "/usr/local/lib/python3.4/dist-packages/snakebite/service.py", line 41, in call
    return method(self.service, controller, request)
  File "/usr/local/lib/python3.4/dist-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
    self._StubMethod(inst, method, rpc_controller, request, callback))
  File "/usr/local/lib/python3.4/dist-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
    method_descriptor.output_type._concrete_class, callback)
  File "/usr/local/lib/python3.4/dist-packages/snakebite/channel.py", line 409, in CallMethod
    self.get_connection(self.host, self.port)
  File "/usr/local/lib/python3.4/dist-packages/snakebite/channel.py", line 219, in get_connection
    rpc_header = self.create_rpc_request_header()
  File "/usr/local/lib/python3.4/dist-packages/snakebite/channel.py", line 248, in create_rpc_request_header
    rpcheader.clientId = self.client_id[0:16]
  File "/usr/local/lib/python3.4/dist-packages/google/protobuf/internal/python_message.py", line 471, in field_setter
    self._fields[field] = type_checker.CheckValue(new_value)
  File "/usr/local/lib/python3.4/dist-packages/google/protobuf/internal/type_checkers.py", line 103, in CheckValue
    raise TypeError(message)
TypeError: '35bc231f-07ae-41' has type <class 'str'>, but expected one of: (<class 'bytes'>,)
wouterdebie commented 9 years ago

We will probably will need to compile the protobufs with version 3.0.0 before it to work. Unfortunately proto3 is still in alpha and he proto3 compiler isn't mainstream yet. Given that we need to recompile the protobufs, I'm not sure how to proceed, since this would make a python3/proto3 version incompatible with python2/proto2 version. @ravwojdyla any idea? Should we maintain two different versions? Or have a bunch of conditionals based on proto2/3, python2/3 and in the future probably hadoop versions in code?

vmarkovtsev commented 9 years ago

No need to recompile. Just apply the fixing patch to 2.6.1: https://github.com/google/protobuf/pull/50

gpoulin commented 9 years ago

@wouterdebie any idea on how you plan to proceed? I started to port over spotify/luigi#747 and snakebite is currently a blocker. I had a fast look at snakebite code and I don't think supporting both python2/3 with one source code would be difficult. The main work would require to use unicode and bytes literal and encode and decode at the appropriate place (I can help for this part if you want). The tricky part would be the dependency on protobuf.

Tarrasch commented 9 years ago

@gpoulin can't you solve this on luigi side? Core luigi shouldn't really depend on snakebite.

wouterdebie commented 9 years ago

@gpoulin would be great to see some progress here. I would like to have one version that supports both py2/3. And yes, the protobuf stuff we'll have to solve, but I don't mind keeping two different compiled sets for that.

gpoulin commented 9 years ago

@Tarrasch I was under the impression that it was a soft requirement for Luigi, but hard requirement for "Hadoop Luigi", since all the Hadoop tests were failing on import snakebite. However having a closer look it seems only to had more functionality if present. Anyway, having snakebite running on python 3 would still be great.

Tarrasch commented 9 years ago

@gpoulin Exactly, it's basically that luigi will run faster with snakebite because it don't need to shell out to hadoopcli. :)

Tarrasch commented 8 years ago

@talumbau @gpoulin @vmarkovtsev @wouterdebie. Do you guys have any updates on the status for the python 3 support?

gpoulin commented 8 years ago

@Tarrasch Even thou I'm currently working more in scala, I'm willing to help here. However, my understanding is that the version of protobuf that snakebite is currently using doesn't support python3. So if snakebite can support a protobuf version that support python3, I can help to make this project python3 compatible.

Tarrasch commented 8 years ago

@gpoulin, I did try to scan the status for protobuf and python3 in general, but it doesn't seem to be well supported if at all.

Since I can not wait for a way to do hdfs fast in python3, I'll probably switch over to webhdfs.

Tarrasch commented 8 years ago

For luigi users who are fine with using webhdfs, this could be relvant spotify/luigi#1438

krzysztof-indyk commented 8 years ago

@gpoulin protobuf compatible with python3 has been relased https://pypi.python.org/pypi/protobuf/3.0.0b2

kirklg commented 8 years ago

Hey all, first of all, great work, this project is exactly what I need for a hadoop project I'm working on, and speed is definitely important (py3 more specifically).

Anyways, I'm going to take a stab at this, python-protobuf3 looks stable and my initial py3 compat work looks promising. I will update when I have more, thank you.

anneschuth commented 8 years ago

@kirklg awesome! Did you make any progress?

kirklg commented 8 years ago

@anneschuth, yes, but its been slow moving, I have my fork, https://github.com/kirklg/snakebite/pull/2, that you can watch if you want to subscribe to progress :). I have a string of commits coming soon when I find the time, thanks.

anneschuth commented 8 years ago

Thanks @kirklg, looking forward to your commits!

FlorianWilhelm commented 8 years ago

@kirklg Any news on your PR? I am also looking forward to snakebite with Python 3 support. Thanks for making this possible.

sumyeon commented 7 years ago

Hey, are there any news on support for Python 3? I hope to heare any news, even if it is negative.

wouterdebie commented 7 years ago

Given that at Spotify we don't use python3 there is no incentive to work on this. I hope someone else from the community takes a stab at this, but it's hard to say it that will happen.

On Feb 15, 2017 9:10 AM, "sumyeon" notifications@github.com wrote:

Hey, are there any news on support for Python 3? I hope to heare any news, even if it is negative.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/spotify/snakebite/issues/62#issuecomment-279943784, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKgBtIdHP1B6jCICDhOa286iaoYBrn-ks5rcrLdgaJpZM4CBlij .

rtets commented 7 years ago

Before I (very ambitiously) start from scratch on this, has anyone already done any analysis on it? or even had a crack?

ghost commented 7 years ago

@rtets Appears to have been attempted in the past, and subsequently obliterated e.g. https://github.com/spotify/snakebite/commit/094e801e88f17dcaaac14fe2d98d72ce83de1b7f#diff-d9827893b8ca9a0f86a685a5fd2327b5

there's still some evidence of Python 3 compatibility e.g.

if sys.version_info[0] == 3:
    long = int

I fear that you'd make this effort, and then it wouldn't last long before it reverts to Python2 specific code again. Shame, given how easy it is to set up TOX et al to test multiple versions, and how easy 2to3 and six libraries make supporting both, and how much better Python 3.5+ is (everything is a generator / iterator, unicode actually works for global languages etc etc).

gpoulin commented 6 years ago

@davoscollective Those commit simply added compatibility for the minicluster which was needed for testing in Luigi. The actual client use protobuf 2 that doesn't support python3. So to upgrade to python 3 it will first require updating protobuf. Also this library do a lot of io on mixing byte and string which is the part that changed the most between python2 and python3 which make the compatibility not so trivial.

kamikaze commented 6 years ago

What a crap... Spotify, don't be lazy! Please make this library Python 3.6 compatible

kamikaze commented 6 years ago

https://github.com/spotify/snakebite/blob/master/snakebite/client.py#L1473

ghost commented 6 years ago

@kamikaze Almost python 3.7 now https://docs.python.org/3.7/whatsnew/3.7.html Then move away from protobuf and start using apache arrow. It's the __future__

cupdike commented 6 years ago

As long as snakebite's kerberos support relies on python-krbV: https://github.com/spotify/snakebite/blob/6a456e6100b0c1be66cc1f7f9d7f50494f369da3/setup.py#L49 it's not going to work with Python 3 on Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1024834#c4

OOPMan commented 6 years ago

Given that Python 2.x will hit EOL in just a few years and that this ticket has also been open for just a few years maybe there's some chance some poor soul will submit a PR.

Personally, the excuse that because Spotify don't use Python 3 they can't be bothered to do the work is a wearing a little thin...

wouterdebie commented 6 years ago

I left Spotify a while ago, but I know that Spotify doesn't run HDFS anymore. They've moved to Google Cloud Platform where everything is stored in GCS and doesn't require the hadoop client or snakebite anymore. I guess someone at Spotify should communicate this and deprecate snakebite. As the original author, I've moved on to other things as well and don't have an incentive to maintain the project either. Since I wrote this during work hours, Spotify is the owner of snakebite and it's up to them to decide what to do with it.

kirill-krokodil commented 5 years ago

@wouterdebie thanks for highlighting the issue (and for the client too, of course). so, this repo is dead, and nobody (Spotify or whoever admins repo) gonna merge this pull request? looks like @cclauss did a great job, hard to believe, that it will be lost for the community.

wouterdebie commented 5 years ago

I'm really sorry about this. I would love to have merged this, but I don't have write access anymore. If the community wants to continue snakebite it can always be forked. @ravwojdyla also left Spotify, but he might have an idea who to talk to so the project can be owned by the community again.

jkafader commented 5 years ago

at internet archive, we use this library a lot, use python3 and have been using a fork of kirklg's version for some time, we finally put it on pypi recently: https://pypi.org/project/snakebite-py3/

jkafader commented 5 years ago

If anyone would like to merge in more recent changes and help us to get the tests to pass, the work would be much appreciated.

jkafader commented 5 years ago

thanks for all of your past work @kirklg

kirklg commented 5 years ago

Hey @jkafader kudos! I can't take any credit here, thanks for moving the pythons to 3 :D.

My priorities shifted and I could no longer get paid to work on this.

wouterdebie commented 5 years ago

@jkafader I just saw this thread from a month ago and thank you so much for doing this! I reached out to some folks at Spotify to do something with this repo, but I haven't heard back. Who knows..