mozilla / jydoop

Efficient Hadoop Map-Reduce in Python

Don't overwrite json package #25

Closed: indygreg closed this issue 11 years ago

indygreg commented 11 years ago

While the streaming Jackson API is much faster than the built-in json module, I think it is a bad idea, on principle, to overwrite the built-in json module with the monkeypatched functions. There are legitimate cases where someone may want to use the additional features of the built-in json module's APIs. Using Jackson shouldn't preclude this.

I think the Jackson JSON API should be exposed under a separate name, say "import jacksonjson" or similar.
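A minimal sketch of what this proposal could look like, assuming the fast parser is registered as its own module rather than patched over json. The name jacksonjson comes from the suggestion above; _fast_loads is a hypothetical stand-in that delegates to the standard library so the sketch runs outside Jython, where the real version would call into Jackson.

```python
import json
import sys
import types

def _fast_loads(text):
    # Hypothetical stand-in for a Jackson-backed parser. Under Jython this
    # would call into the Java library; here it delegates to the stdlib so
    # the sketch runs anywhere.
    return json.loads(text)

# Register the fast parser under its own name instead of replacing `json`.
jacksonjson = types.ModuleType("jacksonjson")
jacksonjson.loads = _fast_loads
sys.modules["jacksonjson"] = jacksonjson

# Job code opts in to the fast path explicitly...
import jacksonjson as fastjson
record = fastjson.loads('{"id": 1}')

# ...while the built-in module keeps its full API (dumps, hooks, etc.).
text = json.dumps(record, sort_keys=True)
```

With this arrangement a job that wants speed imports the fast module, and a job that needs the rest of the standard API still gets the untouched built-in json.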

tarasglek commented 11 years ago

No.

Jython ships with a lot of Python libraries, and most are implemented in pure Python. Unfortunately, Python is damn slow when not using native extensions, and Jython is even slower. Benjamin wasted a couple of hours trying to figure out why jobs using Jython's native json module would hang; it turned out they were so slow that the job tracker gave up on them.

indygreg commented 11 years ago

This has nothing to do with Jython's json module being slow: it's all about removing a core module from the execution environment. The standard json module (http://docs.python.org/2/library/json.html) has a number of handy APIs that jydoop's json module lacks, for example dumps() and the ability to decode JSON into instances of custom classes.
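For illustration, here is a short sketch of the two standard-library features mentioned above: decoding JSON objects into a custom class via the object_hook parameter of json.loads, and serializing back with json.dumps. The Ping class and as_ping function are hypothetical names used only for this example.

```python
import json

class Ping(object):
    # Hypothetical record type used only for illustration.
    def __init__(self, ping_id, version):
        self.ping_id = ping_id
        self.version = version

def as_ping(obj):
    # object_hook receives every decoded JSON object as a dict;
    # return a Ping when the expected keys are present.
    if "ping_id" in obj and "version" in obj:
        return Ping(obj["ping_id"], obj["version"])
    return obj

ping = json.loads('{"ping_id": "abc", "version": 3}', object_hook=as_ping)

# dumps() serializes back to a JSON string; default= tells it how to
# handle objects it does not know natively, such as Ping.
out = json.dumps(ping, default=lambda p: p.__dict__, sort_keys=True)
```

If json has been replaced by a module that only provides a fast loads(), neither object_hook nor dumps() is available to job code.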

It's quite conceivable a job may want to employ one of these (albeit slower) APIs. However, because you have overwritten the built-in module, it's impossible.

FWIW, I've already heard at least one other person gripe about this. I think taking away a tool is a bad idea and the decision should be reconsidered.

tarasglek commented 11 years ago

The solution here is to add missing functionality to jydoop.

indygreg commented 11 years ago

Here is the reference implementation of the json package for whoever wants to reinvent the wheel: http://hg.python.org/cpython/file/a26df2d03989/Lib/json