Cython implementation of toolz

eriknw commented 10 years ago

What do you think about having a Cython implementation of toolz that can be used as a regular C extension in CPython, or be cimport-ed by other Cython code?

I've been messing around with Cython lately, and I became curious how much performance could be gained by implementing toolz in Cython. I am almost finished with a first-pass implementation (it goes quickly when one doesn't try to fine-tune everything), and just have half of itertoolz left to do.

Performance increases of x2-x4 are common. Some perform even better (like x10), and a few are virtually the same. There is also less overhead when calling functions defined in Cython, which at times can be significant regardless of how things scale.

However, performance when called from Python isn't the only consideration. A common strategy used by the scientific, mobile, and game communities to increase performance of their applications is to convert Python code that is frequently run to Cython. Developing in Cython also tends to be very imperative. A Cython version of toolz will allow fast implementations to be used in other Cython code (via cimport) while facilitating a more functional style of programming.

Looking ahead, cython.parallel exposes OpenMP at a low level, which should allow for more efficient parallel processing.

Thoughts? Any ideas for a name? I am thinking coolz, because ctoolz and cytoolz sound like they are utilities for C or Cython code. I can push what I currently have to a repo once it has a name. Should this be part of pytoolz?

eriknw commented 10 years ago

To summarize my results with get:

single index:
- x2.4
single index returning default value:
- x6
short list of indices (length 2):
- x10
- less for long lists
short list indices all returning default values:
- x20
- more for long lists

Just to be clear "x1" would be the same execution time, and "x2" would be half the execution time.

mrocklin commented 10 years ago

Awesome. Though note that curry(get)(0) is only about 1.5 times faster than curry(get)(0) in toolz. The use of partial pays off.

eriknw commented 10 years ago

Awesome. Though note that curry(get)(0) is only about 1.5 times faster than curry(get)(0) in toolz. The use of partial pays off.

I'm not ready to concede the point of using partial in toolz.curry when the function may have keyword arguments. curry is not the same as partial. When there are only positional arguments, curry has well-defined semantics: f(x, y, z) can be incrementally evaluated as f(1)(2)(3). If there are both positional arguments and keyword arguments, the semantics become less obvious. I would expect curry to allow incremental additions (and changes) to keyword arguments as follows: f(x, y=None, z=None) can be incrementally evaluated as f(y=1)(z=2)(y=3)(4). For example:

# we plan to use `memoize` with this key function many places
memoize_onfirst = memoize(key=lambda *args, **kwargs: args[0])

f_cache = {}
@memoize_onfirst(cache=f_cache)
def f(x, y, z):
    ...

Such usage is incompatible with returning a partial object for get (because get has a keyword argument). Moreover, if you want almost all of the convenience of using and typing get and the speed of partial, you can use this one-liner: pget = lambda *args, **kwargs: partial(get, *args, **kwargs).

By the way, I had to make cytoolz.curry a little slower for a while in order to pass tests (have I mentioned how much I love having pre-existing tests? Well, I do!). I now understand the issue I was having, and cytoolz.curry is once again nearly as fast as partial. For me curry(get)(0) in cytoolz is now 1.9 time faster than curry(get)(0) in toolz. Somehow, I think this looks good for both toolz and cytoolz (even though toolz needed to make a compromise to gain performance)!

mrocklin commented 10 years ago

I agree with you about the issue with partial, we do sacrifice correctness here. I've optimized pretty heavily for performance in the past. My only defense for this strategy is that correctness issues haven't come up.

Maybe with cytoolz around though this can change. Maybe instead of rarely sacrificing correctness we now optionally sacrifice the pure-python bit. This is nice because the user can make this choice himself (by import cytoolz rather than toolz) rather than us making the curry/partial decision for him.

mrocklin commented 10 years ago

I plan to talk about functional programming and core data structures at PyData Silicon Valley, May 2-4 and I'd like to show off CyToolz. It would be nice to release a blogpost or two about it beforehand. Are you working on a timeline of any sort?

eriknw commented 10 years ago

Oh, cool. I'll be interested to hear how it goes.

I expect to be very busy the last two weeks of April, so I plan to get cytoolz into a beta state within a week. I think this is doable. "beta" to me means full coverage of toolz API, pip-installable on Linux (and possibly OS X) with and without Cython, and at least a little documentation. Your engagement is appreciated. I was initially thinking early May for my blog post. Hopefully I can whip it out sooner, but reaching beta status is my main priority.

eriknw commented 10 years ago

@mrocklin, you may be interested that I ran CyToolz versions of the "Text Benchmarks" from here:

http://matthewrocklin.com/blog/work/2014/01/13/Text-Benchmarks/

I made one modification for the CyToolz versions: I used imap (actually cytoolz.compatibility.map), which was actually faster than using map.

The straight CyToolz version from Python is about 45% faster than with Toolz (i.e., I just changed import from toolz to cytoolz). For kicks, I ran the same thing in a Cython-compiled module that exposes a function that does the processing. This ran about 65% faster than with Toolz.

mrocklin commented 10 years ago

That's awesome. Python, now beating Julia and Java on data structure processing.

mrocklin commented 10 years ago

I mentioned cytoolz during a Blaze meeting (I now work on Blaze). @FrancescAlted seemed pretty interested.

mrocklin commented 10 years ago

BTW, numpy.bincount beats both cytoolz.frequencies and pandas.groupby(...).size()

eriknw commented 10 years ago

I mentioned cytoolz during a Blaze meeting (I now work on Blaze). @FrancescAlted seemed pretty interested.

Awesome! Any help, feedback, and additional expertise in Cython is appreciated.

I like what I've seen of Blaze, but I've never had a reason to learn it or chance to use it (but nor have I worked on "big data" appropriate for Blaze since its existence). Is it used much outside of Continuum's customer base?

I like the size and scope of toolz. Essentially, it is my sine qua non for functional data handling and analysis, which can be seamlessly integrated with other Python code.

eriknw commented 10 years ago

BTW, numpy.bincount beats both cytoolz.frequencies and pandas.groupby(...).size()

Yeah, but who knows about and uses numpy.bincount?! I'm not terribly surprised by this result. There is a small performance penalty for handling generic, streaming data lazily. Can numpy.bincount be lazy? Also, how much faster is it?

mrocklin commented 10 years ago

It is significantly faster (50x). But no to laziness and no to generic types. Don't get me wrong, I always reach for toolz first, this is mostly because it matches my mental programming model more closely and because it is fast enough. Things like cytoolz I mostly see as enabling me to stay within that mental model even under more strict performance requirements.

The excitement with cytoolz.frequencies competing with pandas was that we thought it might be possible, in some cases, that we could reach all the performance that can be reasonably expected from a single core while staying within the functional model. At least for this application numpy.bincount does provide a compelling counter example.

eriknw commented 10 years ago

All good points. I wonder if pandas could take advantage of numpy.bincount. If not generally, what about under certain circumstances?

eriknw commented 10 years ago

cytoolz now matches the toolz API! (minus the sandbox)

mrocklin commented 10 years ago

Woot!

On Fri, Apr 11, 2014 at 11:04 AM, Erik Welch notifications@github.comwrote:

cytoolz now matches the toolz API! (minus the sandbox)

Reply to this email directly or view it on GitHubhttps://github.com/pytoolz/toolz/issues/155#issuecomment-40235907 .

eriknw commented 10 years ago

Another positive outcome of this effort: there are a few faster implementations that could be back-ported to toolz. I don't recall which ones, though; my head is in a bit of a whirlwind. Heh, the final 10 functions added to cytoolz seemed to take as long to develop as the first 40!

I'm going to spend a few days away from cytoolz, but, if you'd like, I encourage you to take charge of testing, developing, documenting, and playing around with it and reporting any problems or annoyances.

mrocklin commented 10 years ago

Blaze has me pretty hard right now, sorry that I haven't been as responsive. The top of my cytoolz priority list is to write a blogpost and a PyToolz doc page. Maybe we can get other people on board to help with the other things?

mrocklin commented 10 years ago

This is really fantastic work by the way, I know I've said that before, but it should really probably be said every day or two.

eriknw commented 10 years ago

Blaze has me pretty hard right now, sorry that I haven't been as responsive.

New job, right? Obviously top priority.

The top of my cytoolz priority list is to write a blogpost and a PyToolz doc page.

Perfect!

This is really fantastic work by the way, I know I've said that before, but it should really probably be said every day or two.

Thanks, I'm glad you think so and said so!

This began as a way to learn and explore Cython with non-trivial tasks that didn't match the given examples. toolz was a great candidate for many reasons: each function is pure and mostly independent (hence, easy to implement piece-by-piece), the API is already defined, tests already exist(!), there are a wide variety of functions, and the functions don't conform to typical Cython examples. I had no idea whether it would be worth creating an entire package--cytoolz--but I was definitely curious how the performance would compare to toolz, which served as a motivator. Even though becoming more familiar with Cython was my first priority, I thought this could turn into a separate package if it were feasible to do and the performance justified it. Even if it were those things, I am certain I wouldn't have developed cytoolz if not for your enthusiasm and support. Thanks!

toolz is a great package on its own merits, and cytoolz merely adds to its value. It may seem odd that some Python users care so much about performance, but the truth is that many users do (especially in the scientific community). I think that just the existence of cytoolz--that a higher performance alternative is available--will make some people more comfortable with using toolz. Hopefully*toolz gets more coverage and traction among users this year--let's make it so! There is a niche to be filled. (Alright, I've made a decision: I'm going to try to present toolz at PyOhio this year)

eriknw commented 10 years ago

I think we can close this now that CyToolz has been released. Are there any lingering issues from our discussions here?

eriknw commented 10 years ago

I'm closing this issue. cytoolz is well-established, and it has successfully been updated and maintained for a release cycle, which didn't turn out to be too painful due to automated tests that verify consistency between toolz and cytoolz. I also use a simple script to copy tests over from toolz, so this isn't a big deal either so long tests get back-ported to toolz (which was also done this release cycle).

The discussion in the PR remains an interesting read, but there isn't much to add, nor is this the proper place to continue the discussion. Closing.

pytoolz / toolz

Cython implementation of toolz #155