pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/

Scope and namespaces #67

Closed mrocklin closed 10 years ago

mrocklin commented 10 years ago

What is the scope of toolz? How do we organize toolz functions into cohesive groups?

Grouping: One extreme is to have lots of functions swimming around in a common namespace (from toolz import *). Another is to rigidly structure the namespaces so that few functions are within each (from toolz.itertoolz.random import *). A hybrid is to structure functions into namespaces but then have a public top-level namespace of very commonly used functions.

Scope: At what point do we say, "this function should be in a different package"? One extreme is to accept any pure-python function that anyone deems useful. The other is to restrict ourselves to an exclusive orthogonal set.

Toolz is now growing at a rate where I would like to develop some kind of guideline. We don't want to introduce an API that we don't want to maintain into the future. I believe that these two issues are related because we can ease the problems of a large scope with organization.
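As a rough illustration of the hybrid option described under Grouping, the sketch below builds two small namespaces and a top-level namespace that re-exports only selected names. The module objects and the `toplevel` name are hypothetical stand-ins created in memory for the example, and `first` and `valmap` are simplified versions written here, not the library's actual code.

```python
import types

# Hypothetical stand-ins for submodules such as itertoolz and dicttoolz.
itertoolz = types.ModuleType("itertoolz")
dicttoolz = types.ModuleType("dicttoolz")


def first(seq):
    """Return the first element of a sequence or iterator."""
    return next(iter(seq))


def valmap(func, d):
    """Apply func to every value of a dict, returning a new dict."""
    return {key: func(value) for key, value in d.items()}


itertoolz.first = first
dicttoolz.valmap = valmap

# The hybrid: a top-level namespace that re-exports only chosen names.
toplevel = types.ModuleType("toplevel")
for module, names in [(itertoolz, ["first"]), (dicttoolz, ["valmap"])]:
    for name in names:
        setattr(toplevel, name, getattr(module, name))
```

In a real package the same effect usually comes from explicit imports in `__init__.py`, so the structured submodules remain importable while the top level stays small.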

mrocklin commented 10 years ago

Subjective thoughts

I'm mostly scared by my experience with sympy.

In [1]: import sympy
In [2]: sympy.<tab>
Display all 750 possibilities? (y or n)

There really isn't any way to effectively navigate the API. As a result most questions to the mailing list are "how do I do X" and most responses are "try sympy.Y". Having lots of functions in a sea quashes exploration.

Right now in toolz I get the following

In [4]: toolz.<tab>
toolz.accumulate      toolz.groupby         toolz.nth
toolz.assoc           toolz.identity        toolz.partial
toolz.comp            toolz.interleave      toolz.partitionby
toolz.compatibility   toolz.intersection    toolz.pipe
toolz.compose         toolz.isdistinct      toolz.reduce
toolz.concat          toolz.isiterable      toolz.reduceby
toolz.concatv         toolz.iterate         toolz.remove
toolz.countby         toolz.itertoolz       toolz.rest
toolz.curry           toolz.keymap          toolz.second
toolz.dicttoolz       toolz.last            toolz.sliding_window
toolz.drop            toolz.map             toolz.take
toolz.filter          toolz.mapcat          toolz.thread_first
toolz.first           toolz.memoize         toolz.thread_last
toolz.frequencies     toolz.merge           toolz.unique
toolz.functoolz       toolz.merge_sorted    toolz.update_in
toolz.get             toolz.merge_with      toolz.valmap

I like this. A new user can fit this in their head and that's good. I wouldn't want to see it grow to be more than twice this size. Even looking at this list I'm curious about what actually gets used and what could be moved elsewhere.

Grouping

The problem with grouping is that it makes importing a pain. I really like being able to type from toolz import *. This sounds silly but this is really much nicer than before when I did from itertoolz import * and from functoolz import *.

At the same time I'd like to include some random functions, like a pure version of random.shuffle, and I'm sure others will want to add functions that they use often (see #66). It would be nice to provide a space for these functions without blowing up the API.

To this end I think that some modules may be helpful. Right now I have in mind random and sandbox. Sandbox would be about lowering the barrier to entry while still not promising perpetual support.
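For concreteness, a pure version of random.shuffle might look something like the sketch below. The name `shuffled` and the `seed` parameter are assumptions made for this example, not an existing toolz function. It returns a new list rather than mutating its argument, and it uses a private generator so the global random state is left alone.

```python
import random


def shuffled(seq, seed=None):
    """Return a new shuffled list; the input is left unchanged."""
    rng = random.Random(seed)  # private generator, global state untouched
    result = list(seq)         # copy so the caller's data is not mutated
    rng.shuffle(result)
    return result


deck = [1, 2, 3, 4, 5]
mixed = shuffled(deck, seed=42)
```

With a fixed seed the result is reproducible, which is as close to pure as a shuffle gets. With `seed=None` the generator is seeded from the system, so repeated calls may differ.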

Community

At this stage I think it's important to accept contributions. Engaging new contributors is probably more important than accumulating functionality. Recent energy provided by @eriknw has proved immeasurably valuable. If someone like him came around with a function they wanted to add we should add it, just to get them on board.

Scope

This is tricky. I want to make a claim like "general applicability" across domains. For example we would exclude a function for parsing astronomy files, pointing them to astropy or whatnot. We might also exclude "anything big". I'm not sure where to put constraints here though.

eigenhombre commented 10 years ago

Isn’t Toolz still quite a bit smaller than e.g. Underscore?

It certainly is way smaller than clojure.core.

Both projects’ namespaces are flat, IIRC.

I think I still favor a flat namespace for toolz, but Matt and I have already batted this ball around and I’d be curious what others think.

(Sent by email in reply to Matthew Rocklin's opening comment of Nov 5, 2013, 10:38 AM.)

mrocklin commented 10 years ago

Yes, toolz is smaller than underscore, which has an API of about 100 functions.

What are your thoughts on random and sandbox modules, John? Also, can you hypothesize in what other directions people might want to go? What are some example functions that should just barely not be part of toolz? What functions outline a boundary?

In short, what are your answers to the questions of scope and grouping? I have principles about these topics in the abstract. I don't think I have very solid stances on them in concrete applications (applications like toolz).

eigenhombre commented 10 years ago

I think submodules for specialized cases are fine and make sense. They make more sense to me than the division into dicttoolz, itertoolz and functoolz, particularly since you oppose cross-module imports.

I agree w/ what you write about scope, particularly general applicability. Your tutorial-in-progress is pointing in this direction already. We aim to provide tools to manipulate standard data structures, mostly lazily, in composable ways. Things that are too specialized should perhaps be additional modules with toolz as a dependency.

mrocklin commented 10 years ago

I'm open to tearing down the itertoolz/functoolz/dicttoolz scaffolding. I'm a little afraid of a giant core.py and would support some scheme of organization; I agree, though, that the iter/func/dict scheme doesn't feel perfect. It's mostly there for historical reasons.

eriknw commented 10 years ago

Thanks for the shout-out!

I'm definitely on the same page as you guys, and, as such, I don't have a whole lot to add without giving things sufficient consideration. I like the idea of a sandbox that lowers the barrier to entry and can serve as an incubator, but has no guarantees for API consistency whatsoever. This is something I've been thinking about too.

I'll give an example of something that is probably near, but currently outside, the boundary of toolz: ewma, an exponentially weighted moving average. Ignore for the moment that this can be accomplished with itertoolz.accumulate (perhaps add as a recipe or example?). An ewma is pure, lazy, and composable, so prima facie it is worth considering for inclusion in toolz. It is different, though, because generically it is more of an end-product of an analysis than a building block. But this isn't true if you do signal analysis and build more complicated filters that include an ewma. Hence, this would best belong in a signal namespace. Signals are a perfect source of infinite iterables.

However, should signaltoolz be a submodule or a separate package? I don't know, and I contend it doesn't matter for the purpose of adding it to sandbox. If enough signalling tools are added to sandbox that it is clear a lightweight, coherent set of building blocks for signal analysis can be made into toolz.signaltoolz, then so be it. sandbox would have served as a staging area. If, however, it is decided that the signal tools should belong in a separate package, then so be it. sandbox would have served as an incubator that allowed input and contributions from this (hopefully growing) community before the creation of a new package. If we had first said "ewma belongs in a separate package", then that package probably would never have been created.
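A minimal sketch of the ewma recipe mentioned above, written as an accumulation. To keep it runnable without toolz installed it uses the standard library's itertools.accumulate instead of itertoolz.accumulate. The function name `ewma`, the smoothing parameter `alpha`, and the choice to start the average at the first value are assumptions of this example.

```python
from itertools import accumulate, count, islice


def ewma(values, alpha):
    """Lazily yield the exponentially weighted moving average of values.

    The first output is the first input; each later output is
    alpha * new_value + (1 - alpha) * previous_average.
    """
    # accumulate calls the function with (previous average, new value).
    return accumulate(values, lambda avg, x: alpha * x + (1 - alpha) * avg)


# Works on an infinite signal because nothing is computed until requested.
smoothed = list(islice(ewma(count(), 0.5), 3))  # averages of 0, 1, 2
```

Because the result is an iterator, it composes with other lazy tools, which is the property that makes it a plausible building block for larger filters.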