Closed: mrocklin closed this issue 10 years ago
I'm mostly scared from my experience with sympy
In [1]: import sympy
In [2]: sympy.<tab>
Display all 750 possibilities? (y or n)
There really isn't any way to effectively navigate the API. As a result most questions to the mailing list are "how do I do X" and most responses are "try sympy.Y". Having lots of functions in a sea quashes exploration.
Right now in toolz I get the following
In [4]: toolz.<tab>
toolz.accumulate toolz.groupby toolz.nth
toolz.assoc toolz.identity toolz.partial
toolz.comp toolz.interleave toolz.partitionby
toolz.compatibility toolz.intersection toolz.pipe
toolz.compose toolz.isdistinct toolz.reduce
toolz.concat toolz.isiterable toolz.reduceby
toolz.concatv toolz.iterate toolz.remove
toolz.countby toolz.itertoolz toolz.rest
toolz.curry toolz.keymap toolz.second
toolz.dicttoolz toolz.last toolz.sliding_window
toolz.drop toolz.map toolz.take
toolz.filter toolz.mapcat toolz.thread_first
toolz.first toolz.memoize toolz.thread_last
toolz.frequencies toolz.merge toolz.unique
toolz.functoolz toolz.merge_sorted toolz.update_in
toolz.get toolz.merge_with toolz.valmap
I like this. A new user can fit this in their head and that's good. I wouldn't want to see it grow to be more than twice this size. Even looking at this list I'm curious about what actually gets used and what could be moved elsewhere.
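For anyone who wants to put a number on "fits in their head", here is a minimal sketch that counts the public names a module exposes to tab completion. The helper name `public_names` is hypothetical, and the standard-library `itertools` module is used only as a stand-in so the snippet runs without installing anything.

```python
import itertools


def public_names(module):
    """Return the names tab completion would show, skipping underscore-prefixed ones."""
    return [name for name in dir(module) if not name.startswith("_")]


names = public_names(itertools)
print(len(names), "public names in itertools")
```

The same call on `toolz` or `sympy` would give a rough measure of how large each top-level namespace is.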
The problem with grouping is that it makes importing a pain. I really like being able to type `from toolz import *`. This sounds silly but this is really much nicer than before when I did `from itertoolz import *` and `from functoolz import *`.
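One way the flat import can coexist with grouping is to keep functions in submodules and re-export their public names at the top level. The sketch below imitates that with in-memory modules so it is self-contained. The package name `pkg`, the helper `build_module`, and the two toy functions are assumptions made for illustration, not the actual toolz layout.

```python
import types


def first(seq):
    """Return the first element of an iterable."""
    return next(iter(seq))


def identity(x):
    """Return the argument unchanged."""
    return x


def build_module(name, functions):
    """Create an in-memory module exposing the given functions (demo only)."""
    module = types.ModuleType(name)
    for func in functions:
        setattr(module, func.__name__, func)
    module.__all__ = [func.__name__ for func in functions]
    return module


# Hypothetical submodules standing in for files such as pkg/itertoolz.py.
iter_tools = build_module("pkg.itertoolz", [first])
func_tools = build_module("pkg.functoolz", [identity])

# The top-level namespace re-exports every public name from each submodule,
# which is roughly what `from .itertoolz import *` in __init__.py would do.
pkg = types.ModuleType("pkg")
pkg.__all__ = []
for submodule in (iter_tools, func_tools):
    for name in submodule.__all__:
        setattr(pkg, name, getattr(submodule, name))
        pkg.__all__.append(name)
    # Keep the grouped view reachable as pkg.itertoolz / pkg.functoolz.
    setattr(pkg, submodule.__name__.split(".")[-1], submodule)
```

With this arrangement `pkg.first` and `pkg.itertoolz.first` are the same object, so users can choose the flat or the grouped spelling.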
At the same time I'd like to include some random functions, like a pure version of `random.shuffle`, and I'm sure others will have other desires for functions that they use often (see #66). It would be nice to provide a space for these functions without blowing up the API.
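As a rough illustration of what a pure version of `random.shuffle` might look like, here is a minimal sketch. The name `shuffled` and the optional `rng` parameter are assumptions, not an existing toolz function. It is "pure" in the sense that it returns a new list and leaves its input untouched; the result is deterministic only when a seeded generator is passed in.

```python
import random


def shuffled(seq, rng=None):
    """Return a new shuffled list; the input sequence is not modified."""
    # Fall back to the module-level generator when no rng is supplied.
    rng = random if rng is None else rng
    items = list(seq)   # copy so the caller's data is left alone
    rng.shuffle(items)  # in-place shuffle of the private copy
    return items


original = [1, 2, 3, 4, 5]
result = shuffled(original, random.Random(0))
```

Passing `random.Random(seed)` keeps the call repeatable, which fits the composable style discussed in this thread.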
To this extent I think that some modules may be helpful. For right now I have in my mind `random` and `sandbox`. Sandbox would be about lowering the barrier to entry while still not promising perpetual support.
At this stage I think it's important to accept contributions. Engaging new contributors is probably more important than accumulating functionality. Recent energy provided by @eriknw has proved immeasurably valuable. If someone like him came around with a function they wanted to add we should add it, just to get them on board.
This is tricky. I want to make a claim like "general applicability" across domains. For example we would exclude a function for parsing astronomy files, pointing them to astropy or whatnot. We might also exclude "anything big". I'm not sure where to put constraints here though.
Isn’t Toolz still quite a bit smaller than e.g. Underscore?
It certainly is way smaller than clojure.core.
Both projects’ namespaces are flat, IIRC.
I think I still favor a flat namespace for toolz, but Matt and I have already batted this ball around and I’d be curious what others think.
On Nov 5, 2013, at 10:38 AM, Matthew Rocklin notifications@github.com wrote:
What is the scope of toolz? How do we organize toolz functions into cohesive groups?
Grouping: One extreme is to have lots of functions swimming around in a common namespace (`from toolz import *`). Another is to rigidly structure the namespaces so that few functions are within each (`from toolz.itertoolz.random import *`). A hybrid is to structure functions into namespaces but then have a public top-level namespace of very commonly used functions.
Scope: At what point do we say, "this function should be in a different package"? One extreme is to accept any pure-python function that anyone deems useful. The other is to restrict ourselves to an exclusive orthogonal set.
Toolz is now growing at a rate where I would like to develop some kind of guideline. We don't want to introduce an API that we don't want to maintain into the future. I believe that these two issues are related because we can ease the problems of a large scope with organization.
Yes, toolz is smaller than Underscore, whose API has about 100 functions.
What are your thoughts on `random` and `sandbox` modules, John? Also, can you hypothesize in what other directions people might want to go? What are some example functions that should just barely not be part of toolz? What functions outline a boundary?
In short, what are your answers to the questions of scope and grouping? I have principles about these topics in the abstract. I don't think I have very solidified stances on them in concrete applications (applications like `toolz`).
I think submodules for specialized cases are fine and make sense. They make more sense to me than the division into dicttoolz, itertoolz and functoolz, particularly since you oppose cross-module imports.
I agree w/ what you write about scope, particularly general applicability. Your tutorial-in-progress is pointing in this direction already. We aim to provide tools to manipulate standard data structures, mostly lazily, in composable ways. Things that are too specialized should perhaps be additional modules with toolz as a dependency.
I'm open to tearing down the `itertoolz/functoolz/dicttoolz` scaffolding. I'm a little afraid of a giant core.py and would support some scheme of organization; I agree, though, that the iter/func/dict scheme doesn't feel perfect. It's mostly there for historical reasons.
Thanks for the shout-out!
I'm definitely on the same page as you guys, and, as such, I don't have a whole lot to add without giving things sufficient consideration. I like the idea of a `sandbox` that lowers the barrier of entry and can serve as an incubator, but has no guarantees for API consistency whatsoever. This is something I've been thinking about too.
I'll give an example of something that is probably near, but currently outside, the boundary of toolz: `ewma`, an exponentially weighted moving average. Ignore for the moment that this can be accomplished with `itertoolz.accumulate` (perhaps add as a recipe or example?). An `ewma` is pure, lazy, and composable, so prima facie it is worth considering for `toolz`. It is different, though, because generically it is more of an end product of an analysis than a building block. But this isn't true if you do signal analysis and build more complicated filters that include an `ewma`. Hence, it would best belong in a `signal` namespace. Signals are a perfect source of infinite iterables. However, should `signaltoolz` be a submodule or a separate package? I don't know, and I contend it doesn't matter for the purpose of adding it to `sandbox`. If enough signalling tools are added to `sandbox` that it is clear a lightweight, coherent set of building blocks for signal analysis can be made into `toolz.signaltoolz`, then so be it; `sandbox` would have served as a staging area. If, however, it is decided that the signal tools should belong in a separate package, then so be it; `sandbox` would have served as an incubator that allowed input and contributions from this (hopefully growing) community before the creation of a new package. If we had first said "ewma belongs in a separate package", then that package probably would never have been created.
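To make the remark about building an ewma from an accumulate step concrete, here is a minimal sketch. It uses the standard-library `itertools.accumulate` rather than the toolz function so that it runs without extra dependencies. The signature `ewma(values, alpha)` and the choice to seed the average with the first value are assumptions, not an agreed toolz API.

```python
from itertools import accumulate, islice, repeat


def ewma(values, alpha):
    """Lazily yield an exponentially weighted moving average.

    The first output equals the first input; each later output is
    alpha * x + (1 - alpha) * previous_average.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    return accumulate(values, lambda avg, x: alpha * x + (1 - alpha) * avg)


# Works on finite sequences...
smoothed = list(ewma([0.0, 2.0, 2.0], 0.5))

# ...and on infinite signals, because nothing is computed until requested.
steady = list(islice(ewma(repeat(1.0), 0.5), 3))
```

Because the result is an ordinary lazy iterator, it composes with other iterator tools, which is the property the comment above highlights.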