Documentation updates

waylonflinn commented 9 years ago

I'm creating a PR for adding calculation of mean and standard deviation. This PR contains documentation updates and variable renames to make that process easier.

CarstVaartjes commented 9 years ago

Oooh nice; the integration fails because of this btw: https://github.com/Blosc/bcolz/issues/251. If the tests run on your computer, it's fine ;)

CarstVaartjes commented 9 years ago

One other small remark: the way it's currently set up is not exactly how I meant it to be. The idea was this (as in the documentation on the main page):

# groupby column f0, perform a sum on column f2 and keep the output column with the same name
ct.groupby(['f0'], ['f2'])

# groupby column f0, perform a sum on column f2 and rename the output column to f2_sum
ct.groupby(['f0'], [['f2', 'f2_sum']])

# groupby column f0, with a sum on f2 ('f2_sum') and a sum_na on f2 ('f2_sum_na')
ct.groupby(['f0'], [['f2', 'f2_sum', 'sum'], ['f2', 'f2_sum_na', 'sum_na']])

So the idea is:

- a list of groupby columns + a list of columns that need to be aggregated
- if the list of aggregation columns is a plain list, we assume that they will all be summed (the default) and that the names of the output columns are the same as the input names
- if it's a list of 2-tuples, we assume that the first field in the tuple contains the input name and the second one contains the aggregation type (sum, mean, etc)
- if it's a list of 3-tuples, we assume that the first field in the tuple contains the input name, the second the output name and the third one the aggregation type (sum, mean, etc)
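
The rules in this list could be captured by a small normalisation step that expands every entry to the same three-field form. The sketch below is only an illustration: `normalize_agg_spec` is a hypothetical helper, not part of bquery, and it follows the prose rule for two-element entries (input name, aggregation type), whereas the second example call above uses the second element as an output name. Which reading the library adopts is an assumption here.

```python
def normalize_agg_spec(agg_list, default_op="sum"):
    """Expand each aggregation entry to (input_col, output_col, op).

    Hypothetical helper for illustration; not bquery's implementation.
    """
    normalized = []
    for entry in agg_list:
        if isinstance(entry, str):
            # Plain column name: default aggregation, output keeps the input name.
            normalized.append((entry, entry, default_op))
        elif len(entry) == 2:
            # Assumption: two fields mean (input name, aggregation type).
            input_col, op = entry
            normalized.append((input_col, input_col, op))
        elif len(entry) == 3:
            # Three fields: (input name, output name, aggregation type).
            input_col, output_col, op = entry
            normalized.append((input_col, output_col, op))
        else:
            raise ValueError("unsupported aggregation entry: %r" % (entry,))
    return normalized
```

With every entry in the same shape, the grouping code would only need to handle one case, regardless of which of the three input styles the caller used.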

So with the third type you can do multiple aggregations on one column (sum, mean, max, etc) all in one go. But the implementation is not nice like this yet :/ Also: if you don't find this a very nice method, I'm completely open to other suggestions. The idea is based a bit on classical SQL GROUP BY options.
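
To make the "multiple aggregations on one column in one go" idea concrete, here is a minimal pure-Python sketch of the semantics, roughly what `SELECT f0, SUM(f2), AVG(f2), MAX(f2) ... GROUP BY f0` would give in SQL. The names `AGG_FUNCS` and `groupby_rows` are hypothetical and operate on plain dicts; this is not how bquery works on a ctable, just the intended result shape.

```python
from collections import defaultdict

# Hypothetical aggregation functions keyed by name; not bquery's implementation.
AGG_FUNCS = {
    "sum": sum,
    "mean": lambda values: sum(values) / len(values),
    "max": max,
}


def groupby_rows(rows, key_cols, agg_specs):
    """Group dict rows by key_cols and apply (input, output, op) specs."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in key_cols)].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(key_cols, key))
        # Each spec reads one input column and writes one named output column,
        # so the same input column can appear in several specs.
        for input_col, output_col, op in agg_specs:
            out[output_col] = AGG_FUNCS[op]([m[input_col] for m in members])
        result.append(out)
    return result


rows = [
    {"f0": "a", "f2": 1.0},
    {"f0": "a", "f2": 3.0},
    {"f0": "b", "f2": 5.0},
]
summary = groupby_rows(
    rows,
    ["f0"],
    [("f2", "f2_sum", "sum"), ("f2", "f2_mean", "mean"), ("f2", "f2_max", "max")],
)
```

Because each spec carries its own output name, the three results for `f2` do not overwrite each other, which is the reason the three-field form needs an explicit output column.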

waylonflinn commented 9 years ago

Thanks for the heads up regarding the install bug!

CarstVaartjes commented 9 years ago

I'm making a small change to make sure it works as it should with aggregation and output columns, plus adding something to your documentation (filling in the blanks that, I can understand, mystified you hehe ;)

visualfabriq / bquery

Documentation updates #48