visualfabriq / bquery

A query and aggregation framework for Bcolz (W2013-01)
https://www.visualfabriq.com
BSD 3-Clause "New" or "Revised" License

Documentation updates #48

Closed waylonflinn closed 9 years ago

waylonflinn commented 9 years ago

I'm creating a PR for adding calculation of mean and standard deviation. This PR contains documentation updates and variable renames to make that process easier.

CarstVaartjes commented 9 years ago

Oooh nice! The integration fails because of this, btw: https://github.com/Blosc/bcolz/issues/251. If the tests run on your computer, it's fine ;)

CarstVaartjes commented 9 years ago

One other small remark: the way it's currently set up is not exactly how I meant it to be. The idea was this (as in the documentation on the main page):

# groupby column f0, perform a sum on column f2 and keep the output column with the same name
ct.groupby(['f0'], ['f2'])

# groupby column f0, perform a sum on column f2 and rename the output column to f2_sum
ct.groupby(['f0'], [['f2', 'f2_sum']])

# groupby column f0, with a sum on f2 ('f2_sum') and a sum_na on f2 ('f2_sum_na')
ct.groupby(['f0'], [['f2', 'f2_sum', 'sum'], ['f2', 'f2_sum_na', 'sum_na']])

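So the idea is that each entry in the aggregation list takes one of three forms: a plain column name (sum, output keeps the same name), a [column, output name] pair (sum, output renamed), or a [column, output name, operation] triple.

With the third form you can do multiple aggregations on one column (sum, mean, max, etc.) all in one go. But the implementation is not nice like this yet :/ Also: if you don't find this a very nice method, I'm completely open to other suggestions. The idea is based a bit on classical SQL groupby options.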
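To make the three entry forms from the groupby calls above concrete, here is a minimal sketch of how they could be expanded into uniform (input column, output column, operation) triples. The helper name `normalize_agg_list` and its `default_op` parameter are hypothetical, made up for illustration; this is not bquery's actual implementation.

```python
def normalize_agg_list(agg_list, default_op="sum"):
    """Expand each aggregation entry into an (input_col, output_col, op) triple.

    Hypothetical helper for illustration only, not part of bquery.
    """
    normalized = []
    for entry in agg_list:
        if isinstance(entry, str):
            # Form 1: bare column name -> default op, output keeps the same name
            normalized.append((entry, entry, default_op))
        elif len(entry) == 2:
            # Form 2: [input, output] -> default op, output renamed
            input_col, output_col = entry
            normalized.append((input_col, output_col, default_op))
        elif len(entry) == 3:
            # Form 3: [input, output, op] -> explicit operation
            input_col, output_col, op = entry
            normalized.append((input_col, output_col, op))
        else:
            raise ValueError("unsupported aggregation entry: %r" % (entry,))
    return normalized


# The three calls from the comment above, reduced to their aggregation lists
simple = normalize_agg_list(['f2'])
renamed = normalize_agg_list([['f2', 'f2_sum']])
multi = normalize_agg_list([['f2', 'f2_sum', 'sum'], ['f2', 'f2_sum_na', 'sum_na']])
```

Normalizing up front like this would let the groupby loop handle only triples, which is what makes several aggregations on one column straightforward.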

waylonflinn commented 9 years ago

Thanks for the heads up regarding the install bug!

CarstVaartjes commented 9 years ago

I'm making a small change to make sure it works as it should with aggregation and output columns, plus adding something to your documentation, filling in the blanks that I can understand mystified you, hehe ;)