linearregression / hypertable

Automatically exported from code.google.com/p/hypertable
GNU General Public License v2.0

Add support for linear stats #82

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The current scanning mechanism should probably also have support for
certain stats functions that can get applied to the scanned cells.

There are certain statistics functions that can get applied linearly to the
data in a table.  The execution of these functions should just get
piggybacked on the normal scan mechanism.  For example, let's say you
wanted to count the number of cell revisions for a column on a row-by-row
basis.  In this case you would want to produce output lines of the
following format:

<row_key> <column> <count>

This could be accomplished by associating a stats function with the scanner
in each RangeServer.
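Here is a minimal sketch of the revision-count example. It assumes cells reach the stats function sorted by row key and then by column, which is the order a range scan yields them. `Cell` and `count_revisions` are hypothetical names for illustration and are not part of Hypertable's API. Because cells with the same (row, column) pair are adjacent, one pass with constant state is enough to produce the `<row_key> <column> <count>` lines:

```python
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple


class Cell(NamedTuple):
    """Hypothetical cell record; a real scanner would carry more fields."""
    row_key: str
    column: str
    timestamp: int
    value: bytes


def count_revisions(cells: Iterable[Cell]) -> Iterator[str]:
    """Yield '<row_key> <column> <count>' for each run of equal (row, column).

    Assumes (hypothetically) that cells arrive sorted by (row_key, column),
    so groupby sees each pair as one contiguous run: a single linear pass.
    """
    for (row_key, column), group in groupby(cells, key=lambda c: (c.row_key, c.column)):
        count = sum(1 for _ in group)  # number of revisions in this run
        yield f"{row_key} {column} {count}"


# Usage: revisions of one column appear newest-first within a row.
cells = [
    Cell("r1", "cf:a", 3, b"x"),
    Cell("r1", "cf:a", 2, b"y"),
    Cell("r1", "cf:b", 1, b"z"),
    Cell("r2", "cf:a", 5, b"w"),
]
for line in count_revisions(cells):
    print(line)
# r1 cf:a 2
# r1 cf:b 1
# r2 cf:a 1
```

The function only consumes an iterator, so in principle it could wrap whatever cell stream a RangeServer scanner already produces. This sketch does not address how it would actually hook into the real scanner.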

Original issue reported on code.google.com by nuggetwh...@gmail.com on 10 Mar 2008 at 11:37

GoogleCodeExporter commented 9 years ago
Gordon, can you chime in on this one?  I think you would be the best person to
spec this one out.

Original comment by nuggetwh...@gmail.com on 18 Mar 2008 at 5:47

GoogleCodeExporter commented 9 years ago
As discussed with Doug, I would suggest copying SQL's IN statement to help with
aggregating data from dimension tables based on FKs returned from a fact table
(when working in a star schema, for example).

So: SELECT cols FROM dimension WHERE col IN ([range of ids])

E.g.: SELECT * FROM My_Dim_Tbl WHERE id IN (1,4,8,9,123,556,3232,454747,132346457);

IN expects a comma-separated list of integers to test against primary keys.
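To illustrate the proposed IN semantics, this sketch treats a dimension table as a list of `(id, row)` pairs sorted by primary key. It answers `WHERE id IN (...)` with one binary-search probe per requested id. `select_in` and `my_dim_tbl` are hypothetical names, and this is not how Hypertable evaluates queries:

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def select_in(table: Sequence[Tuple[int, Row]], ids: Sequence[int]) -> List[Tuple[int, Row]]:
    """Return rows whose primary key is in ids, i.e. SELECT * ... WHERE id IN (ids).

    Assumes (hypothetically) that table is sorted by primary key. Rebuilding
    `keys` costs O(n) per call; a real store would probe its own index.
    """
    keys = [k for k, _ in table]
    result = []
    for wanted in sorted(set(ids)):  # drop duplicate ids, probe in key order
        i = bisect_left(keys, wanted)
        if i < len(keys) and keys[i] == wanted:
            result.append(table[i])
    return result


# Hypothetical dimension table, kept sorted by id.
my_dim_tbl = [(1, {"name": "a"}), (4, {"name": "b"}), (7, {"name": "c"}), (9, {"name": "d"})]
print(select_in(my_dim_tbl, [9, 1, 123, 4]))
# [(1, {'name': 'a'}), (4, {'name': 'b'}), (9, {'name': 'd'})]
```

Ids with no matching row, such as 123 above, are skipped silently, as they would be in SQL.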

A further enhancement would be to allow "ranges" to be passed to IN:
SELECT * FROM My_Dim_Tbl WHERE id IN (1..9,234..999);
This would return records with IDs 1 through 9 inclusive and 234 through 999
inclusive.  This isn't required, but it would make working with such queries on
the command line easier.
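One way to handle the range syntax is to parse each list item into an inclusive `(lo, hi)` interval and treat a bare integer as a one-element range. This is only a sketch. `parse_in_list` and `matches` are hypothetical helpers, and the `..` syntax is just the proposal above, not an existing grammar:

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def parse_in_list(text: str) -> List[Interval]:
    """Parse e.g. '1..9,234..999,42' into inclusive (lo, hi) intervals.

    A bare integer n becomes the degenerate interval (n, n).
    """
    intervals = []
    for item in text.split(","):
        item = item.strip()
        if ".." in item:
            lo_s, hi_s = item.split("..", 1)
            lo, hi = int(lo_s), int(hi_s)
            if lo > hi:
                raise ValueError(f"empty range: {item}")
        else:
            lo = hi = int(item)
        intervals.append((lo, hi))
    return intervals


def matches(key: int, intervals: List[Interval]) -> bool:
    """True if key falls inside any of the inclusive intervals."""
    return any(lo <= key <= hi for lo, hi in intervals)


spec = parse_in_list("1..9,234..999")
print(spec)              # [(1, 9), (234, 999)]
print(matches(9, spec))  # True
print(matches(10, spec)) # False
```

On a table sorted by key, each interval could map to one contiguous range scan instead of a lookup per id. This assumes the integer ids are stored in a form whose byte order matches numeric order, for example fixed-width and zero-padded.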

Original comment by phillip....@gmail.com on 5 Jun 2008 at 8:16

GoogleCodeExporter commented 9 years ago
Something along these lines could work.  For example, GQL has support for the
IN operator.

Here's a quote from their documentation:

"The IN operator compares value of a property to each item in a list. The IN
operator is equivalent to many = queries, one for each value, that are ORed
together. An entity whose value for the given property equals any of the values
in the list can be returned for the query."

http://code.google.com/appengine/docs/datastore/gqlreference.html
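The GQL description amounts to a query rewrite: `prop IN (v1, ..., vn)` becomes `prop = v1 OR ... OR prop = vn`. Below is a small sketch of that rewrite using predicates over dict-shaped entities. `eq`, `or_`, and `in_` are hypothetical helpers, not GQL or Hypertable APIs:

```python
from functools import reduce
from typing import Any, Callable, Dict, Sequence

Entity = Dict[str, Any]
Predicate = Callable[[Entity], bool]


def eq(prop: str, value: Any) -> Predicate:
    """Predicate for prop = value; an entity lacking prop never matches."""
    return lambda e: prop in e and e[prop] == value


def or_(p: Predicate, q: Predicate) -> Predicate:
    """Logical OR of two predicates."""
    return lambda e: p(e) or q(e)


def in_(prop: str, values: Sequence[Any]) -> Predicate:
    """Rewrite prop IN (v1, ..., vn) as (prop = v1) OR ... OR (prop = vn)."""
    if not values:
        return lambda e: False  # an empty IN list matches nothing
    return reduce(or_, (eq(prop, v) for v in values))


entities = [{"id": 1}, {"id": 4}, {"id": 5}, {"name": "no id"}]
pred = in_("id", [1, 4, 8])
print([e for e in entities if pred(e)])  # [{'id': 1}, {'id': 4}]
```

Each `=` branch is a single-key match, so a store could also evaluate IN as a batch of independent point lookups and union the results.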

Original comment by gpar...@gmail.com on 5 Jun 2008 at 8:23

GoogleCodeExporter commented 9 years ago

Original comment by nuggetwh...@gmail.com on 6 Jun 2008 at 4:32

GoogleCodeExporter commented 9 years ago

Original comment by nuggetwh...@gmail.com on 11 Apr 2010 at 3:40