sanchezzzhak / kak-clickhouse

Yii2 ext. ClickHouse
69 stars 43 forks source link

"with totals" and rows_before_limit_at_least implementation #10

Closed sleptor closed 7 years ago

sleptor commented 7 years ago

Let's imagine you need to show some stats in a table. The last row of the table must contain Total values. The table must have pagination as well.

Clickhouse allows to fetch all needed data in one query. for example

SELECT
    idsite,
    sum(impressions) AS impressions
FROM stat
GROUP BY idsite
WITH TOTALS
LIMIT 10
FORMAT JSON

as result you will get this json: ... "totals": { "idsite": 0, "impressions": "73886972" }, "rows": 10, "rows_before_limit_at_least": 226, ...

Consider to implement these very useful features.

sanchezzzhak commented 7 years ago

done;

$sql = 'SELECT
    idsite,
    sum(impressions) AS impressions
FROM stat
GROUP BY idsite
WITH TOTALS
LIMIT 10';  
    $command = $clickhouse->createCommand($sql);    
    $result  = $command->queryAll();

    var_dump($command->meta());     // columns meta info array (columnName, dataType)
    var_dump($command->totals());       // result WITH TOTALS
    var_dump($command->data());     // get rows data
    var_dump($command->rows());     // rows count current result
    var_dump($command->countAll());    // rows count before limit at least, easy pagination
    var_dump($command->extremes());  // extremes values 
    var_dump($command->statistics());  // stat query 

or

//...
 $command = $clickhouse->createCommand($sql);  
 $result = $command->data();
 $total  = $command->countAll();
sleptor commented 7 years ago

great job men!

  1. some points about design. you've mixed "getter" and "execute" functionality. it's not the best practice in my opinion. a weak point of your approach is the getters may execute query implicitly but should only return values related to a query that was executed earlier. for example, meta() method could be rewritten as:

    public function rows()
    {
     $this->ensureQueryExecuted();
     return $this->_rows;
    }
    public function ensureQueryExecuted()
    {
      if(true !== $this->_is_result) {
         throw new DbException('Query was not executed yet');
      }
    }

    it's just my opinion we can discuss it:)

  2. QueryBuilder could be improved to allow this syntax: (new Query)->from('stat')->withTotals()->all()

because currently, we cannot use query builder with "with totals" syntax at all. we must write raw sql :(

sanchezzzhak commented 7 years ago
$q = (new \kak\clickhouse\Query())->from('stat')
    ->withTotals()
    ->where(['event_date' => '2017-05-01' , 'user_id' => 5 ])
    ->offset(2)
    ->limit(1);

$command = $q->createCommand();
$result  = $command->queryAll();
$total   = $command->countAll();

var_dump($result);     // result data
var_dump($total);      // result WITH TOTALS

// -----
// all array result data meta totlas etc...
$result = (new \kak\clickhouse\Query())
    ->from('stat')
    ->withTotals()
    ->all();
var_dump($result);
sleptor commented 7 years ago

checked latest "fix; re" commit. you've refactored the code and I guess had broken Query::all() standard interface. This functions must return an array of rows. All developers expect this behaviour. but now, if I add withTotals then Query::all() and one() will return array of arrays with data, totals & etc. the code will be incompatible with others.

one of the ways to fix it is add an additional parameter to withTotal($returnMetaData=false) by default it's false, it means all() and one() will have standard behaviour.

but a better way is to keep withTotal() as it is. but create additional function withMetaData()

(new Query()) ->withTotal() ->withMetaData() ->all() in this case data will be included to results array

(new Query()) ->withTotal() ->all()

in this case data will NOT be included to results array

sanchezzzhak commented 7 years ago
(new Query())
->withTotal()
->withMetaData()
->all()

result all data current verson

Array
(
    [meta] => Array
        (
            [0] => Array
                (
                    [name] => user_id
                    [type] => Int32
                )

        )

    [data] => Array
        (
            [0] => Array
                (
                    [user_id] => 2
                )

            [1] => Array
                (
                    [user_id] => 2
                )

            [2] => Array
                (
                    [user_id] => 2
                )

            [3] => Array
                (
                    [user_id] => 2
                )

            [4] => Array
                (
                    [user_id] => 2
                )

        )

    [rows] => 5
    [countAll] => 39
    [totals] => Array
        (
            [user_id] => 0
        )

    [statistics] => Array
        (
            [elapsed] => 8.1887E-5
            [rows_read] => 39
            [bytes_read] => 156
        )

    [extremes] => 
)

is use only withTotal

(new Query())
->withTotal(false)
->all()

result

[totals] => Array  (
    [user_id] => 0
 )

So?

sleptor commented 7 years ago

image

I guess the vars

public $withTotals = false;
public $withMetaData = false;

must be private. UPD tried to change a visibility of those vars to private or protected.. something went wrong :( got error

sleptor commented 7 years ago

About ->withTotal() and ->withMetaData() again. Some misunderstanding here, probably I elaborated my idea not enough. The current behavior is still incorrect in my opinion.

adding ->withTotal() to a query must NOT change a result format. Currently, you must execute 2 queries: 1st to fetch rows, 2nd to get totals but only one query is really needed.

for example.

$db = Yii::$app->clickhouse;
$cmd = (new Query())
            ->select(['iddevice', new Expression('sum(impressions) as impressions')])
            ->from('stat')
            ->groupBy('iddevice');
$res = $cmd->all($db);

it returns rows

array(9) {
  [0]=>
  array(2) {
    ["iddevice"]=>
    int(0)
    ["impressions"]=>
    string(7) "4220435"
  }
  [1]=>
  array(2) {
    ["iddevice"]=>
    int(4)
    ["impressions"]=>
    string(1) "5"
  }

query with totals

$cmd = (new Query())
            ->select(['iddevice', new Expression('sum(impressions) as impressions')])
            ->from('stat')
            ->groupBy('iddevice')
            ->withTotals();
$res = $cmd->all($db);

it must return the same results

array(9) {
  [0]=>
  array(2) {
    ["iddevice"]=>
    int(0)
    ["impressions"]=>
    string(7) "4220435"
  }
  [1]=>
  array(2) {
    ["iddevice"]=>
    int(4)
    ["impressions"]=>
    string(1) "5"
  }  
...

the difference is only in generated SQL. in the 1st case sql-query was generated without "with totals" in the 2nd case sql contains "with totals".

After 1st query $cmd->totals() should return null, but after 2nd query $cmd->totals() should return:

array(2) {
  ["iddevice"]=>
  int(0)
  ["impressions"]=>
  string(8) "59938864"
}

about "->withMetaData()"... maybe remove it at all. or rename somehow. anyway all data are available via special getters meta(), totals() and etc. it's syntax sugar.

sanchezzzhak commented 7 years ago

fixed

     $cmd = (new Query())
            ->select(['iddevice, sum(impressions) as impressions'])
                ->from('stat')
                ->withTotals()
                ->groupBy('iddevice');

        $result = $cmd->all();

        echo "=== result\n";
        var_dump($result);
        echo "=== meta\n";
        var_dump($cmd->meta());
        echo "=== extremes\n";
        var_dump($cmd->extremes());
        echo "=== totals\n";
        var_dump($cmd->totals());
        echo "=== rows\n";
        var_dump($cmd->rows());
        echo "=== countAll\n";
        var_dump($cmd->countAll());
sleptor commented 7 years ago

almost perfect :)

I guess we should rename functions to getMeta(), getTotals() & etc because they are definitely getters. yii2 magic will be applied. we will be able to call it either as a function $this->getMeta() nor as a property $this->meta

sanchezzzhak commented 7 years ago

I agree it is necessary to rename.. done, rename special methods