postgrespro / vops

Other
166 stars 22 forks source link

Error on 'count(*)' #12

Closed Mad-Apes closed 12 months ago

Mad-Apes commented 6 years ago

Hello, Another error was found when I execute 'count()', Please check. 1: VOPS_TEST=# create table std_table(pk_column int2,column_int2 int2, column_int4 int4, column_int8 int8); CREATE TABLE 2: `INSERT INTO std_table VALUES(random()100,2,5,random()1000); INSERT INTO std_table VALUES(random()100,2,5,random()1000); INSERT INTO std_table VALUES(random()100,7,3,random()*1000);`

3: VOPS_TEST=# table std_table; pk_column | column_int2 | column_int4 | column_int8 -----------+-------------+-------------+------------- 68 | 2 | 5 | 339 92 | 2 | 5 | 607 90 | 7 | 3 | 372 (3 rows)

4: VOPS_TEST=# create table vops_table_gb(pk_column vops_int2,column_int2 int2, column_int4 int4, column_int8 vops_int8); CREATE TABLE

5: VOPS_TEST=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4'); populate ---------- 3 (1 row)

6: VOPS_TEST=# select column_int2,column_int4,column_int8 from vops_table_gb; column_int2 | column_int4 | column_int8 -------------+-------------+------------------------------------------------------------------------- 2 | 5 | {339,607,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} 7 | 3 | {372,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} (2 rows)

7: VOPS_TEST=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM std_table GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+-----+------- 2 | 5 | 946 | 2 7 | 3 | 372 | 1 (2 rows)

8:

VOPS_TEST=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+-----+------- 2 | 5 | 946 | 64 7 | 3 | 372 | 64 (2 rows)

knizhnik commented 6 years ago

Sorry, if you are not using FDW, then you should use countall(*) function instead of count().

Mad-Apes commented 6 years ago

countall(*) is also wrong.

VOPS_TEST=# table std_table; pk_column | column_int2 | column_int4 | column_int8 -----------+-------------+-------------+------------- 68 | 2 | 5 | 723 15 | 2 | 5 | 474 91 | 7 | 3 | 697 (3 rows)

VOPS_TEST=# SELECT column_int2,column_int4,column_int8 FROM vops_table_gb; column_int2 | column_int4 | column_int8 -------------+-------------+------------------------------------------------------------------------- 2 | 5 | {723,474,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} 7 | 3 | {697,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} (2 rows)

VOPS_TEST=# SELECT column_int2,column_int4,sum(column_int8),count(*),countall(*) FROM vops_table_gb GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4; column_int2 | column_int4 | sum | count | countall -------------+-------------+------+-------+---------- 2 | 5 | 1197 | 64 | 64 7 | 3 | 697 | 64 | 64 (2 rows)

knizhnik commented 6 years ago

Sorry, for confusing you. First of all you do not need to use countall() instead of count() - VOPS executor hook substitute it implicitly. But to get correct result of aggregation with group by you need to use count(column_int8) - otherwise VOPS is not able to correctly process tiles. I will think how it is possible to address this issue.

Mad-Apes commented 6 years ago

Ok, I am a little confused.

VOPS_TEST=# table std_table; pk_column | column_int2 | column_int4 | column_int8 -----------+-------------+-------------+------------- 12 | 2 | 5 | 470 83 | 2 | 5 | 665 68 | 7 | 3 | 928 67 | 2 | 5 | 30 | 7 | 3 | (5 rows)

attention please. The value of column_int8 is NULL where pk_column=67 and pk_column=30

After populating. VOPS_TEST=# select column_int2, column_int4, column_int8 from vops_table_gb; column_int2 | column_int4 | column_int8 -------------+-------------+-------------------------------------------------------------------------- 2 | 5 | {470,665,?,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} 7 | 3 | {928,?,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} (2 rows)

The result that I want is this: VOPS_TEST=# SELECT column_int2,column_int4,count(*) FROM std_table GROUP BY column_int2,column_int4; column_int2 | column_int4 | count -------------+-------------+------- 7 | 3 | 2 2 | 5 | 3 (2 rows)

But when I execute de query on VOPS table, I get this :

VOPS_TEST=# SELECT column_int2,column_int4,count(*) FROM vops_table_gb GROUP BY column_int2,column_int4; column_int2 | column_int4 | count -------------+-------------+------- 7 | 3 | 1 2 | 5 | 1 (2 rows)

*Look, The count is 1. But if I add other aggregate function. The result of 'count()' is different:**

VOPS_TEST=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb GROUP BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+------+------- 7 | 3 | 928 | 64 2 | 5 | 1135 | 64 (2 rows)

Look The count changed to 64. I don't know why this happens

According to your suggestion. I use count(column_int8).

VOPS_TEST=# SELECT column_int2,column_int4,count(*) as "count(*)",count(column_int2) as "count(column_int2)",count(column_int8) as "count(column_int8)" ,count(pk_column) as "count(pk_column)" FROM vops_table_gb GROUP BY column_int2,column_int4; column_int2 | column_int4 | count(*) | count(column_int2) | count(column_int8) | count(pk_column) -------------+-------------+----------+--------------------+--------------------+------------------ 7 | 3 | 64 | 1 | 1 | 2 2 | 5 | 64 | 1 | 2 | 3 (2 rows)

Obviously, count(*) and count(column_int2) are wrong. count(column_int8) is only total number of Non-null values. count(pk_column) is correct.

If I query the total row number of a group, no matter whether a column is null or not. I can ues count(*) or count(column_int2) in std_table which is a normal table. But in the VOPS table I shoud count a column which does not contains null value?

Actually, we probably have no idea which column does not contains null value. So In this situation, what should we do to count all?

Best regards, thank you.

knizhnik commented 6 years ago

Yes, you should use count(pk_column) or any other non-null column. But I have added countany(enyelement) aggregate and committed the fix which implicitly replaces count(*) with countany(arg).

So if you now get updated VOPS version from repository, then your query will work correctly without any hacks:

SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb WHERE column_int8>0 GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4;
 column_int2 | column_int4 | sum | count 
-------------+-------------+-----+-------
           0 |           1 |  52 |     1
           1 |           0 |  71 |     1
           1 |           1 |  92 |     1
           2 |           1 |   7 |     1
           2 |           2 |  50 |     1
           3 |           2 |  33 |     1
           4 |           4 | 118 |     2
           5 |           3 | 100 |     2
(8 rows)
Mad-Apes commented 6 years ago

I reinstalled the Linux operating system,reinstalled postgresql and re-download VOPS to make sure everything is ok.

vops=# table std_table; pk_column | column_int2 | column_int4 | column_int8 -----------+-------------+-------------+------------- 62 | 2 | 5 | 141 12 | 2 | 5 | 78 32 | 7 | 3 | 966 10 | 2 | 5 | 68 | 7 | 3 | (5 rows)

I execute the sql that you provid.

vops=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb WHERE column_int8>0 GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+-----+------- 2 | 5 | 219 | 2 7 | 3 | 966 | 1 (2 rows)

vops=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb WHERE pk_column is not null GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+-----+------- 2 | 5 | 219 | 3 7 | 3 | 966 | 2 (2 rows)

My understanding is that using count(*) must specify where conditions . Is this right?

If this understanding is right, then what is the meaning of count(*) in a unconditional query?

But I Think count(*) in a unconditional query should be used to count all no matter whether a column is null or not.

knizhnik commented 6 years ago

Did you call vops_initialize() before running this queries? If not, please repeat your test after executing "select vops_intialize();"

Mad-Apes commented 6 years ago

Yes, I did call vops_initialize() before every query.

1: vops=# SELECT vops_initialize(); vops_initialize -----------------

(1 row)

vops=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb WHERE column_int8>0 GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+-----+------- 2 | 5 | 219 | 2 7 | 3 | 966 | 1 (2 rows)

2: vops=# SELECT vops_initialize(); vops_initialize -----------------

(1 row)

vops=# SELECT column_int2,column_int4,sum(column_int8),count(*) FROM vops_table_gb GROUP BY column_int2,column_int4; column_int2 | column_int4 | sum | count -------------+-------------+-----+------- 7 | 3 | 966 | 64 2 | 5 | 219 | 64 (2 rows)

3: vops=# SELECT vops_initialize(); vops_initialize -----------------

(1 row)

vops=# SELECT column_int2,column_int4,count(*) FROM vops_table_gb GROUP BY column_int2,column_int4; column_int2 | column_int4 | count -------------+-------------+------- 7 | 3 | 1 2 | 5 | 1 (2 rows)

Please look at the the second and third query. The result of count(*) is different. Why?

Best regards, thank you.

knizhnik commented 6 years ago

It is necessary to call vops_initialize() only once when you start new backend. The problem is that extension is not loaded by Postgres until you call any extension function. VOPS is registering executor hook for query transformation (for example for replacing count(*) with countany(column_int8). Hoks are registered on extension loading. But it may be too late, because executor already passed the state at which this hook is invoked. This is why it is necessary to call dummy vops_intialize() function which actually does nothing before performing any query on VOPS table.

Based on your results mentioned above, it seems to me that for some reasons you have not upgraded VOPS extension. Please notice that you need to call "make install" in VOPS directory and check that library is installed in proper place. You should also restart backend, drop and recreate VOPS extension. To check that you are really using the latest version of VOPS extension, please try to execute query:

SELECT column_int2,column_int4,sum(column_int8),countany(column_int8) FROM vops_table_gb GROUP BY column_int2,column_int4;

countany is new function I have added. It should return correct count number doesn't matter whether there are NULL values in specified column.

Unfortunately third query will not be correctly transformed - there are no any aggregates on VOPS columns, so VOPS has no idea that count(*) should be replaced with countany in this case. You can to call contany(column_int8) explicitly here,.