Closed: Mad-Apes closed this issue 6 years ago.
VOPS types are declared with PLAIN storage, so Postgres is not able to move this data to TOAST storage. I can certainly change the storage type to EXTENDED, but it will require changing all serialization routines for these types (because they will then have a varying-length rather than a fixed-length representation). But what is more important, moving data to some other (TOAST) page will significantly slow down access time and even somewhat devalue the idea of VOPS with vector processing.
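Which storage class Postgres has actually assigned to each column can be checked in the system catalog; a minimal sketch, using the vops_table name from this issue:

-- attstorage is 'p' for PLAIN, 'x' for EXTENDED
select attname, atttypid::regtype as type, attstorage
from pg_attribute
where attrelid = 'vops_table'::regclass and attnum > 0;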
Do you really need all 64 columns in your query? If not, then my recommendation is to create several different projections, each with a smaller subset of attributes. This can also significantly increase the speed of queries involving just a small subset of columns; see the sketch below.
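As an illustration, a wide table could be split into two narrow projections like this (a sketch: vops_proj_1, vops_proj_2 and their columns are hypothetical names, and it assumes populate fills the destination's columns by name from the source, as the projection approach implies):

-- two narrow projections instead of one wide 64-column VOPS table
create table vops_proj_1(sequence_id int8, received_num vops_float8);
create table vops_proj_2(sequence_id int8, sent_num vops_float8);
select populate(destination:='vops_proj_1'::regclass, source:='std_table'::regclass);
select populate(destination:='vops_proj_2'::regclass, source:='std_table'::regclass);

A query touching only received_num then scans only the narrow vops_proj_1 rows.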
Yet another solution is to increase the Postgres page size (from the default 8kB). The page size is fixed at configure time, so this requires rebuilding Postgres from sources, which you most likely will not want to do.
And if none of the suggested alternatives is applicable in your case, I can provide you with a version of VOPS where VOPS types are defined with the EXTENDED storage type, so that you can check the performance in this case.
Thanks for the reply. We have 100+ normal tables, and each table contains tens of columns. Most of the columns will be used in analytic queries.
Following your suggestion, I tried to split a large VOPS table into several small VOPS tables with smaller subsets of attributes, and I use VOPS FDW. I created FDW tables over the small VOPS tables that contain the smaller subsets of attributes and loaded data into the VOPS tables; the error did not occur any more. Then I ran ANALYZE on the FDW tables. I execute a query on every VOPS table and combine the results into one with an inner join. It works.
There is a problem that bothers me. There are more than 500,000 rows in the table, and the execution time of the query with VOPS is about the same as the normal one. Maybe I did something wrong somewhere.
My steps:
SELECT *
FROM (
SELECT
MAX(fdw_table_1.id) AS id
.
.
.
,MIN(fdw_table_1.time) AS time
,AVG(fdw_table_1.received_num) AS enb_received_power
.
.
.
,MAX(fdw_table_1.sequence_id) AS sequence_id
,count(*) AS xdr_count
,cast(null as bigint) AS align
FROM vops_fdw_table_1 fdw_table_1
GROUP BY fdw_table_1.id, fdw_table_1.align
) as fdw_1
inner join (
SELECT
MAX(fdw_table_2.sequence_id)
.
.
.
FROM vops_fdw_table_2 fdw_table_2
GROUP BY fdw_table_2.id, fdw_table_2.align
) fdw_2
on fdw_1.sequence_id = fdw_2.sequence_id
If I did anything wrong, please point it out. Thank you very much. I would be very grateful if you could provide the version of VOPS where VOPS types are defined with the EXTENDED storage type.
If you always perform grouping by sequence_id, then it is desirable to leave sequence_id as a scalar type (do not replace it with a VOPS type). In this case grouping will be done by Postgres in the standard way, while aggregates will be calculated using VOPS functions. Otherwise VOPS FDW will have to extract horizontal tuples from the VOPS types, after which grouping and aggregation will be done using standard Postgres executor nodes. So there will be no performance benefit, just the extra overhead of unnesting VOPS tiles into horizontal tuples.
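A sketch of the suggested layout, reusing names from this thread (sorting by the scalar key during populate matters, so that each tile holds values of a single group):

-- keep the grouping key scalar, store only the aggregated column as VOPS tiles
create table vops_seq(
    sequence_id  int8,          -- scalar: grouped by the standard Postgres executor
    received_num vops_float8    -- tiles: aggregated by VOPS functions
);
select populate(destination:='vops_seq'::regclass, source:='std_table'::regclass,
                sort:='sequence_id');
select sequence_id, avg(received_num) from vops_seq group by sequence_id;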
Also, I do not completely understand the idea of grouping by the dummy align field, which actually represents a null value.
Please find the version of VOPS where VOPS types have the EXTENDED storage class in the "extended" branch in Git.
First of all, align is a field in the tables vops_fdw_table_1 and vops_fdw_table_2, and in these two tables the field is not null. During the statistical process I need a bigint value that is null, and I gave it the alias align. I sincerely apologize for the confusion.
OK. Next, I will do my work in these two ways and compare the performance between them.
Thanks again
Hello,
I have met a new problem: data is lost after running the populate function.
VOPS=# SELECT column_int2, column_int4, column_int8 FROM std_table WHERE column_int2 = 1 AND column_int4 = 0;
column_int2 | column_int4 | column_int8
-------------+-------------+-------------
1 | 0 | 762
1 | 0 | 722
1 | 0 | 497
1 | 0 | 4
1 | 0 | 472
1 | 0 | 312
1 | 0 | 504
1 | 0 | 808
(8 rows)
VOPS=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');
VOPS=# SELECT column_int2, column_int4, column_int8 FROM vops_table_gb WHERE column_int2 = 1 AND column_int4 = 0;
column_int2 | column_int4 | column_int8
-------------+-------------+----------------------------------------------------------------------------
1 | 0 | {504,497,762,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
(1 row)
'column_int2' and 'column_int4' are scalar types in vops_table_gb.
Sorry, I cannot reproduce the problem:
postgres=# create table std_table(column_int2 int2, column_int4 int4, column_int8 int8);
CREATE TABLE
postgres=# insert into std_table values (1,0,762),(1,0,722),(1,0,497),(1,0,4),(1,0,472),(1,0,312),(1,0,504),(1,0,808);
INSERT 0 8
postgres=# SELECT column_int2, column_int4, column_int8 FROM std_table WHERE column_int2 = 1 AND column_int4 = 0;
column_int2 | column_int4 | column_int8
-------------+-------------+-------------
1 | 0 | 762
1 | 0 | 722
1 | 0 | 497
1 | 0 | 4
1 | 0 | 472
1 | 0 | 312
1 | 0 | 504
1 | 0 | 808
(8 rows)
postgres=# create table vops_table_gb(column_int2 int2, column_int4 int4, column_int8 vops_int8);
CREATE TABLE
postgres=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');
populate
----------
8
(1 row)
postgres=# SELECT column_int2, column_int4, column_int8 FROM vops_table_gb WHERE column_int2 = 1 AND column_int4 = 0;
column_int2 | column_int4 | column_int8
-------------+-------------+---------------------------------------------------
--------------------------------------
1 | 0 | {762,722,497,4,472,312,504,808,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
(1 row)
Can you send me an example reproducing the problem?
Hello, here is what I did:
1:
VOPS=# create table std_table(pk_column int2, column_int2 int2, column_int4 int4, column_int8 int8);
CREATE TABLE
2:
VOPS=# INSERT INTO std_table VALUES(generate_series(1,10),random()*5,random()*5,random()*100);
INSERT 0 10
3:
VOPS=# create table vops_table_gb(pk_column vops_int2,column_int2 int2, column_int4 int4, column_int8 vops_int8);
CREATE TABLE
4:
VOPS=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');
populate
----------
10
(1 row)
5:
VOPS=# SELECT column_int2 ,column_int4 ,sum(column_int8) FROM std_table GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4;
column_int2 | column_int4 | sum
-------------+-------------+-----
0 | 1 | 81
1 | 1 | 69
2 | 0 | 45
2 | 5 | 166
3 | 0 | 26
3 | 2 | 94
4 | 1 | 49
5 | 2 | 46
(8 rows)
6:
VOPS=# SELECT column_int2 ,column_int4 ,sum(column_int8) FROM vops_table_gb GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4;
column_int2 | column_int4 | sum
-------------+-------------+-----
0 | 1 | 81
1 | 1 | 69
2 | 0 | 45
2 | 5 | 40
3 | 0 | 26
3 | 2 | 94
4 | 1 | 49
5 | 2 | 46
(8 rows)
Look, the result is different where column_int2=2 and column_int4=5. So I checked the data in each table.
1:
VOPS=# SELECT * FROM std_table where column_int2=2 and column_int4=5;
pk_column | column_int2 | column_int4 | column_int8
-----------+-------------+-------------+-------------
4 | 2 | 5 | 73
6 | 2 | 5 | 53
10 | 2 | 5 | 40
(3 rows)
2:
VOPS=# SELECT * FROM vops_table_gb where column_int2=2 and column_int4=5;
pk_column | column_int2 | column_int4 | column_int8
---------------------------------------------------------------------+-------------+-------------+---------------------------------------------------------------------
{10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} | 2 | 5 | {40,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
(1 row)
If you have not reproduced the problem, you can try generating more rows to insert into std_table.
Thank you very much for helping me reproduce and investigate the problem. It is really a bug in the handling of duplicate values in vops_populate. Fixed in commits e91c75d and 6728123.
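With the fixed build, the session above can be replayed to verify the result (truncating first, since the projection still contains the incorrectly grouped data):

VOPS=# truncate vops_table_gb;
VOPS=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');
VOPS=# SELECT column_int2, column_int4, sum(column_int8) FROM vops_table_gb GROUP BY column_int2, column_int4 ORDER BY column_int2, column_int4;

The group (2, 5) should now sum to 166, matching std_table.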
Error on loading data into a VOPS table: row is too big
Hello. When I load data into a VOPS table using the VOPS populate(...) function or the VOPS import(...) function, an error occurs:
VOPS=# select populate(destination:='vops_table'::regclass, source:='std_table'::regclass);
ERROR: row is too big: size 16392, maximum size 8160
VOPS=# select import(destination := 'vops_table'::regclass, csv_path := '/data/vops_data/std_table.csv', separator := '|');
ERROR: row is too big: size 16392, maximum size 8160
CONTEXT: IMPORT vops_table, line 65, column 63
'std_table' is a normal table which contains 64 columns. The types of all columns are integer (int2, int4, int8) or float (float4, float8).
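(For scale: the tile outputs above show 64 slots per tile, so 64 columns of even 4-byte values need about 64 × 64 × 4 = 16384 bytes per row. That matches the reported size 16392 and is roughly twice the 8160-byte limit of an 8kB page.)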
I tried to solve the problem, but the methods that I tried did not work. So how can I solve this problem?
Best regards, and thank you.