heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0

class org.apache.calcite.sql.SqlUserDefinedTypeNameSpec cannot be cast to class org.apache.calcite.sql.SqlBasicTypeNameSpec (org.apache.calcite.sql.SqlUserDefinedTypeNameSpec and org.apache.calcite.sql.SqlBasicTypeNameSpec are in unnamed module of loader 'app') #736

Closed: jieguolove closed this issue 2 years ago

jieguolove commented 2 years ago

Is the GIS field type supported? The table structure is as follows:

```sql
CREATE TABLE jzsh_mobile_net_info_gis (
  objectid numeric(38) NOT NULL,
  id varchar(32),
  cid varchar(128),
  rsrp varchar(128),
  networktype varchar(32),
  tac varchar(128),
  pci varchar(128),
  longitude varchar(32),
  latitude varchar(32),
  sourcetype varchar(32),
  rid varchar(32),
  subordernumber varchar(32),
  physicalareaid varchar(64),
  loginname varchar(32),
  operdate timestamp,
  state varchar(32),
  types varchar(128),
  bsc varchar(128),
  btsid varchar(128),
  enbname varchar(254),
  community_id varchar(128),
  userlabel varchar(128),
  remark varchar(128),
  fldtag varchar(64),
  cover_type varchar(64),
  shape ST_GEOMETRY
);
CREATE INDEX a66_ix1 ON jzsh_mobile_net_info_gis USING gist(shape);
CREATE INDEX idx_mob_fldid ON jzsh_mobile_net_info_gis (id);
CREATE UNIQUE INDEX r75_sde_rowid_uk ON jzsh_mobile_net_info_gis (objectid);

CREATE TABLE grid_region (
  objectid numeric(38) NOT NULL,
  x decimal(19,8),
  y decimal(19,8),
  kpi decimal(19,8),
  shape ST_GEOMETRY
);
CREATE INDEX a144_ix1 ON grid_region USING gist(shape);
CREATE UNIQUE INDEX r156_sde_rowid_uk ON grid_region (objectid);
```

Testing table creation:

```
omnisky@omnisky-Super-Server:~$ $HEAVYAI_PATH/bin/heavysql -p HyperInteractive
User admin connected to database heavyai
heavysql> \l
Database | Owner
heavyai | admin
information_schema | admin
heavysql> \d
\d incorrect number of parameters provided; need 1 total parameter(s)
Usage: \d
heavysql> \t
heavyai_us_states
heavyai_us_counties
heavyai_countries
test
flights_2008_7M
us_states
flights_2008_10k
j1
heavysql> CREATE TABLE jzsh_mobile_net_info_gis (
..> objectid numeric(38) NOT NULL,
..> id varchar(32),
..> cid varchar(128),
..> rsrp varchar(128),
..> networktype varchar(32),
..> tac varchar(128),
..> pci varchar(128),
..> longitude varchar(32),
..> latitude varchar(32),
..> sourcetype varchar(32),
..> rid varchar(32),
..> subordernumber varchar(32),
..> physicalareaid varchar(64),
..> loginname varchar(32),
..> operdate timestamp,
..> state varchar(32),
..> types varchar(128),
..> bsc varchar(128),
..> btsid varchar(128),
..> enbname varchar(254),
..> community_id varchar(128),
..> userlabel varchar(128),
..> remark varchar(128),
..> fldtag varchar(64),
..> cover_type varchar(64),
..> shape ST_GEOMETRY
..> ) ;
class org.apache.calcite.sql.SqlUserDefinedTypeNameSpec cannot be cast to class org.apache.calcite.sql.SqlBasicTypeNameSpec (org.apache.calcite.sql.SqlUserDefinedTypeNameSpec and org.apache.calcite.sql.SqlBasicTypeNameSpec are in unnamed module of loader 'app')
heavysql> CREATE TABLE grid_region (
..> objectid numeric(38) NOT NULL,
..> x decimal(19,8),
..> y decimal(19,8),
..> kpi decimal(19,8),
..> shape ST_GEOMETRY
..> ) ;
class org.apache.calcite.sql.SqlUserDefinedTypeNameSpec cannot be cast to class org.apache.calcite.sql.SqlBasicTypeNameSpec (org.apache.calcite.sql.SqlUserDefinedTypeNameSpec and org.apache.calcite.sql.SqlBasicTypeNameSpec are in unnamed module of loader 'app')
```

================

I have found https://docs.heavy.ai/sql/data-definition-ddl/datatypes-and-fixed-encoding. After changing some field types and lengths I can create the tables, but I cannot insert the geometry data:

```
heavysql> CREATE TABLE jzsh_mobile_net_info_gis (
..> objectid numeric(38) NOT NULL,
..> [same columns as above]
..> shape GEOMETRY
..> ) ;
Idle Session Timeout. User should re-authenticate.
heavysql> \q
Session not valid.
omnisky@omnisky-Super-Server:~$ $HEAVYAI_PATH/bin/heavysql -p HyperInteractive
User admin connected to database heavyai
heavysql> CREATE TABLE jzsh_mobile_net_info_gis (
..> objectid numeric(38) NOT NULL,
..> [same columns as above]
..> shape GEOMETRY
..> ) ;
DECIMAL and NUMERIC precision cannot be larger than 19.
heavysql> CREATE TABLE jzsh_mobile_net_info_gis (
..> objectid numeric(19) NOT NULL,
..> [same columns as above]
..> shape GEOMETRY
..> ) ;
objectid: Precision too high, max 18.
heavysql> CREATE TABLE jzsh_mobile_net_info_gis (
..> objectid numeric(18) NOT NULL,
..> [same columns as above]
..> shape GEOMETRY
..> ) ;
heavysql> \t
heavyai_us_states
heavyai_us_counties
heavyai_countries
test
flights_2008_7M
us_states
flights_2008_10k
j1
jzsh_mobile_net_info_gis
heavysql> CREATE INDEX a66_ix1 ON jzsh_mobile_net_info_gis USING gist(shape);
SQL Error: Encountered "INDEX" at line 1, column 8. Was expecting one of: "OR" ... "DATAFRAME" ... "DATABASE" ... "SERVER" ... "FOREIGN" ... "USER" ... "TEMPORARY" ... "TABLE" ... "VIEW" ... "ROLE" ... "POLICY" ...
```

```
heavysql> CREATE TABLE grid_region (
..> objectid numeric(18) NOT NULL,
..> x decimal(19,8),
..> y decimal(19,8),
..> kpi decimal(19,8),
..> shape GEOMETRY
..> ) ;
x: Precision too high, max 18.
heavysql> CREATE TABLE grid_region (
..> objectid numeric(18) NOT NULL,
..> x decimal(18,8),
..> y decimal(18,8),
..> kpi decimal(18,8),
..> shape GEOMETRY
..> ) ;
heavysql> \t
heavyai_us_states
heavyai_us_counties
heavyai_countries
test
flights_2008_7M
us_states
flights_2008_10k
j1
jzsh_mobile_net_info_gis
grid_region
```

```
heavysql> copy GRID_REGION from '/heavyai-data/GRID_REGION.sql';
Thrift error: No more data to read.
Thrift connection error: No more data to read. Retrying connection
Thrift error: No more data to read.
Thrift connection error: No more data to read. Retrying connection
Thrift: Thu Apr 28 03:29:56 2022 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe
Thrift error: write() send(): Broken pipe
Thrift connection error: write() send(): Broken pipe Retrying connection
Thrift: Thu Apr 28 03:30:04 2022 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe
Thrift error: write() send(): Broken pipe
Thrift connection error: write() send(): Broken pipe Retrying connection
```

```
^C Interrupt signal (2) received. Asking server to interrupt query.
Cannot connect to HeavyDB Server.
heavysql> ^C
Thrift: Thu Apr 28 03:30:21 2022 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe
Thrift error: write() send(): Broken pipe
Thrift connection error: write() send(): Broken pipe Retrying connection
omnisky@omnisky-Super-Server:~$ head /heavyai-data/GRID_REGION.sql
91 116.02832 40.275016 -85 POLYGON (( 116.02810300 40.27515200, 116.02810300 40.27470300, 116.02868900 40.27470300, 116.02868900 40.27515200, 116.02810300 40.27515200))
92 116.029492 40.275016 -96 POLYGON (( 116.02927500 40.27515200, 116.02927500 40.27470300, 116.02986200 40.27470300, 116.02986200 40.27515200, 116.02927500 40.27515200))
omnisky@omnisky-Super-Server:~$
```


cdessanti commented 2 years ago

Hi @jieguolove,

The data types supported in HeavyDB are listed here:

https://docs.heavy.ai/sql/data-definition-ddl/datatypes-and-fixed-encoding

We try to do an on-the-fly conversion for data types like varchar.

Lowering the precision of decimals and numerics to 18, specifying the geometry data types explicitly (I chose POINT for the jzsh table and MULTIPOLYGON for grid_region), and removing the indexes (we don't use any explicit index in the database) fixes the errors:


drop table jzsh_mobile_net_info_gis;
CREATE TABLE jzsh_mobile_net_info_gis (
objectid numeric(18) NOT NULL,
id varchar(32),
cid varchar(128),
rsrp varchar(128),
networktype varchar(32),
tac varchar(128),
pci varchar(128),
longitude varchar(32),
latitude varchar(32),
sourcetype varchar(32),
rid varchar(32),
subordernumber varchar(32),
physicalareaid varchar(64),
loginname varchar(32),
operdate timestamp,
state varchar(32),
types varchar(128),
bsc varchar(128),
btsid varchar(128),
enbname varchar(254),
community_id varchar(128),
userlabel varchar(128),
remark varchar(128),
fldtag varchar(64),
cover_type varchar(64),
shape GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32)
) ;

drop table grid_region;
CREATE TABLE grid_region (
objectid numeric(18) NOT NULL,
x decimal(18,8),
y decimal(18,8),
kpi decimal(18,8),
shape GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32)
);

Checking the generated DDL after creation with the \d command, you will find that all the varchar fields are TEXT ENCODING DICT(32):

heavysql> \d jzsh_mobile_net_info_gis
CREATE TABLE jzsh_mobile_net_info_gis (
  objectid DECIMAL(18,0) NOT NULL,
  id TEXT ENCODING DICT(32),
  cid TEXT ENCODING DICT(32),
  rsrp TEXT ENCODING DICT(32),
  networktype TEXT ENCODING DICT(32),
...
shape GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32)
);

Dictionary-encoded values mean the strings are stored once in a dictionary, so each value needs just 4 bytes in GPU memory. The maximum string size is 32767. You can optimize further by changing the number of bits used for the encoding: if you know in advance that there are just 15 network types, an 8-bit dictionary is enough. After loading the table, you can use the \o table_name command of heavysql to get an optimized DDL for the table.
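The dictionary idea can be sketched in a few lines of Python (an illustration of the concept only, not HeavyDB's implementation; the sample `networktype` values are made up):

```python
def dict_encode(values):
    """Replace each string with a small integer ID; the dictionary is
    stored once, and the column holds only the fixed-width IDs."""
    dictionary = {}  # string -> integer ID
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return ids, dictionary

networktype = ["4G", "5G", "4G", "NB-IoT", "5G", "4G"]
ids, d = dict_encode(networktype)
print(ids)     # [0, 1, 0, 2, 1, 0]
print(len(d))  # 3 distinct values: an 8-bit dictionary (up to 256) suffices
```

With only a handful of distinct values, each cell costs one small integer instead of a full string, which is why DICT(8) is enough when you know the cardinality in advance.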

We also have geo joins that use an accelerated framework. To find which region originated each call, I loaded the table with some data (500M records) from New York and ran this query with just 1 GPU:

heavysql> select count(*) from jzsh_mobile_net_info_gis as points join grid_region as poly on st_contains(poly.shape,points.shape);
EXPR$0
472471342
1 rows returned.
Execution time: 5428 ms, Total time: 5429 ms
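The `ST_Contains` predicate in the join above is essentially a point-in-polygon test evaluated for each point/polygon pair. As a rough illustration of what is being computed (not HeavyDB's actual implementation, which also handles SRID 4326 and compressed coordinates), here is a minimal even-odd ray-casting check in Python, using one grid cell from the sample data:

```python
def point_in_polygon(poly, pt):
    """Even-odd ray casting: cast a ray from pt in the +x direction and
    count how many polygon edges it crosses; an odd count means inside."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            # x coordinate where the edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# One grid cell from GRID_REGION (closing vertex omitted; ring is implicit)
cell = [(116.028103, 40.275152), (116.028103, 40.274703),
        (116.028689, 40.274703), (116.028689, 40.275152)]
print(point_in_polygon(cell, (116.0284, 40.2749)))  # inside  -> True
print(point_in_polygon(cell, (116.0300, 40.2749)))  # outside -> False
```

The accelerated join runs this kind of test massively in parallel on the GPU, which is how 500M points against the polygon table finish in a few seconds.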

regards, Candido

jieguolove commented 2 years ago


Thanks a lot!!!

cdessanti commented 2 years ago

Hi,

You can load points, polygons, multipolygons, linestrings, and so on into the database, but you need to specify the GEOMETRY type in advance:

CREATE TABLE grid_region (
objectid numeric(18) NOT NULL,
x decimal(18,8),
y decimal(18,8),
kpi decimal(18,8),
shape GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32)
);
heavysql> copy grid_region from '/mapd_storage/data_heavy/import/GRID_REGION.sql' with (delimiter='|', header='false');
Result
Loaded: 2 recs, Rejected: 0 recs in 0.089000 secs
heavysql> select * from grid_region;
objectid|x|y|kpi|shape
91|116.02832000|40.27501600|-85.00000000|POLYGON ((116.028102941824 40.2751519625425,116.028102941824 40.2747029858989,116.028688920675 40.2747029858989,116.028688920675 40.2751519625425,116.028102941824 40.2751519625425))
92|116.02949200|40.27501600|-96.00000000|POLYGON ((116.029274983345 40.2751519625425,116.029274983345 40.2747029858989,116.029861968025 40.2747029858989,116.029861968025 40.2751519625425,116.029274983345 40.2751519625425))

I changed the delimiter in your file from space to pipe (|), and set the COPY command's delimiter option to match:

more /mapd_storage/data_heavy/import/GRID_REGION.sql
91|116.02832|40.275016|-85|POLYGON (( 116.02810300 40.27515200, 116.02810300 40.27470300, 116.02868900 40.27470300, 116.02868900 40.27515200, 116.02810300 40.27515200))
92|116.029492|40.275016|-96|POLYGON (( 116.02927500 40.27515200, 116.02927500 40.27470300, 116.02986200 40.27470300, 116.02986200 40.27515200, 116.02927500 40.27515200))

We don't support arrays of geometries, but those are not the same thing as simple geometries anyway.

Candido

jieguolove commented 2 years ago


Nice job! Thanks, thanks, thanks!

jieguolove commented 2 years ago

My data has no '|' in it; it looks like this:

```
92 116.029492 40.275016 -96 POLYGON (( 116.02927500 40.27515200, 116.02927500 40.27470300, 116.02986200 40.27470300, 116.02986200 40.27515200, 116.02927500 40.27515200))
93 116.032423 40.275016 -93 POLYGON (( 116.03220700 40.27515200, 116.03220700 40.27470300, 116.03279300 40.27470300, 116.03279300 40.27515200, 116.03220700 40.27515200))
94 116.044732 40.275016 -78 POLYGON (( 116.04451900 40.27515200, 116.04451900 40.27470300, 116.04510500 40.27470300, 116.04510500 40.27515200, 116.04451900 40.27515200))
95 116.045904 40.275016 -111 POLYGON (( 116.04569100 40.27515200, 116.04569100 40.27470300, 116.04627800 40.27470300, 116.04627800 40.27515200, 116.04569100 40.27515200))
```

What is the delimiter value for tab? Thanks.

cdessanti commented 2 years ago

It's a pipe delimiter, but you can change it to anything you want.

Your data looks to be delimited by spaces, which makes it impossible to import as-is, because the WKT geometry values contain spaces themselves.

Are you exporting it from PostgreSQL with a COPY command?
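If re-exporting with a different delimiter isn't an option, the file can be repaired before the COPY: the leading scalar columns never contain spaces, so splitting each line a limited number of times keeps the WKT geometry intact. A small hypothetical pre-processing sketch in Python (the field count of 4 and the pipe output delimiter are assumptions based on the sample rows shown above):

```python
def to_pipe_delimited(line, n_scalar_cols=4, out_delim="|"):
    # Split on whitespace at most n_scalar_cols times; everything after the
    # last split (the WKT geometry, which contains spaces) stays in one piece.
    return out_delim.join(line.strip().split(None, n_scalar_cols))

row = ("91 116.02832 40.275016 -85 POLYGON (( 116.02810300 40.27515200, "
       "116.02810300 40.27470300, 116.02868900 40.27470300, "
       "116.02868900 40.27515200, 116.02810300 40.27515200))")
print(to_pipe_delimited(row))
# Only the first four separators become pipes; the geometry keeps its spaces.
```

Running this over every line of the export would produce a pipe-delimited file suitable for `COPY ... WITH (delimiter='|', header='false')`.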

jieguolove commented 2 years ago


I used ora2pg to export it from Oracle.

jieguolove commented 2 years ago


It's delimited by tab.

cdessanti commented 2 years ago

If it's delimited by tab, just change the delimiter value to '\t':

copy grid_region from '/mapd_storage/data_heavy/import/GRID_REGION.sql' with (delimiter='\t', header='false');

jieguolove commented 2 years ago

If it's delimited by space, what should the delimiter value be? Thanks! Do you have a delimiter reference URL?

cdessanti commented 2 years ago

Yes, the reference for the COPY command of HeavyDB is here:

https://docs.heavy.ai/loading-and-exporting-data/command-line/load-data#copy-fromcopy-from

Anyway, it's not viable to import such data with space as a delimiter. It should be exported with a different delimiter, or with string data types enclosed in double quotes.
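To illustrate the double-quote option, Python's csv module shows what such an export could look like (a sketch of the target file format only; the actual quoting should be configured in the exporting tool, and the sample row is taken from the data above):

```python
import csv
import io

buf = io.StringIO()
# QUOTE_NONNUMERIC wraps every non-numeric field in double quotes, so the
# WKT geometry is unambiguous even though it contains spaces and commas.
writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
writer.writerow([92, 116.029492, 40.275016, -96,
                 "POLYGON ((116.029275 40.275152, 116.029275 40.274703, "
                 "116.029862 40.274703, 116.029862 40.275152, "
                 "116.029275 40.275152))"])
print(buf.getvalue().strip())
# Numeric fields stay bare; only the geometry string gets quoted.
```

A file written this way imports cleanly because the quotes protect the geometry field regardless of which delimiter is chosen.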

jieguolove commented 2 years ago


ok