k255 / drill-gis

Spatial queries with Apache Drill
Apache License 2.0
20 stars 5 forks source link

get geometry data from PostGIS #1

Open krzysztof-everbridge opened 8 years ago

krzysztof-everbridge commented 8 years ago

Great stuff.

However, I am having some problems with it: I am trying to get geometry data from Postgis using drill and later make a join with mongodb.

Here is a simple example of my query: select * from postgis.public.device_location_wkb d where ST_Within( d.current_location, ST_GeomFromText( 'POLYGON((0.0 0.0, 10.0 0.0, 10.0 10.0, 0.0 10.0, 0.0 0.0))' ) );

does not return any results, but is should. I am assuming that drill understands WKB, but it looks it does not.

My test data in PostGIS: CREATE TABLE public.device_location( device_id bigint, current_location geometry(POINT, 4326) NOT NULL, last_modified_date timestamptz NOT NULL, CONSTRAINT device_location_pk PRIMARY KEY (device_id) );

CREATE VIEW device_location_wkb AS SELECT ST_AsBinary(current_location) AS current_location FROM device_location;

INSERT INTO device_location (current_location, last_modified_date ) VALUES ( ST_Transform(ST_SetSRID(ST_Point(1.0,2.0),4326), 4326), now() ) ;

k255 commented 8 years ago

Hi, thanks for your interest in drill-gis. Actually drill-gis uses wkb as geometry representation by adapting varbinary data type. The problem is probably how drill jdbc handles data received from postgresql. Take a look at your results again with this query:

select cast(current_location as varchar(255)) from postgis.public.device_location_wbk; or even select cast(current_location as varchar(255)) from postgis.public.device_location;

as you see there are the results, but probably the data type is not handled by jdbc (the connection uses just postgres without postgis extensions).

If you work with just simple POINT type then a workaround is to use wkt or just lat lon and convert it using drill-gis back to geometry like this: pgsql: create view device_location_wkt as select ST_AsText(current_location) as current_location from device_location; drill: select ST_GEOMFROMTEXT(current_location) as geom from postgis.public.device_location_wkt;

or using lat lon:

pgsql: create view device_location_xy as select ST_X(current_location) as current_location_x, ST_Y(current_location) as current_location_y from device_location; drill: select ST_Point(current_location_x, current_location_y) as geom from postgis.public.device_location_xy;

That way you receive geometry in drill which you can then pass to other drill-gis ST_ functions.

The way you tried it would be even better but unfortunately there's no straight way to achieve that at the moment. Some time ago I forced drill-gis to use wkb/ewkb string from postgis but it was quite complex (casts and unions). I have some ongoing development to support native postgis ewkb geometries but it's just a prototype right now.

JoseSparano commented 6 years ago

Hi k255! Do you have any news about this topic? We need to get the geometry field directly from PostGIS without convert it to text. I tried modifying the JdbcRecordReader class, specifically the JDBC_TYPE_MAPPINGS map adding this:

.put(GEOMETRY_TYPE, MinorType.VARBINARY)

Where GEOMETRY_TYPE is the jdbcType we receive on that field, and indeed it shows the binary value, but when we try to execute any function using it, i.e ST_AsText(geom) it throws this error:

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: GeometryException: invalid shape type Fragment 0:0 [Error Id: a0fc44c3-caa3-47a6-898c-d71fe2848a99 on x.x.x.x:31010]

Do you have any solution for this problem? Thanks in advance.

JoseSparano commented 6 years ago

UPDATE: I finally found the way that Drill can show the geo fields, is to change the data type in PostGIS from geometry to bytea. It seems to be a compatibility issue. With this way, we can perform geospatial queries on Drill, but in PostGIS those fields are no longer geometries, so they can not be indexed and treated as such.