toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.11k stars 174 forks source link

Mapping lat and lon from postgis geometry field (EPSG:4326) to elasticsearch/kibana POINT(lat,lon) instead of POINT(lon,lat) #388

Open keniobats opened 1 year ago

keniobats commented 1 year ago

Hello! First of all I want to thank you @toluaina for you awesome project! Please keep it that way!!!

PGSync version: 2.3.3

Postgres version: 13-master(postgis)

Elasticsearch version: 8.4

Redis version: 7.0.5

Python version: 3.7.15

Problem Description: I have a field called "ubicacion" with geometry data:

bigeye=# select * from public."Intervenciones";
 id | sort |             user_created             |        date_created        |             user_updated             |       date_updated        |                     ubicacion                      
----+------+--------------------------------------+----------------------------+--------------------------------------+---------------------------+----------------------------------------------------
  1 |      | 976f9890-dcf4-4cda-a905-e3765bf0da79 | 2022-11-26 01:57:12.699+00 | 976f9890-dcf4-4cda-a905-e3765bf0da79 | 2022-11-26 01:58:55.78+00 | 0101000020E6100000E40C05780E4F50C0FC578F35D2D23AC0

Which, in turn, decodes like this:

bigeye=# select to_json(ubicacion::json) from public."Intervenciones";
                                                            to_json                                                            
-------------------------------------------------------------------------------------------------------------------------------
 {"type":"Point","crs":{"type":"name","properties":{"name":"EPSG:4326"}},"coordinates":[-65.2352581071205,-26.8235200381241]}
 {"type":"Point","crs":{"type":"name","properties":{"name":"EPSG:4326"}},"coordinates":[-63.3673516375259,-28.92800146429721]}
(2 rows)

{
   "type":"Point",
   "crs":{
      "type":"name",
      "properties":{
         "name":"EPSG:4326"
      }
   },
   "coordinates":[
      -65.2352581071205,
      -26.8235200381241
   ]
}

bigeye=# select to_json(ubicacion::json #> '{coordinates,1}') from public."Intervenciones";
      to_json       
--------------------
 -26.8235200381241
 -28.92800146429721
(2 rows)

Kibana recognizes "ubicacion" field and can be mapped on a map, but it maps it incorrectly because it uses POINT(lon, lat) instead of POINT(lat, lon):

Screenshot from 2022-11-30 09-18-59

Screenshot from 2022-11-30 09-09-20

Kibana removes the POINT() if I try inverting lat&lon and, eventually, does not plot the point on a map:

Screenshot from 2022-11-30 09-16-28

Now, trying to define my schema.json and testing combinations of transforms(all of them, some of them, etc) like this:

[   
    {
        "database": "bigeye",
        "index": "bigeye_intervenciones",
        "nodes": {
            "table": "Intervenciones",
            "columns":[
                "ubicacion#>{coordinates,1}",
                "ubicacion#>{coordinates,1}"
            ],
            "transform": {
                "rename": {
                    "ubicacion#>{coordinates,0}": "lat",
                    "ubicacion#>{coordinates,1}": "lon"
                },
                "concat": {
                    "columns": ["ubicacion#>{coordinates,0}", "ubicacion#>{coordinates,1}"],
                    "destination": "lugar",
                    "delimiter": ","
                }
            }

        }
    }
]

I got errors like this:

wait-for-it.sh: waiting 60 seconds for database-dev:5432
wait-for-it.sh: database-dev:5432 is available after 0 seconds
wait-for-it.sh: waiting 60 seconds for es01-dev:9200
wait-for-it.sh: es01-dev:9200 is available after 17 seconds
wait-for-it.sh: waiting 60 seconds for redis-dev:6379
wait-for-it.sh: redis-dev:6379 is available after 0 seconds
/usr/local/lib/python3.7/site-packages/pgsync/base.py:172: SAWarning: Did not recognize type 'geometry' of column 'ubicacion'
  metadata.reflect(self.engine, views=True)
2022-11-30 11:57:00.908:WARNING:pgsync.sync: Checkpoint file not found: ./.bigeye_bigeye_intervenciones
 - public.Intervenciones
 - public.Intervenciones

/usr/local/lib/python3.7/site-packages/pgsync/base.py:172: SAWarning: Did not recognize type 'geometry' of column 'ubicacion'
  metadata.reflect(self.engine, views=True)
2022-11-30 11:57:03.065:ERROR:pgsync.elastichelper: Exception (psycopg2.errors.UndefinedFunction) operator does not exist: geometry #> unknown
LINE 2: ...ion_coordinates_0', ("Intervenciones_1".ubicacion #> '{coord...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT anon_1."JSON_BUILD_ARRAY_1", anon_1."JSON_BUILD_OBJECT_1", anon_1.id 
FROM (SELECT JSON_BUILD_ARRAY() AS "JSON_BUILD_ARRAY_1", CAST(JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_2)s, ("Intervenciones_1".ubicacion #> %(ubicacion_1)s) -> %(param_1)s, %(JSON_BUILD_OBJECT_3)s, ("Intervenciones_1".ubicacion #> %(ubicacion_2)s) -> %(param_2)s) AS JSONB) AS "JSON_BUILD_OBJECT_1", "Intervenciones_1".id AS id 
FROM public."Intervenciones" AS "Intervenciones_1" 
WHERE CAST(CAST("Intervenciones_1".xmin AS TEXT) AS BIGINT) < %(param_3)s) AS anon_1]
[parameters: {'JSON_BUILD_OBJECT_2': 'ubicacion_coordinates_0', 'ubicacion_1': '{coordinates}', 'param_1': 0, 'JSON_BUILD_OBJECT_3': 'ubicacion_coordinates_1', 'ubicacion_2': '{coordinates}', 'param_2': 1, 'param_3': 8996}]
(Background on this error at: https://sqlalche.me/e/14/f405)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedFunction: operator does not exist: geometry #> unknown
LINE 2: ...ion_coordinates_0', ("Intervenciones_1".ubicacion #> '{coord...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

What I need is to create a field with the lat and lon values to let kibana handle maps markers automatically. Any help will be appreciated!

toluaina commented 1 year ago

I think the issue here is that that ubicacion Geo field is not a native Postgres field and you need to cast it to a JSON type in order to extract the data.

Implementing out of the box support for Geofields would be a lot of effort and I'm not sure its the right way. I think a cleaner design would be to delegate this to plugins even in the long term. I am thinking of creating a set of builtin plugins to address these use cases.

For now, you can use a plugin similar to this one to extract the fields

In your case, you want to modify the plugin to extract the coordinates

So you can just change that code to

doc['point'] = doc['ubicacion']['coordinates']

where the point field contains your coordinates