jprante / elasticsearch-jdbc

JDBC importer for Elasticsearch
Apache License 2.0
2.84k stars 709 forks source link

Scientific notation on numbers close to zero #897

Open mgvarley opened 8 years ago

mgvarley commented 8 years ago

Hi - I am using ES2.3.3 and the corresponding latest version of elasticsearch-jdbc. I am trying to import rows from a Postgres 9.5 DB where there are numbers stored as float8. Some of the numbers that are close to zero with several decimal places are stored coming across into Elasticsearch in scientific notation but with the decimal places truncated.

Example: 0.0000283 (2.83e-05) in Postgres is importing as 2.83 in Elasticsearch

As these values are being used for geographic coordinates the impact is significant and in some cases the locations are appearing in the wrong country. I have searched the forums and cannot find any known issue in the Postgres JDBC driver.

Any help on this would be much appreciated. Happy to submit a PR if you could point me in the direction of where to look?

mgvarley commented 8 years ago

The solution I have come up with is to add explicit roundings in the query and a scale to the mapping however I still think the default behaviour of treating Postgres floats in scientific notation as numbers is dangerous. I would be happy to add a section to the README for Postgres / PostGIS users if that is useful?

The mapping I am using is as follows:

{
    "type": "jdbc",
    "jdbc": {
...
        "sql": "select round(ST_X(the_geom)::numeric,8) as \"location.lon\", round(ST_Y(the_geom)::numeric,8) as \"location.lat\" from my_table",
        "index": "my_index",
        "type": "my_type",
        "scale": 8
    }
}
jprante commented 8 years ago

FLOAT in SQL is a nightmare since the value is approximate. Using DECIMAL with defined precision should work better.

mgvarley commented 8 years ago

Thanks Jörg - would be good to have a Postgres / PostGIS example in the README, it is a very common use case I imagine. I have been using this in Production for over a year following the default "SELECT ST_X(the_geom) as \"location.lon\", ST_Y(the_geom) as \"location.lat\" from my_table" and only just found this issue. Default return type of ST_X / ST_Y is float so explicit casts are needed for this to behave as expected. If I can help others avoid this that would be great.