postgis / docker-postgis

Docker image for PostGIS
https://hub.docker.com/r/postgis/postgis/
MIT License
1.37k stars 462 forks source link

postgis/postgis:13-3.1 st_transform() problems #258

Closed samodadela closed 2 years ago

samodadela commented 2 years ago

Hi, found an issue with st_transform in postgis/postgis:13-3.1 when running test in testcontainers.

The problem is that st_transform 'looses precision' when converting WGS84 to UTM-34N. The query is as follows:

SELECT ST_Transform(ST_GeomFromText('SRID=4326;POINT(20.2598350247713 45.2828985219965)',4326),32634) as g;

In postgis/postgis:13-3.0 (previous version), the result is:

POINT(441951.47499999864 5014644.061)

while in postgis/postgis:13-3.1, the result is:

POINT(441951.4749999986 5014644.061) there is missing 4

The weird thing is that this happened only when the tests were running on a GitLab CI/CD and never when run locally on real hardware. The CI/CD docker image was build from maven:3.8.1-openjdk-11. I did not manage to run locally within docker.

For now the workaround is to use postgis/postgis:13-3.0.

Note: There was also a problem with a test that did some area calculation, but it was 'fixed' with using the old image. So st_transform many not be the only affected function.

samodadela commented 2 years ago

Update:

Since I don't have much control on those (shared) runners I can only speculate that it is something with the underlying hardware or docker implementation that causes postgis to return non-identical results.

samodadela commented 2 years ago

The question is: will Postgis return the same results independent of underlying hardware floating point math implementation or is it expected to return slightly different results?

phillipross commented 2 years ago

@samodadela what you may be seeing is that different images are built with different versions of the libraries (geos, proj, et al) which may be affecting precision in results. You can query SELECT PostGIS_Full_Version(); to see what versions of the libraries your image has.

samodadela commented 2 years ago

@phillipross hmm... which images are you referring to - postgis? The problem happens with the same postgis image postgis:13-3.1. The postgis image is used to run tests (testcontainers). So when I run the tests, postgis:13-3.1 will be started and tests will use that as the test database.

If I run the tests locally on bare metal, or locally in docker (maven:3.8.1-openjdk-11) the test return one result. If I run the same tests on a gitlab CI/CD runner the results are slightly different so the test fails because an exact match is expected. The only difference (I guess) is the 'hardware' postgis:13-3.1 is running on.

I guess that querying SELECT PostGIS_Full_Version(); will return the same value because the query is done against postgis:13-3.1. Am I missing something?

JoelPerren commented 2 years ago

@samodadela if it helps I was able to reproduce the same issue you describe.

Similar situation: our unit tests in an Azure CI/CD Pipeline began failing with different values from ST_Transform queries. Initially, they would pass locally and fail in the pipeline, but when I deleted the Docker image saved on my local machine and re-pulled it the tests would fail locally too.

From this I would guess that the postgis/postgis:13-3.1 image was modified to have some different library versions, or something similar.

In the meantime, your suggested workaround of reverting to postgis/postgis:13-3.0 seems to work.

phillipross commented 2 years ago

@phillipross hmm... which images are you referring to - postgis? The problem happens with the same postgis image postgis:13-3.1. The postgis image is used to run tests (testcontainers). So when I run the tests, postgis:13-3.1 will be started and tests will use that as the test database.

If I run the tests locally on bare metal, or locally in docker (maven:3.8.1-openjdk-11) the test return one result. If I run the same tests on a gitlab CI/CD runner the results are slightly different so the test fails because an exact match is expected. The only difference (I guess) is the 'hardware' postgis:13-3.1 is running on.

I guess that querying SELECT PostGIS_Full_Version(); will return the same value because the query is done against postgis:13-3.1. Am I missing something?

When I say different images, i'm referring to images with other tags:

I thought you had pointed out the difference between 13-3.0 and 13-3.1, which might be explained by the fact that they contain postgis versions which use differing geos and proj versions.

Additionally, it would be helpful to isolate the problem(s) by trying to rule out that this is something in the application. You're making references to maven/openjdk/testcontainers which lead me to believe you're querying from an application, but it would be helpful to see if you can duplicate the problem by querying from psql to help rule out that this has anything to do with other parts of the application.

Lastly, I'd like to point out that the images in the docker hub repo aren't exactly static. They're rebuilt on at least a weekly basis, and rely on the official postgresql dockerhub images, which in turn rely on debian/alpine dockerhub images. Potentially these upstreams can change even when the postgis version doesn't. Functional changes and regressions could theoretically make their way into images without the image tags changing. What DOES change are the digests for the images so you technically can reference the images by digest and get more stability in your build pipelines.

samodadela commented 2 years ago

@phillipross I did not explain well.

We started noticing failing tests with 13-3.1 (the problem was noticed on the 29th sept 2021). When the test was executed locally we got the correct result, but when executed on a gitlab ci/cd runner, the test returned a slightly different result (441951.4749999986 instead of 441951.47499999864; details above). Of course, when we fixed the test to work on gitlab ci/cd it was failing locally. So the same version of postgis (13-3.1), running in the same docker image is returning different results depending where it is run.

I mentioned 13-3.0 only because the problem was not happening with this version. 13-3.0 was always returning 441951.47499999864 regardless if it was executed locally or on gitlab.

I was suspecting that all being equal the difference could be in a difference of floating point (IEEE 754) implementation of the underlying hardware. I read there are slight differences between processors, though I'm no expert. If such differences are expected I'd modify the tests to compare with some tolerance. Do you know if this is so?

Regarding isolation: You are right - there are many layers involved. I did try to narrow down the problem a bit... to use psql would be quite an effort, though (testcontainers).

Thanks for pointing out the detail about the stability of the images.

phillipross commented 2 years ago

@samodadela @JoelPerren I'm going to close out this issue as I believe it's not heading toward any definitive resolution. Again, I'd like to say that my best guess would be that you were seeing changes between underlying library versions. I'll close the issue but you'll still be able to post comments in the future if you want to add more details.

If you have suspicions that there is some other variable not being accounted for, whether it's differing hardware or differing math libraries at the platform level or anything like that, the most productive thing might be to ask on the postgis-users mailing list. Users and devs monitor that list and lots of discussions regarding bugs and changes within postgis and its dependencies happen there. They'd also be able to help with devising experiments that you could do to further isolate and narrow down behavior that you might be seeing.

samodadela commented 2 years ago

@phillipross Agree, thanks.