pyproj4 / pyproj

Python interface to PROJ (cartographic projections and coordinate transformations library)
https://pyproj4.github.io/pyproj
MIT License
1.05k stars 212 forks source link

Help with PROJ 9.3 CRS comparison issue #1339

Closed djhoese closed 1 year ago

djhoese commented 1 year ago

Code Sample, a copy-pastable example if possible

See full context here: https://github.com/pytroll/pyresample/issues/539

In [1]: from pyproj import CRS

In [2]: crs1 = CRS.from_proj4("+proj=laea +lat_0=90 +lon_0=0 +a=6371228.0 +units=m")

In [3]: crs2 = CRS.from_epsg(3408)

In [4]: crs1.to_proj4()
/usr/lib/python3/dist-packages/pyproj/crs/crs.py:1293: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
  proj = self._crs.to_proj4(version=version)
Out[4]: '+proj=laea +lat_0=90 +lon_0=0 +x_0=0 +y_0=0 +R=6371228 +units=m +no_defs +type=crs'

In [5]: crs2.to_proj4()
/usr/lib/python3/dist-packages/pyproj/crs/crs.py:1293: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
  proj = self._crs.to_proj4(version=version)
Out[5]: '+proj=laea +lat_0=90 +lon_0=0 +x_0=0 +y_0=0 +R=6371228 +units=m +no_defs +type=crs'

In [6]: crs1.to_epsg()
Out[6]: 3408

In [7]: crs2.to_epsg()
Out[7]: 3408

In [8]: crs1 == crs2
Out[8]: False

Problem description

The above code with PROJ 9.2.x does not produce an EPSG code for crs1. In PROJ 9.3, it does. The equality is False in both cases...but I'm not sure it should be in the 9.3 case if PROJ 9.3 is able to think of them as equivalent EPSG codes.

Expected Output

Either no EPSG code for crs1 (the from PROJ.4 string) or equality between the two.

NOTE: This has only been tested on PROJ 9.3 by @avalentino on their test system. I do not have 9.3 running locally (yet).

@snowman2 Can you help me file a bug with PROJ with a C example? Or do you not consider this a bug?

Environment Information

Installation method

Conda environment information (if you installed with conda):


Environment (conda list):

``` $ conda list proj ```


Details about conda and system ( conda info ):

``` $ conda info ```
avalentino commented 1 year ago

@sebastic FYI

sebastic commented 1 year ago

@djhoese, I think you should bring this up on the PROJ list:

https://lists.osgeo.org/mailman/listinfo/proj

Specially the thread about PROJ 9.3:

https://lists.osgeo.org/pipermail/proj/2023-August/011116.html

djhoese commented 1 year ago
$ docker run --rm -it osgeo/proj:latest bash
root@7aa0aa098f43:/# apt update && apt install python3-pip
root@7aa0aa098f43:/# python3 -m pip install pyproj
root@7aa0aa098f43:/# python3 -c "from pyproj import CRS; crs1 = CRS.from_proj4('+proj=laea +lat_0=90 +lon_0=0 +a=6371228.0 +units=m'); crs2 = CRS.from_epsg(3408); print(crs1.to_epsg() or "None"); print(crs2.to_epsg()); print(crs1 == crs2)"
None
3408
False
root@7aa0aa098f43:/# pyproj -v
pyproj info:
    pyproj: 3.6.0
      PROJ: 9.2.1
  data dir: /usr/local/lib/python3.10/dist-packages/pyproj/proj_dir/share/proj
user_data_dir: /root/.local/share/proj
PROJ DATA (recommended version): 1.14
PROJ Database: 1.2
EPSG Database: v10.088 [2023-05-13]
ESRI Database: ArcGIS Pro 3.1 [2023-19-01]
IGNF Database: 3.1.0 [2019-05-24]

System:
    python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
executable: /usr/bin/python3
   machine: Linux-6.4.6-76060406-generic-x86_64-with-glibc2.35

Python deps:
   certifi: 2023.7.22
    Cython: None
setuptools: 59.6.0
       pip: 22.0.2

@avalentino This example above doesn't seem to produce an EPSG code. And just like in the pyresample issue it lists PROJ 9.2.1 even though I think it should be using PROJ from the system. Maybe?

avalentino commented 1 year ago

@djhoese to use libproj v9.3rc1 in debian you should enable "experimental" and then run:

# apt -t experimental  install libproj-dev libproj25 proj-bin proj-data

EDIT: moreover probably it is better to use "debian/sid" of "debian/sid-slim" as docker image

sebastic commented 1 year ago

Just install libproj-dev & proj-bin from experimental, the rest get pulled in via the dependencies.

djhoese commented 1 year ago

I was able to build a version of pyproj from source (github from main) and run the same command I showed above. Even the pyproj command showed it was using PROJ 9.3.0, but still that crs1 did not produce an EPSG code.

Then I used debian:sid, added the experimental repository (along with some certificate packages to get GPG to work), then installed libproj-dev and proj-bin. I then installed python3-pip and python3-pyproj. Running my example command from that showed that crs1 produced an EPSG code.

Then I uninstalled python3-pyproj, installed git, and did python3 -m pip install --break-system-packages --force-reinstall git+https://github.com/pyproj4/pyproj. And this too produced an EPSG code for crs1.

So...to me this means either:

  1. In my first test case using the PROJ docker image, I wasn't ever actually using PROJ 9.3.0 even though I'm pretty sure I was.
  2. There is a major difference between PROJ 9.3.0 in the PROJ images versus the experimental debian repos.
sebastic commented 1 year ago

There is a major difference between PROJ 9.3.0 in the PROJ images versus the experimental debian repos.

The PROJ images may have grids from PROJ-data installed which are not available in Debian.

djhoese commented 1 year ago

Hhhmmm any idea if/why that would cause a CRS to not resolve to an EPSG code? If they were missing then I'd guess that an EPSG couldn't be found, but this is the opposite.

snowman2 commented 1 year ago

While digging into this, consider:

crs2

Name: NSIDC EASE-Grid North Axis Info [cartesian]: - X[south]: Easting (metre) - Y[south]: Northing (metre) Area of Use: - name: Northern hemisphere. - bounds: (-180.0, 0.0, 180.0, 90.0) Coordinate Operation: - name: US NSIDC Equal Area north projection - method: Lambert Azimuthal Equal Area (Spherical) Datum: Not specified (based on International 1924 Authalic Sphere) - Ellipsoid: International 1924 Authalic Sphere - Prime Meridian: Greenwich ```
djhoese commented 1 year ago

Thanks @snowman2! I know about the PROJ.4 string warning, but that's why I'm confused with this behavior. It is close enough to be considered the same EPSG with to_epsg, but not when doing equality. I did not know about min_confidence, but I would have assumed that in cases like this it would default to a value that would handle the round trip decently well.

The original issue started in Pyresample and how it bounces between EPSG and PROJ.4 strings when exporting to on-disk formats. I think bumping up the min_confidence would be a "good enough" workaround.

I'm currently trying to build PROJ from source in the PROJ docker container and see if that gives me different results. I tried syncing the proj-data in the debian:sid image and that doesn't change the final result (crs1 still produces an EPSG code).

snowman2 commented 1 year ago

The min_confidence defaults to 70, which I would have assumed to be equivalent. If you need to be able to roundtrip, I would change it to 100. Alternatively, WKT2 is also a good format for storing a CRS if you need to roundtrip.

If you are confident that the CRS are equivalent, I can help you create an issue upstream with PROJ to ask what their thoughts are on it.

snowman2 commented 1 year ago

I tried syncing the proj-data in the debian:sid image and that doesn't change the final result

That is only for transfomations. proj.db is all you need for CRS information.

djhoese commented 1 year ago

The min_confidence defaults to 70, which I would have assumed to be equivalent. If you need to be able to roundtrip, I would change it to 100. Alternatively, WKT2 is also a good format for storing a CRS if you need to roundtrip.

Sounds good. I wouldn't say I need roundtripping in normal user use cases, but the pyresample tests assume it can be done sometimes. I understand WKT2 would be good for a roundtrip, but human readability/parse-ability is also something we'd like to have.

djhoese commented 1 year ago

The other question here is what is different between the two systems (debian:sid with experimental libproj-dev versus osgeo/proj:latest) that is causing an EPSG code to be used or not.

snowman2 commented 1 year ago

I don't recommend using osgeo/proj:latest as it doesn't work how you would expect: https://github.com/pyproj4/pyproj/pull/1085

djhoese commented 1 year ago

Hhhmmm in the 9.3.0 case it was showing the proper version.

djhoese commented 1 year ago

I haven't been able to figure out what's going on here. I installed PROJ from github on both systems and pyproj from github on both and pointing to the PROJ which was in a custom prefix in PROJ_DIR and I get an EPSG code for crs1 in the debian:sid container but not the osgeo/proj:latest which I think is Ubuntu-based.

Bumping up min_confidence from 70 to 71 causes there to be no EPSG code generated on the debian system.

djhoese commented 1 year ago

@snowman2 Do you have a suggested way of investigating the proj.db database? Or verifying that it is the same version between PROJ installations? I was about to post on the PROJ mailing list about this issue, specifically that one platform is getting an EPSG code and one is not. But then I realized I should know more about the proj_identify function used by pyproj and about the proj.db used in each installation.

https://proj.org/en/9.2/development/reference/functions.html#c.proj_identify

Is there a way to update the proj.db for an installation?

snowman2 commented 1 year ago

Or verifying that it is the same version between PROJ installations?

import pyproj

print(pyproj.show_versions())

or

python -m pyproj -v

Is there a way to update the proj.db for an installation?

That is not recommended as it may not be compatible with a different PROJ version.

djhoese commented 1 year ago

Well when I do md5sum /path/to/proj.db with installations from source (both from github and from the RC1 tarball posted on the mailing list) the md5sums are the same on a particular platform but different between them (debian versus the ubuntu-based osgeo/proj container). Any ideas what would lead to that? It would have to be some dependency or something available on one platform and not the other.

snowman2 commented 1 year ago

It would have to be some dependency or something available on one platform and not the other.

sqllite3 is the only one I can think of that would impact that.

djhoese commented 1 year ago

@snowman2 I just did:

rm -r /usr/local/lib/python3.10/dist-packages/pyproj/proj_dir

And now I'm getting the EPSG code in the crs1 case (from PROJ.4). When does proj_dir get created in the pyproj package and when does it use the system version?

djhoese commented 1 year ago

Ok so I'm going with a corrupt installation. We've got a workaround for the EPSG roundtrip inequality from Even here: https://github.com/OSGeo/PROJ/pull/3879

If/when that makes it through then the pyresample tests should pass with it. @snowman2 @avalentino I don't think I want to change the to_epsg confidence used in pyresample since the general purpose is to have it "magically" replace old PROJ.4 versions people were using that lacked datum information with some level of standard EPSG versions of these projections/CRSes. Thoughts?

snowman2 commented 1 year ago

When does proj_dir get created in the pyproj package and when does it use the system version?

This is what is included in the wheel. If you install pyproj with --no-binary pyproj ref it will compile using the system PROJ and won't include proj_dir.

snowman2 commented 1 year ago

I don't think I want to change the to_epsg confidence used in pyresample since the general purpose is to have it "magically" replace old PROJ.4 versions people were using that lacked datum information with some level of standard EPSG versions of these projections/CRSes. Thoughts?

I would be careful when "magically" replacing PROJ strings with EPSG versions. Unintended changes may occur that could result in differences in transformations. This is likely why Even is nervous about the change ref.

djhoese commented 1 year ago

Point taken. The main purpose of the code in pyresample is to gently replace users understanding of their PROJ.4 string that they've used for the last 10 to 20 years and to recognize there is something more accurate and explicit in at least some kind of standard. By that I mean, many users probably copied PROJ.4 strings from the EPSG site many years ago only because that's what we supported back then. Now that EPSG strings/codes are actually supported we're "helping" the user realize that.

I think I can close this and when things break I'll come back and ask you to save us.