pyproj4 / pyproj

Python interface to PROJ (cartographic projections and coordinate transformations library)
https://pyproj4.github.io/pyproj
MIT License

PROJ_DATA env var should take precedence over installation data #1448

Open kidanger opened 2 weeks ago

kidanger commented 2 weeks ago

Hello,

Currently, setting the environment variable PROJ_DATA has no effect on pyproj when the pyproj installation ships its own data. I think it would be good to lower the priority of the internal data so that users can override the PROJ data with the environment variable in more cases.

Example (from a fresh virtual env, Python 3.12):

$ pip install pyproj
...
Successfully installed certifi-2024.8.30 pyproj-3.7.0
$ # create a custom proj data dir, here just a copy of the default one
$ cp -r .venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj test/

$ # without the env var, pyproj finds its own data directory
$ pyproj -v
pyproj info:
    pyproj: 3.7.0
PROJ (runtime): 9.4.1
PROJ (compiled): 9.4.1
  data dir: /tmp/t/.venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj
...

$ # even with the env var, it uses its own directory
$ PROJ_DATA=test/ pyproj -v
...
  data dir: /tmp/t/.venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj
...

$ # remove the internal dir manually, now it works
$ rm -fr .venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj
$ PROJ_DATA=test/ pyproj -v
...
  data dir: test/
...
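The session above is consistent with the lookup order that the pyproj documentation gives for `pyproj.datadir.get_data_dir`: a directory passed to `set_data_dir`, then the data bundled inside the wheel, then `PROJ_DATA` (`PROJ_LIB` before PROJ 9.1), then `sys.prefix` and `PATH`, each used only if it exists and contains `proj.db`. Below is a minimal stdlib model of the first three steps. `resolve_data_dir` is a hypothetical stand-in for pyproj's internal logic, not its real code, and the `sys.prefix`/`PATH` fallbacks and `PROJ_LIB` are left out.

```python
import os
from pathlib import Path


def _is_valid(path):
    """Treat a data directory as valid if it exists and contains proj.db."""
    return path is not None and (Path(path) / "proj.db").is_file()


def resolve_data_dir(user_dir=None, internal_dir=None, environ=None):
    """Hypothetical model of the first three documented lookup steps.

    user_dir     -- directory given to set_data_dir(), if any
    internal_dir -- data shipped in the wheel (pyproj/proj_dir/share/proj)
    environ      -- mapping consulted for PROJ_DATA (defaults to os.environ)
    """
    environ = os.environ if environ is None else environ
    # Order matters: the first valid candidate wins, so a valid internal
    # directory shadows PROJ_DATA entirely.
    for candidate in (user_dir, internal_dir, environ.get("PROJ_DATA")):
        if _is_valid(candidate):
            return candidate
    return None
```

With both the internal directory and `test/` valid, the model returns the internal directory and ignores `PROJ_DATA`; only after the internal `proj.db` is removed does `PROJ_DATA` get picked up, which matches the `rm -fr` step in the session.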

(related discussion: https://github.com/NixOS/nixpkgs/pull/282139)

snowman2 commented 2 weeks ago

This is by design. It prevents pyproj from using the data directory of a different, possibly incompatible PROJ installation. The PROJ database must be the one provided for that specific PROJ version and should not be interchanged.

If you have a separate PROJ installation that you would like pyproj to use, you should install pyproj from source instead of from a wheel.

https://pyproj4.github.io/pyproj/stable/api/datadir.html
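To make the compatibility concern concrete: as far as I know, PROJ records a database layout version in the `metadata` table of `proj.db` and rejects a database whose layout it does not support. The sketch below reads that version with the standard `sqlite3` module. The key names `DATABASE.LAYOUT.VERSION.MAJOR`/`MINOR` and the rule "same major, minor at least the expected one" are my assumptions about PROJ's startup check, and `read_layout_version`/`is_compatible` are hypothetical helpers, not pyproj API. Passing such a check would not make two databases interchangeable, since the database content is still tied to a specific PROJ release.

```python
import sqlite3
from pathlib import Path

# Assumed proj.db metadata keys (see lead-in); not taken from pyproj.
MAJOR_KEY = "DATABASE.LAYOUT.VERSION.MAJOR"
MINOR_KEY = "DATABASE.LAYOUT.VERSION.MINOR"


def read_layout_version(data_dir):
    """Return (major, minor) read from <data_dir>/proj.db, or None if unavailable."""
    db_path = Path(data_dir) / "proj.db"
    if not db_path.is_file():
        return None
    # Open read-only via a URI so the check never modifies the database.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = dict(
            conn.execute(
                "SELECT key, value FROM metadata WHERE key IN (?, ?)",
                (MAJOR_KEY, MINOR_KEY),
            ).fetchall()
        )
    except sqlite3.Error:
        return None  # e.g. not an SQLite file, or no metadata table
    finally:
        conn.close()
    try:
        return int(rows[MAJOR_KEY]), int(rows[MINOR_KEY])
    except (KeyError, ValueError):
        return None


def is_compatible(data_dir, expected_major, min_minor):
    """Assumed rule: same major layout version, minor at least min_minor."""
    version = read_layout_version(data_dir)
    if version is None:
        return False
    major, minor = version
    return major == expected_major and minor >= min_minor
```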

kidanger commented 2 weeks ago

Thank you for the fast answer.

Then I'm not sure why pyproj.datadir.set_data_dir takes precedence over pyproj's internal data while PROJ_DATA does not, but I don't know all the details of pyproj and PROJ. Maybe this is not the intended purpose of PROJ_DATA. My use case is to bundle specific datum grids when distributing software, to avoid network downloads or relying on user folders.
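For the bundling use case, pyproj also provides `pyproj.datadir.append_data_dir`, which, as I understand it, adds a directory to PROJ's search path instead of replacing the data directory, so bundled grids could live alongside the wheel's own `proj.db`. I have not verified this across PROJ versions. The stdlib sketch below only models a first-match lookup over an `os.pathsep`-joined search path; `build_search_path`, `find_resource` and the grid name `my_geoid.tif` are hypothetical.

```python
import os
from pathlib import Path


def build_search_path(data_dir, extra_dirs):
    """Join the primary data dir and bundled grid dirs, primary dir first."""
    return os.pathsep.join([data_dir, *extra_dirs])


def find_resource(name, search_path):
    """Return the first file called `name`, scanning directories in order."""
    for directory in search_path.split(os.pathsep):
        candidate = Path(directory) / name
        if candidate.is_file():
            return str(candidate)
    return None
```

Keeping the wheel's directory first means `proj.db` is still resolved from the matching installation, which is the property the maintainer wants to preserve, while extra grids are found further down the path.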

Feel free to close the issue if the behavior is intended.

snowman2 commented 2 weeks ago

I'm not sure why pyproj.datadir.set_data_dir would have precedence over pyproj internal data but PROJ_DATA doesn't

The reason set_data_dir exists is to set the data directory if it cannot be found automatically. It is guaranteed to be for the specific instance of pyproj and not for another installation of PROJ.

With multiple installations of PROJ on a single machine, PROJ_DATA could potentially point to an incorrect directory that shouldn't be used by pyproj.