locationtech / proj4j

Java port of the Proj.4 library for coordinate reprojection

Proj4j should not use the Apache License if it contains the EPSG data set #90

Closed julianhyde closed 1 year ago

julianhyde commented 1 year ago

Proj4j should not use the Apache License while it continues to contain the EPSG data set. The licensing terms are not compatible. Use of the EPSG data set implies the user's acceptance of EPSG's conditions (such as 'no commercial use'), but the Apache License implies that there are no such conditions.

You can't say 'the users should have read the fine print', because the whole point of the Apache License is that there is no fine print.

In Apache Calcite we used Proj4j, and took at face value Proj4j's assertion that it was under the Apache license. In so doing, we have placed our users in legal jeopardy because they may have used Calcite's ST_Transform function for commercial purposes. See CALCITE-5399 for more details.

I think it would be irresponsible for the authors of Proj4j to continue to license under the Apache License. Everyone who uses Proj4j is getting a time bomb, and may not know it. (Sure, it sucks that EPSG is not available under an open source license. But you can't hide that fact under a layer of software abstraction.)

May I suggest the following solution. Split the EPSG data into a separate library (jar file) that is NOT under an open source license, say proj4j-data. Keep the rest of the library the same, but make the dependency optional. If the user explicitly downloads proj4j-data and places it on the classpath (thereby signaling their consent to the terms), then proj4j will work as normal. If they do not, proj4j will work in a reduced capacity. Maybe that capacity is massively reduced, I don't know.
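A minimal sketch of how such an optional-data check could look, assuming the data jar exposes some known resource on the classpath. The resource path and class name below are hypothetical and are not taken from Proj4j itself.

```java
import java.net.URL;

/** Hypothetical sketch: detect whether an optional data jar is on the classpath. */
final class OptionalEpsgData {
    // Hypothetical marker path; the real layout of a proj4j-data jar is not specified here.
    static final String MARKER_RESOURCE = "proj4j-data/epsg-marker.properties";

    /** Returns true if the given resource is visible to the given class loader. */
    static boolean isAvailable(ClassLoader loader, String resource) {
        URL url = loader.getResource(resource);
        return url != null;
    }

    /** Describes the mode the library would run in, based on the marker resource. */
    static String mode(ClassLoader loader) {
        return isAvailable(loader, MARKER_RESOURCE)
                ? "full (EPSG data present)"
                : "reduced (no EPSG data)";
    }
}
```

With this shape, the core library never references the data at compile time; putting the data jar on the classpath is the user's explicit opt-in.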

Another possible solution. In the discussion on CALCITE-5399, Martin Desruisseaux suggests that Proj4j implements GeoAPI interfaces. If so, downstream projects like Calcite could depend on only those interfaces, and the user could download Proj4j and "plug it in" as an implementation of those interfaces. Calcite would ship with a 'dumb' implementation of those interfaces that did not use the EPSG data set, but would contain instructions on its web site for people to download Proj4j if they agreed with the EPSG conditions.
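The plug-in idea can be sketched with the standard `java.util.ServiceLoader` mechanism. This is only an illustration: `CrsLookup`, `NoDataCrsLookup`, and `CrsLookups` are hypothetical names standing in for a GeoAPI-style interface and its implementations, not real GeoAPI or Calcite types.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

/** Hypothetical stand-in for a GeoAPI-style interface. */
interface CrsLookup {
    /** Returns a definition string for the code, or null if the code is unknown. */
    String lookup(String code);
}

/** 'Dumb' fallback shipped with the downstream project; it contains no EPSG data. */
final class NoDataCrsLookup implements CrsLookup {
    @Override
    public String lookup(String code) {
        return null; // nothing is known without a data-backed implementation
    }
}

final class CrsLookups {
    /** Uses the first provider registered on the classpath, else the fallback. */
    static CrsLookup load() {
        Iterator<CrsLookup> providers = ServiceLoader.load(CrsLookup.class).iterator();
        return providers.hasNext() ? providers.next() : new NoDataCrsLookup();
    }
}
```

A user who accepts the EPSG conditions would add a jar that registers a provider under `META-INF/services`; without it, `CrsLookups.load()` returns the fallback.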

echeipesh commented 1 year ago

Catching up with the discussion in the links you have posted. For now here are the links to the IP review tickets under which EPSG datasets have been approved by Eclipse legal review:

Name: EPSG Geodetic Parameter Dataset
Version: 7.9
Description: The EPSG database provides common identifiers for measurements (size and shape of the earth), mathematical models (of the shape of the earth), common reference points (such as the prime meridian) and units (of distance) required for the use of geospatial information.

To facilitate communication between GIS systems (and alleviate a long history of loss of property and life) these identifiers and definitions are required for any and all interoperability between systems. To go further, these identifiers and definitions are required to responsibly use any and all data published or information captured.

These definitions are often the responsibility of individual industry, national and international authorities. The assignment of an identifier is the responsibility of the International Association of Oil & Gas Producers (OGP), formerly the European Petroleum Survey Group (EPSG).

This standards body publishes their registry as an access database, with terms permitting commercial use.

The terms restrict "distribution for profit" for the identifiers, as they form a freely available standard the wide distribution of which is in the interest of safety. Distribution in a commercial context requires that value charged be based on the software, and not by virtue of including this freely available dataset.

The identifiers and definitions are used by a GIS system in a similar fashion as Metric or Imperial systems of measurements provided by the UOMo project.

Cryptography: No

License(s): Historical Permission Notice and Disclaimer Other Project URL: http://www.epsg-registry.org/

My reading of the above is that, precisely because the projects themselves add functionality (coordinate reprojection) and do not try to extract commercial value from the database itself, they are within the EPSG terms of use. This would also necessarily be the case for any project that uses Proj4J or GeoMesa as a dependency, since it would be adding further functionality.

Also, paging @jodygarnett and @jnh5y for deeper answers and insights.

desruisseaux commented 1 year ago

Yes, PROJ4J is allowed to bundle the EPSG dataset. This is not the issue. The issue is that the license shall be Apache + EPSG Terms of Use, not Apache alone. Then, it is up to projects using PROJ4J to decide what to do with the "EPSG Terms of Use" part of the license. In the particular case of the Apache Software Foundation, this is classified as Category X, which means that EPSG data can not be part of official Apache releases (but can be part of releases made by users of Apache releases).

Eclipse has approved the use of EPSG Terms of Use in PROJ4J, indeed maybe on the basis that PROJ4J adds value that frees it from the "no commercial use" clause. However, this is an authorization to use, not an authorization to relicense. Nobody can relicense except the copyright owner.

In summary the issues are:

jnh5y commented 1 year ago

From what I can see, @julianhyde and @desruisseaux are correct here.

Eclipse's permission is to depend on the EPSG dataset. Eclipse cannot give permission to relicense the dataset, so the project should either update the license to reflect the current situation or as suggested spin out the data into a separate jar which can be used or not as desired.

In terms of interfaces, the GeoAPI interfaces are an option. One could also get the EPSG data from various GeoTools jars (which admittedly are LGPL licensed + EPSG; I imagine Apache projects would be in the same situation of needing to provide scripts to get them without bundling the jars themselves).

pomadchin commented 1 year ago

Just to clarify, what portion of these files falls under the EPSG license? All of them?

I see no technical issues in splitting the project into two: proj4j and proj4j-epsg.

desruisseaux commented 1 year ago

Files containing EPSG data are listed below. It can be verified by entering a code (first column) in https://epsg.org/ and comparing the result with the data in the CSV file. A different tab needs to be selected depending on the file. I put the tab name in italic.

I'm not sure what src/main/resources/pcs.override.csv is. But it seems to be a few modifications applied on pcs.csv, so it may be safer to keep it together with pcs.csv. The same would apply to gcs.override.csv, but since that file seems practically empty anyway, it would not matter where it is located.

Regarding GeoTools, the situation is the same. The license shall be (whatever GeoTools chooses) + EPSG Terms of Use, unless they provide EPSG data in a separate JAR. The same shall apply to PROJ (the C/C++ library) as well.

julianhyde commented 1 year ago

If you are able to remove all EPSG files from the Proj4J jar file, as in @pomadchin's PR, I believe that would solve Calcite's problems.

Thank you for finding a solution. As soon as there is a Proj4J release available with the EPSG files removed we would love to incorporate it in a Calcite release.

I apologize if my words were harsh. I know that we are all acting in good faith, doing our best to make awesome software available under a permissive open source license.

pomadchin commented 1 year ago

@julianhyde I'm waiting for some extra thumbs up, your review is very much appreciated as well!

I'll cut a release once we've all signed off on the PR :+1:

jodygarnett commented 1 year ago

This issue is a real pain, as the above is data available under one of the first open data licenses, with the key distinction that additions should not claim to be from the EPSG authority (many national governments make additional codes; the Open Geospatial Consortium makes one called CRS:84, for example, for WGS84 in lon/lat order).

Consider this a data license; and not a software license?

jodygarnett commented 1 year ago

The EPSG "no commercial use" claim is not strictly true; the goal is to include this dataset in as many GIS applications as possible. They just do not want you charging your customers additional money for the inclusion of the dataset (which you obtained for free).

Aside: we had a difficult time and got an exemption from the Eclipse Foundation to distribute these files. One reason it is difficult is that the license is so very old that it does not have a lot in common with modern data licenses.

desruisseaux commented 1 year ago

Indeed, this is a kind of data license. But this issue does not question the right to distribute EPSG data. It just says that any software distributing EPSG data must include the EPSG Terms of Use in its list of licenses. Because not every software foundation accepts those Terms of Use (Apache does not at this time), the ability to separate EPSG data is a convenience for them.

@pomadchin just curious: if PROJ4J does not use those CSV files, where does it take its data from when e.g. the "EPSG:3395" CRS is requested?

pomadchin commented 1 year ago

@desruisseaux https://github.com/locationtech/proj4j/pull/92 contains the whole split; proj4j relies on nad files only https://github.com/locationtech/proj4j/tree/master/src/main/resources/proj4/nad

I'm having trouble finding out whether these files are okay; I could not find any mentions of them on the EPSG website. Are these files clear? I've seen contributions to them by proj4 contributors, adjusting and adding missing data, e.g. here: https://github.com/OSGeo/PROJ/tree/5.2.0/nad

desruisseaux commented 1 year ago

It does not seem to be EPSG data (or at least I do not recognize them). I do not know the provenance of those files, but "NAD27" and "NAD83" suggest that they are North American Datum 1927 and 1983, which are defined (I think) by the U.S. National Geodetic Survey, a U.S. federal agency. As such, those data should be in the public domain.

pomadchin commented 1 year ago

Proj4j 1.2.0 is released with no epsg files in the resource folder; all the old epsg files are in the proj4j-epsg module now.

Let me know if it does not resolve this issue and / or feel free to reopen it / create a new one. I'm closing it for now!

desruisseaux commented 1 year ago

Thanks. Just for the record (in case not everyone is familiar with the relationship between those two projects), it resolves the issue for PROJ4J. It is independent of the PROJ project, for which the issue is still open to my knowledge (it was raised on their mailing list maybe one or two years ago).

julianhyde commented 1 year ago

Thank you @pomadchin! We have logged CALCITE-5417 to migrate to the stripped-down proj4j 1.2.0 artifact, and restore it to a runtime dependency under the Apache License, and expect to do it shortly.

julianhyde commented 1 year ago

@pomadchin There's a problem with the 1.2.0 release. The version in the pom file deployed to maven central contains the version 1.2.0-SNAPSHOT, not 1.2.0.

See https://search.maven.org/remotecontent?filepath=org/locationtech/proj4j/proj4j/1.2.0/proj4j-1.2.0.pom and also https://central.sonatype.dev/artifact/org.locationtech.proj4j/proj4j/1.2.0-SNAPSHOT/versions

This caused problems when I tried to upgrade Calcite to use it: https://github.com/apache/calcite/actions/runs/3666724770/jobs/6198671745
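For illustration, a mismatch like this could be caught before publishing with a small check that compares the `<version>` declared in a POM against the version being released. This is a sketch under assumptions: `PomVersionCheck` is a hypothetical helper, not part of Proj4j or Maven tooling, and it only reads a `<version>` element that is a direct child of `<project>` (it does not resolve versions inherited from a parent POM).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: compare a POM's declared version with the release version. */
final class PomVersionCheck {
    /** Returns the text of the <version> directly under <project>, or null if absent. */
    static String projectVersion(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                // Only direct children count; <parent><version> is nested and skipped.
                if (n.getNodeType() == Node.ELEMENT_NODE && "version".equals(n.getNodeName())) {
                    return n.getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    /** True only if the POM declares exactly the expected version. */
    static boolean matches(String pomXml, String expectedVersion) {
        return expectedVersion.equals(projectVersion(pomXml));
    }
}
```

Given a POM whose `<version>` is `1.2.0-SNAPSHOT`, `PomVersionCheck.matches(pom, "1.2.0")` returns false, which is the condition described above.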