r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/

should sf_use_s2 be `FALSE` by default? #2141

Closed eblondel closed 1 year ago

eblondel commented 1 year ago

I'm wondering about the rationale behind having sf_use_s2 set to TRUE by default when loading sf:

Thanks in advance

edzer commented 1 year ago

Hi Emmanuel, great question.

> First, this has been introduced later. For backward compatibility, it would make more sense to have it set FALSE by default.

That attitude would put new users, unaware of history, in the "old", flat Earth / GIS mode, without good reason.

> This is technology independent

sf is an implementation, and an implementation can never be technology independent. The "traditional" open source SF implementation has been (and still is) JTS, which has been translated into C++ as GEOS. In the sp days, rgeos was the GEOS backend doing geometry operations; in sf, the links to GEOS were integrated (for maintenance reasons, essentially). I don't think that anything in ISO 19125 tells you that the area of POLYGON((0 89,1 89,1 90,0 90,0 89)), where coordinates are long/lat geodetic associated with OGC:CRS84, should be 1 (eh, one what?), yet that is what GEOS gives you (it's CRS unaware). Go ask it to draw a buffer of size 1 around that polygon, and try to explain the outcome to a newcomer in spatial data science.
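The CRS-unaware behaviour described here can be sketched in R. This is a minimal illustration, not taken from the thread: it uses two hypothetical 1° × 1° cells (one at the equator, one at 60°N) instead of the polar polygon above, it assumes sf is installed with s2 support, and it drops the CRS with st_set_crs(x, NA) as one way to force a purely planar computation.

```r
library(sf)

# Two hypothetical 1-degree cells (illustration only):
# one at the equator, one at 60N.
cells <- st_as_sfc(
  c("POLYGON((0 0,1 0,1 1,0 1,0 0))",
    "POLYGON((0 60,1 60,1 61,0 61,0 60))"),
  crs = "OGC:CRS84"
)

# Without a CRS, the coordinates are treated as plain Cartesian numbers,
# so both cells get the same unitless area.
flat_area <- st_area(st_set_crs(cells, NA))

# With a geodetic CRS and s2 switched on, areas are computed on the sphere
# and reported in square metres.
old <- sf_use_s2(TRUE)
sphere_area <- st_area(cells)
sf_use_s2(old)  # restore whatever setting was active before

print(flat_area)
print(sphere_area)
```

In the planar view both cells have the same size; on the sphere the northern cell is clearly smaller, which is the kind of difference a CRS-unaware engine cannot see.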

Before 1.0, sf tried to compute distances and areas for geodetic coordinates correctly, using the routines in GeographicLib, but buffers and intersections would still be "wrong" (flat Earth). s2geometry provides an alternative to GEOS that doesn't assume the Earth is flat but does all computations on the sphere, which is a more realistic approximation of the Earth's surface than a flat plane. s2geometry is an open source library, just like GEOS, and is actively used by Google and maintained by its engineers.

The reason that s2 is an independent R package is purely practical; GEOS, GDAL and PROJ could have been set up as independent packages. sf would surely have used package geos if it had been available in 2016.

> it would be better to have sf_use_s2 set to FALSE by default.

See also Chrisman's quote used as the motto of this chapter. Could you describe what you think would go better if we'd continue to propagate the GIS worldview that geodetic coordinates should be treated as Cartesian coordinates?

eblondel commented 1 year ago

Thanks for your answer. On backward compatibility: this "attitude" (as you call it) is about keeping software behavior consistent over time, which is a minimum requirement in software engineering, because current users have been building workflows on top of it, and some of those workflows broke when s2 became the default and changed previous behavior. It's not only newcomers; there are also oldcomers who build software and analyses on top of sf. I don't think this is an issue for new users, as long as you inform them of the limitations and of ways to mitigate them for specific use cases (working at global scale), and at least set up a transition phase in which you progressively introduce the feature as a plugin, set to FALSE by default, and maybe later make it the default, so that people become aware of it and adapt accordingly. But maybe this is what you did, and I might have missed this transition across sf releases.

For the rest, I still think there is a confusion between the name of the package, which makes complete sense within the frame of the standards (since the package has the name of the standard), and what has been extended with s2, in the sense that some processes that do not fail with standard-compliant libraries fail with s2, maybe because the underlying data model does not rely on the ISO/OGC standard. If you read my post carefully, you will see I'm not questioning s2's capability.

I may share some of your arguments related to data projection at global scale, but GIS and spatial data science is far from being bound to global scale, and the entire GIS community extensively uses metric data projections, and for good reasons. Personally, I don't want to pretend to question the main foundations of GIS science just because of the global use case (which I know very well because I practice it through international organizations). As for newcomers, they learn spatial data science, and part of this learning also consists of specific data handling and use cases they will discover in time.

Cheers

edzer commented 1 year ago

> I may share some of your arguments related to data projection at global scale, but GIS and spatial data science is far from being bound to global scale,

It's not only global scale data, it's also all data that is close to the poles, data crossing the antimeridian, and directional problems (e.g., computing buffers or distances) further away from the equator.
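The antimeridian case can be illustrated with a short R sketch. The two points below are hypothetical, chosen only to sit one degree either side of the 180° meridian; the sketch assumes sf with s2 support, and again drops the CRS to mimic a planar, CRS-unaware computation.

```r
library(sf)

# Two hypothetical points on the equator, one degree either side
# of the antimeridian (illustration only).
a <- st_sfc(st_point(c(179, 0)), crs = "OGC:CRS84")
b <- st_sfc(st_point(c(-179, 0)), crs = "OGC:CRS84")

# Planar view (CRS dropped): the distance is measured in raw coordinate
# units, going the long way round through longitude 0.
d_flat <- st_distance(st_set_crs(a, NA), st_set_crs(b, NA))

# Geodetic view with s2 switched on: a distance in metres along the
# short path across the antimeridian.
old <- sf_use_s2(TRUE)
d_geod <- st_distance(a, b)
sf_use_s2(old)  # restore whatever setting was active before

print(d_flat)
print(d_geod)
```

The planar result treats the two points as almost a full world-width apart, while the geodetic result reflects that they are only about two degrees of arc from each other.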

> and the entire GIS community extensively uses metric data projections, and for good reasons.

sf_use_s2(TRUE) didn't change any of that, and I've never discouraged anyone from doing that; I only discouraged the implicit use of plate carrée or equirectangular projections when data are not projected. If you want that, you can do it by using st_transform() or sf_use_s2(FALSE). I don't think it's good as a default, although it still is (but that might change) for plot.sf(); pkg tmap, for example, has more sensible defaults.
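The two opt-outs mentioned here, st_transform() and sf_use_s2(FALSE), might be used as in the following sketch. The point, the choice of EPSG:32631 (UTM zone 31N) and the buffer sizes are assumptions made for illustration, not values from the thread.

```r
library(sf)

# Hypothetical point at 3E, 52N (illustration only).
pt <- st_sfc(st_point(c(3, 52)), crs = "OGC:CRS84")

# Option 1: project explicitly, then buffer in metres.
# EPSG:32631 (UTM zone 31N) is an assumption that suits this longitude.
pt_utm  <- st_transform(pt, "EPSG:32631")
buf_utm <- st_buffer(pt_utm, dist = 1000)

# Option 2: keep long/lat but switch s2 off. The coordinates are then
# treated as planar and dist is read in degrees (sf warns about this).
old <- sf_use_s2(FALSE)
buf_deg <- st_buffer(pt, dist = 0.01)
sf_use_s2(old)  # restore the previous setting

print(st_area(buf_utm))
```

Storing the value returned by sf_use_s2() and restoring it afterwards keeps the switch local to the step that needs it, so the rest of a workflow is not silently affected.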