jazzband / geojson

Python bindings and utilities for GeoJSON
https://pypi.python.org/pypi/geojson/
BSD 3-Clause "New" or "Revised" License
913 stars 121 forks source link

Default Precision Issues #135

Closed youngpm closed 2 years ago

youngpm commented 5 years ago

I understand the rationale with the default precision in v2.5 for those that want smaller geojson files, but I'm curious why the default isn't to behave as in previous versions? This change isn't backwards compatible in the sense that data read/written with 2.4 will come in different in 2.5, so at a minimum I suppose the version should have bumped to v3.

Is there opposition to changing the default precision to be backwards compatible and those that want it truncated can opt in?

rayrrr commented 5 years ago

@youngpm thanks for your input. I will say the default precision feature was added intentionally. https://tools.ietf.org/html/rfc7946#section-11.2 makes a strong argument for defaulting to 6 decimal places in GeoJSON and as the new project lead I decided to go along with it. I would prefer to keep it this way rather than change it to opt-in.

As far as the version number, yes, this should have been a major version bump...I did not at first consider "identical to the old precision" program outputs to be under the auspices of backwards compatibility, but I see your point. I will be more careful with semantic versioning going forward. I foresee a v3 to be released soon; I hope that can mitigate the inconvenience.

youngpm commented 5 years ago

@sgillies I would be interested in your opinion, esp as you led the charge on the RFC! Would such a change make sense in fiona?

sgillies commented 5 years ago

@youngpm I don't normally consider a change of precision in string output of a floating point number to be a breaking change. Fiona provides a method to round output but doesn't round by default. It's not strict on RFC 7946.

youngpm commented 5 years ago

I guess what bothers me about this change is that you can find yourself in the situation where, by default, upon outputting a geometry the geometry can become invalid (e.g. introduce a crossing or some such). With the opt in style, you're at least aware that you're fiddling with the values.

sgillies commented 5 years ago

@youngpm this is a good point. I wish I had a good, cheap, all-purpose solution for this. If we always used a fixed precision model we can write out coordinates with no extra precision and no less precision. But if we're using floating point precision the best we can do is limit our precision to the smallest distance between nodes, right? And that's sorta expensive to compute.

youngpm commented 5 years ago

@sgillies Reminds me of this blog on GEOS 3.8 and their new "overlay" engine to handle tricky edge cases.

http://blog.cleverelephant.ca/2019/08/postgis-3-geos.html

rayrrr commented 5 years ago

@youngpm I want to make sure you understand the motivation. You may already know all this, but here we go.

A 6 decimal place precision was chosen intentionally because of potential space savings. Not just the incremental space savings in a JSON string gained by reducing the size of numbers with greater than 6 decimal places.

Rather, there's something very special about this threshold. Any lat/lon number with up to 6 decimal places of precision will fit in 32 bits. With any more, you need 64. These are known as "single" and "double"precision floats respectively. So, a number with 7 or more decimal places needs twice as much storage space as a number with 6 or less.

This is all explained in depth in IEEE 754.

Python tends to use doubles for storage more often than not, but speaking in terms of interoperability, any other system that implements single/double floats separately (from numpy to the C language and many in between) could benefit from a whopping 50% reduction in storage requirements using data with 6 decimal place precision, the new default precision in this library.

I hope that helps a bit; let us know.

youngpm commented 5 years ago

@rayrrr I get it, and I think it is great that you can now control the precision when you dump geometries. If you're just schlepping geometries across the network so you can view them, it makes a lot of sense to half the packet size.

Let me give an example of why I don't like your new default behavior. Take a web mercator tile, say (x,y,z) = (0,0,1). If you calculate the bounds of the tile, here's what you'll see:

mercantile.ul(mercantile.Tile(x=0, y=0, z=1))
>>>  LngLat(lng=-180.0, lat=85.0511287798066)

geojson.Point(mercantile.ul(mercantile.Tile(x=0, y=0, z=1)))
{"coordinates": [-180.0, 85.051129], "type": "Point"}

Now imagine I'm doing some geometry operations where i'm intersecting geometries against these web mercator tile boundaries. If I dump the outputs of my operations to disk with the geojson package and use the new truncating behavior, the serialized coordinates now have half the precision of what was in memory, by default. If I were to read back in the geometry and do subsequent geometry operations (say I use mercantile again to get double precision coordinates of web mercator tiles but am working with these truncated coordinates from what I read back in), I'll potentially get all sorts of weird slivers and whatnot.

Computational geometry packages like GEOS (what shapley wraps) work in double precision; if one is using this package and unwittingly is getting values truncated that pass through it, you'll begin to encounter strange geometry issues (invalid geometries, geometry unions with sliver artifacts, etc). Python and JS don't support single precision data types (numpy aside), so the savings is only in disk storage and data movement; you'll still consume the same amount of memory.

My point is that there's more to precision than saying 6 digits after the decimal is mm level accuracy; it still will cause problems for folks if they're actually computing against these geometries. This is how I came across the change you made; a test stopped passing because the serialized geometry became invalid with v2.5.

rayrrr commented 5 years ago

@youngpm you make a compelling case. Thank you for the real-world example and for your persistence.

Since interoperability was my initial motivation...I agree that interoperability with web mercator tiles is right up there with interoperability with single-precision float situations. And we live in an increasingly 64-bit world. Reopening this issue; planning to remove the 6 decimal place default in the next incremental release.

bk-mtg commented 4 years ago

One other potential consideration here is the ability to set precision at a higher level than the geometry object. When we ran into this issue here today, we were considering how to deal with it, but the problem is that our codebase creates geojson geometry objects on probably a couple of hundred different lines, and if we needed to go through and update every single one of those to include the increased precision needed for our application, it would be quite a lot of work. Perhaps there's a way to have a "default precision" that can be set at a module (file?) level or something, rather than (or in addition to?) each geometry object? Or maybe it makes sense to only enforce precision when actually encoding to a file, so that it could be an argument to geojson.dump? I guess downsides of that would be that the precision really is a property of the data rather than the storage, and that you'd lose argument compatibility with json.dump...

aj-hitchins commented 4 years ago

Precision is not exposed at the Feature level - Feature will truncate the coordinates if you're passing a shape (such as shapely.geometry.Point) as the geometry:

from geojson import Feature
from shapely.geometry import Point

Feature(geometry=Point(0.43242423423, 0.3443443))
# {"geometry": {"coordinates": [0.432424, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}

Therefore, regardless of whether there is a default of 6 or not, there is no control over specifying a precision at this level. Would you consider exposing the precision at a Feature level or is it expected that geometry objects should be build based on type such as:

from shapely.geometry import Point
from geojson.geometry import Point as geojsonPoint

point = Point(0.43242423423, 0.3443443)
geojsonPoint(point.coords, precision=10)
# {"coordinates": [[0.4324242342, 0.3443443]], "type": "Point"}

Happy to open a new request if this is not the correct place.

rayrrr commented 4 years ago

@aj-hitchins Thank you for pointing that out. I think we would all like to specify precision everywhere but by default have it do nothing. If you have the skills to do that PR, you are welcome! I will review it. Otherwise I or someone else can give it a try when we have time.

aj-hitchins commented 4 years ago

@rayrrr Thanks, I'll give it a go when I have some time.

Rumpelstinsk commented 4 years ago

geojson.Point((coordinates[0], coordinates[1]), precision=15) allows me to set the precision when I'm creating a point.

But how can I set precision when I'm reading a provided geojson? I mean, there is no "precision" argument on: geojson.loads(my_string_content)

aj-hitchins commented 4 years ago

@Rumpelstinsk My understanding is that in the current version here, this is not possible as this operation is also handled by the class method here: base.py::L115 which only creates the geometry object with properties that have come from the geojson dict, therefore if you input a precision key to the geometry of your geojson, you can maintain the precision like this:

geojson.loads('{"geometry": {"precision": 10, "coordinates": [0.432424343243, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}')

which maintains the point precision below:

{"geometry": {"coordinates": [0.4324243432, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}

Where as loading without the precision property will truncate the coordinates:

geojson.loads('{"geometry": {"coordinates": [0.432424343243, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}')

Which leaves you with the default precision of 6:

{"geometry": {"coordinates": [0.432424, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}

Annoyingly, the precision can't be set at any other level at the moment (Please correct me if I'm wrong!), therefore each feature in your collection would need to have this property injected into the "geometry" object. You could get around this by doing something like:

import json

string = '{"type": "FeatureCollection", "features": [{"geometry": {"coordinates": [0.432424343243, 0.34434444444], "type": "Point"}, "properties": {}, "type": "Feature"}]}'

data = json.loads(string)

for feature in data["features"]:
    feature["geometry"]["precision"] = 10 

encoder = geojson.GeoJSONEncoder()

geojson.loads(encoder.encode(data))

Which would give you:

{"features": [{"geometry": {"coordinates": [0.432424343243, 0.34434444444], "type": "Point"}, "properties": {}, "type": "Feature"}], "type": "FeatureCollection"}

Note that this returned data no longer has a precision property in the geometry.

It depends on what you're doing with it afterwards, as you would already have it loaded into json at that point.

With regards to performance, I haven't tested this and am not sure how many times you would be performing this operation so I suggest you take that into account as well.

Rumpelstinsk commented 4 years ago

@aj-hitchins Thanks for the response. I will test it layer, it seems a valid workarround at the moment. However it's not perfect. It has to parse the same file twice (three times if I have to check it's a valid geojson, using geojson.loads(string).is_valid before injecting the property)

It would preferrable, as @rayrrr said, if I could set precision on loads, as the same way, we do on new Point(...) and others.

meseta commented 4 years ago

@aj-hitchins @Rumpelstinsk an alternative workaround is to monkey patch the default precision argument in the Geometry class using.

import geojson
geojson.geometry.Geometry.__init__.__defaults__ = (None, False, 10)

This affects all subsequent instantiation of Geometry, including its children like Point, and classes that use it like Feature and FeatureCollection

Rumpelstinsk commented 4 years ago

Thanks for the workaround @meseta . I think i prefered that one. I will try it

freespace commented 4 years ago

@meseta thank you for the monkey patch! It works for me and now I am not losing precision on exporting geometry from postgis any more.

In my instance we are generating high resolution orthomosaic against cm-level GCPs and 6 digits of precision wasn't cutting it.

rogerlew commented 2 years ago

I wanted the ability reduce the precision of geojson files for a web interface. I added a DEFAULT_PRECISION variable in geometry that gets assigned when the precision is not specified. https://github.com/rogerlew/geojson/blob/master/geojson/geometry.py

It seems to do the trick of allowing precision to be set more broadly:

image

Not sure if this would be helpful to pull but you are welcome to it.

nellessen commented 2 years ago

I guess what bothers me about this change is that you can find yourself in the situation where, by default, upon outputting a geometry the geometry can become invalid (e.g. introduce a crossing or some such). With the opt in style, you're at least aware that you're fiddling with the values.

The behaviour of automatically rounding coordinates caused a lot of trouble. I did not expect that at all and I don't like this default behaviour at all.

Here is an example of a polygon that gets invalid after parsing it with geojson:

{
   "coordinates":[[[[7.30578007,51.299653652],[7.306700226,51.299969746],[7.306699822,51.299970199],[7.30578007,51.299653652]]]],
   "type":"MultiPolygon"
}

which becomes this with default precision 6:

{
   "coordinates":[[[[7.30578, 51.299654], [7.3067, 51.29997], [7.3067, 51.29997], [7.30578, 51.299654]]]],
   "type":"MultiPolygon"
}

BigQuery through an error on the rounded one:

ST_GeogFromGeoJSON failed: Polygon loop should have at least 3 unique vertices, but only had 2
nellessen commented 2 years ago

I wanted the ability reduce the precision of geojson files for a web interface. I added a DEFAULT_PRECISION variable in geometry that gets assigned when the precision is not specified. https://github.com/rogerlew/geojson/blob/master/geojson/geometry.py

It seems to do the trick of allowing precision to be set more broadly:

image

Not sure if this would be helpful to pull but you are welcome to it.

Nice idea. I was also surprised that you cannot configure the default precision. And I am really surprised that the is not one word about precision in the docs!

https://python-geojson.readthedocs.io/en/latest/search.html?q=precision&check_keywords=yes&area=default

joe-jordan commented 2 years ago

I've suggested a fix for this, using Roger Lew's patch that he mentions above at PR #177 . It's currently stuck as I'm a new contributor, this is just a bump in case someone can take a look, thank you.