jazzband / geojson

Python bindings and utilities for GeoJSON
https://pypi.python.org/pypi/geojson/
BSD 3-Clause "New" or "Revised" License
913 stars 121 forks source link

Feature coordinates accept numpy float64 but float32 is "not a JSON compliant number" #173

Closed 60south closed 1 year ago

60south commented 2 years ago

Hi. I'm new to GeoJSON and still learning Python.

I'm reading in coordinate data from various sources. Some arrives as numpy float32 type. Attempting to create a geojson point or polygon using float64 works fine, while float32 produces an error: "ValueError: 1.0 is not a JSON compliant number" (see example below).

The error is very confusing, and it took a long time to figure out what it was complaining about. It's easy enough to convert, but any floating point number should be valid. Thanks.

import geojson
import numpy as np

a = np.float64(1.0)
b = np.float64(2.0)
point = geojson.Point([a, b])

# Successful
print(point)

#------------------------------

a = np.float32(1.0)
b = np.float32(2.0)
point = geojson.Point([a, b])

# Produces error
trcmohitmandokhot commented 1 year ago

I've been staring at this behavior for a bit. I would like to take this on. However, this is my first time tackling an issue. At the very minimum, I would like to advance the conversation around whether what @60south saw was expected behavior and is compliant with RFC 7946.

60south commented 1 year ago

Thanks @trcmohitmandokhot

In defense of the GeoJSON module, the problem appears related to the cpython 'isinstance' function which is used on line 50 of the GeoJSON geometry.py function.

Specifically, there is an inconsistency in how isinstance() treats numpy floats:

import numpy as np isinstance(np.float64(1), float)

returns True, even though it's actually of type <class 'numpy.float64'>, not 'float', whereas:

import numpy as np isinstance(np.float32(1), float)

returns False. Ditto for isinstance(np.float128(1), float), which should also be True.

I'm too new at this to say whether this is a problem with isinstance() or the numpy implementation of floats, or not really a problem at all and I'm just a whiner, but it does seem kinda bogus: a float is a float, we're just dickering over precision.

It would be easy enough to import numpy into GeoJSON and test for those types, but then that adds numpy as a dependency to GeoJSON which it shouldn't need. Ugh.

The best suggestion I have is to use the numbers module, i.e., instead of:

elif isinstance(coord, (float, int, Decimal)):

do this:

import numbers ... elif isinstance(coord, (numbers.Real, Decimal)): ...

If a Decimal is a Number, then the statement could be shortened even further.

I'm sure someone with more knowledge than I will shred this idea, but it's the best I got.

thanx.

trcmohitmandokhot commented 1 year ago

@60south. While, I don't yet know what to do about the issue, I looked some more into the numpy library and this is what I found.
I think the built-in python library function isinstance() is doing its job right. However, the definition of numpy.dtypes appears to be inconsistent. numpy.double The numpy.double (which on Linux x8664 is numpy.float64) is an alias of numpy.float. The numpy.float_ is a built-in scalar data-type, which inherits from python's float. Built-in Types Hence your insinstance(np.float64(1),float) returns True.
However, all other numpy.dtypes i.e. numpy.float32 and numpy.float128 are custom data types, which do not have Python equivalents. Hence calls to isinstance() fail and you run into the "ValueError: 1.0 is not a JSON compliant number".
I will look at the code-logic next to see if using the numbers module or Decimals shows some clues to solving this interesting problem.

trcmohitmandokhot commented 1 year ago

There appear to be some previous interesting numpy issues, which also relate to this dtype inconsistency topic. I am attaching links here for some side-reading. Data type precision problems? #5272

rayrrr commented 1 year ago

Thanks for reporting this, @60south. In section 3.1.1 of RFC 7946 it says (emphasis added):

A position is an array of numbers. There MUST be two or more elements. The first two elements are longitude and latitude, or easting and northing, precisely in that order and using decimal numbers. Altitude or elevation MAY be included as an optional third element.

This implies that checking that the coordinates are decimal numbers is the intended behavior for this logic. @trcmohitmandokhot I encourage you to submit a PR to resolve this bug, which I'd be happy to review. Here are some breadcrumbs:

  1. The LoC causing the exception is https://github.com/jazzband/geojson/blob/d8480656b17e914166f7e4abd5dbc5f19c229de3/geojson/geometry.py#L50

  2. https://github.com/numpy/numpy/issues/13133#issuecomment-473567580 offers a potential solution

trcmohitmandokhot commented 1 year ago

Thank you for the guidance. I will take this up.

rayrrr commented 1 year ago

One other note on the proposed solution, @trcmohitmandokhot. It uses numpy. We would rather not add numpy as a dependency. Let's try to find a way to fix it without numpy.

60south commented 1 year ago

@rayrrr, thanks. I agree that numpy should not be a dependency. Rather, we need an expansive, holistic solution where any Real floating point number passes the test, as it should.

I am cautious about equating the traditional RFC 7946 definition of decimal (i.e., not a fraction) with the python Decimal class, since I don't think they mean the same thing. In fact, looking at the Decimal class definition, I see they actually warn against treating Decimals as Real numbers:

Do not subclass Decimal from numbers.Real and do not register it as such (because Decimals are not interoperable with floats). See the notes in numbers.py for more detail.

That shouldn't preclude their use as coordinates, of course.

After some further testing, it looks like any numpy.floatXX, numpy.intXX, and numpy.uintXX, etc., do pass the isinstance(x, numbers.Real) test, so they apparently inherit from the Numbers abstract base class, but not the Decimal class. Complex numbers fail the isinstance() test, as they should.

Seems like: elif isinstance(coord, (numbers.Real, Decimal)): should work.

Regardless, I'm very happy to see this issue getting some attention!

trcmohitmandokhot commented 1 year ago

I agree with @60south's and @rayrrr's analysis and suggestions.
The recommendation to stay away from adding numpy as a dependency is a valid point. The suggestion to handle this via Numbers abstract base class seems to me to be the most reasonable. I did some more testing for my own documentation purposes and here is summary of results to support @60south's proposed PR change.

PEP 3141 lays out a really nice Type Hierarchy for Numbers, where Number :> Complex :> Real :> Rational :> Integral. PEP also acknowledges that Decimal does not fit into the hierarchy.

Results of testing are tabulated below.

Current type-check Proposed type-check
Input (var) Type Class per PEP 3141 Hierarchy isinstance(coords,(float,int,Decimal)) isinstance(coords,(numbers.Real,Decimal))
x = 1 int Integral Number True True
x = np.uint(1) numpy.uint64 Integral Number False True
x = 1.0 float Real Number True True
x = Decimal(1.0) decimal.Decimal Number True True
x = np.float32(1.0) numpy.float32 Real Number False True
x = np.float64(1.0) numpy.float64 Real Number True True
x = np.float128(1.0) numpy.float128 Real Number False True
rayrrr commented 1 year ago

Fix included in version 3.0.0