ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.24k stars 591 forks source link

bug: broken geospatial `d_within` function #7427

Closed ncclementi closed 1 year ago

ncclementi commented 1 year ago

What happened?

The d_within function is supposed to accept distance (a float value), but it breaks with integers and floats.

Integer behavior

import ibis
t = ibis.table({"geo1": "geometry", "geo2": "geometry"}, name="t")
t.geo1.d_within(t.geo2, 3) # passing an int 
Traceback
```python-traceback ... SignatureValidationError Traceback (most recent call last) Cell In[8], line 1 ----> 1 t.geo1.d_within(t.geo2, distance=5) File ~/Documents/git/my_forks/ibis/ibis/expr/types/geospatial.py:198, in GeoSpatialValue.d_within(self, right, distance) 179 def d_within( 180 self, 181 right: GeoSpatialValue, 182 distance: ir.FloatingValue, 183 ) -> ir.BooleanValue: 184 """Check if `self` is partially within `distance` from `right`. 185 186 Parameters (...) 196 Whether `self` is partially within `distance` from `right`. 197 """ --> 198 return ops.GeoDWithin(self, right, distance).to_expr() File ~/Documents/git/my_forks/ibis/ibis/common/bases.py:73, in AbstractMeta.__call__(cls, *args, **kwargs) 54 def __call__(cls, *args, **kwargs): 55 """Create a new instance of the class. 56 57 The subclass may override the `__create__` classmethod to change the (...) 71 The newly created instance of the class. No extra initialization 72 """ [150/1167] ---> 73 return cls.__create__(*args, **kwargs) File ~/Documents/git/my_forks/ibis/ibis/common/grounds.py:118, in Annotable.__create__(cls, *args, **kwargs) 115 @classmethod 116 def __create__(cls, *args: Any, **kwargs: Any) -> Self: 117 # construct the instance by passing only validated keyword arguments --> 118 kwargs = cls.__signature__.validate(cls, args, kwargs) 119 return super().__create__(**kwargs) File ~/Documents/git/my_forks/ibis/ibis/common/annotations.py:492, in Signature.validate(self, func, args, kwargs) 489 this[name] = result 491 if errors: --> 492 raise SignatureValidationError( 493 "{call} has failed due to the following errors:{errors}\n\nExpected signature: {sig}", 494 sig=self, 495 func=func, 496 args=args, 497 kwargs=kwargs, 498 errors=errors, 499 ) 501 return this SignatureValidationError: GeoDWithin(r0 := UnboundTable: t geo1 geospatial:geometry geo2 geospatial:geometry dist float64 geo1: r0.geo1, r0 := UnboundTable: t geo1 geospatial:geometry geo2 geospatial:geometry dist float64 geo2: r0.geo2, 5) has failed due to the following errors: `distance`: 5 is not coercible to a Value[Floating, DataShape] Expected signature: GeoDWithin(left: Value[GeoSpatial, DataShape], right: Value[GeoSpatial, DataShape], distance: Value[Floating, DataShape]) ```

If instead we force distance to be a float t.geo1.d_within(t.geo2, 3.0)

we get a different error

Traceback
```python-traceback ... File ~/Documents/git/my_forks/ibis/ibis/expr/types/core.py:76, in Expr.__repr__(self) 74 def __repr__(self) -> str: 75 if not opts.interactive: ---> 76 return self._repr() 78 from ibis.expr.types.pretty import simple_console 80 with simple_console.capture() as capture: File ~/Documents/git/my_forks/ibis/ibis/expr/types/core.py:102, in Expr._repr(self) 99 def _repr(self) -> str: 100 from ibis.expr.format import pretty --> 102 return pretty(self) File ~/Documents/git/my_forks/ibis/ibis/expr/format.py:170, in pretty(node) 167 result = f"r{next(refcnt)}" 168 return Rendered(result) --> 170 results = node.map(mapper) 172 out = [] 173 for table, rendered in tables.items(): File ~/Documents/git/my_forks/ibis/ibis/common/graph.py:162, in Node.map(self, fn, filter) 156 for node in Graph.from_bfs(self, filter=filter).toposort(): 157 # minor optimization to directly recurse into the children 158 kwargs = { 159 k: _recursive_lookup(v, results) 160 for k, v in zip(node.__argnames__, node.__args__) 161 } --> 162 results[node] = fn(node, results, **kwargs) 163 return results File ~/Documents/git/my_forks/ibis/ibis/expr/format.py:164, in pretty..mapper(op, _, **kwargs) 163 def mapper(op, _, **kwargs): --> 164 result = fmt(op, **kwargs) 165 if isinstance(op, ops.Relation): 166 tables[op] = result File ~/mambaforge/envs/ibis-dev/lib/python3.11/functools.py:909, in singledispatch..wrapper(*args, **kw) 905 if not args: 906 raise TypeError(f'{funcname} requires at least ' 907 '1 positional argument') --> 909 return dispatch(args[0].__class__)(*args, **kw) TypeError: _binary() got an unexpected keyword argument 'distance' ```
### What version of ibis are you using? Ibis Dev ### What backend(s) are you using, if any? NA ### Relevant log output _No response_ ### Code of Conduct - [X] I agree to follow this project's Code of Conduct
gforsyth commented 1 year ago

Fix for the second error in #7428

kszucs commented 1 year ago

Mode details about the first issue:

https://github.com/ibis-project/ibis/blob/master/ibis/expr/operations/core.py#L78-L85

Currently we try to instantiate a concrete type based on the typehint first, if that fails we infer the datatype from the value. Then we construct (normalize) the literal according to the either requested or the calculated datatype.

This has been working nicely so far, imagine the following annotations and the expected output datatypes:

Value[dt.Int64](1) => dt.int64 is in instance of Int64
Value[dt.Int64](128) => dt.int64 is in instance of Int64
Value[dt.Integer](1) => dt.int8 is in instance of Integer
Value[dt.Integer](128) => dt.int16 is in instance of Integer

though the current issue is that we expect a different datatype family than the inferred one is:

Value[dt.Floating](1) => could be dt.Float64 but we infer dt.Int8

One possible fix could be to allow DataType.from_typehint(dt.Floating) to return with a concrete type, e.g. the largest of the type family: dt.float64, but that we would similarly expect that Value[dt.Integer] shall be coerced to Value[dt.Int64] instead of the smallest possible type capable of representing the given value.

cpcloud commented 1 year ago

We should probably never infer float16 for anything.

Hardly any backends support it.

I don't like the smallest integer behavior either. We should probably work towards a system where the types are refined as operations are done, since neither int8 nor int64 may be the right type to infer in x + 1.

kszucs commented 1 year ago

We could have special cases for numeric types where we aim to represent the literal with the smallest datatype:

try:
    dtype = dt.DataType.from_typehint(T)
except TypeError:
    if T is dt.Floating:
        dtype = dt.float64
    elif T is dt.Integer:
        dtype = dt.infer(int(value))
    else:
        dtype = dt.infer(value)