Open ardieorden opened 5 years ago
@ardieorden if we have OR
and AND
in our SQL statements, how would you imagine its usage?
Right now, on
refers solely to the fclass. In this context, I don't think we need to worry about the AND
statement since the fclass is mutually exclusive.
If you make a query on Geofabrik data with a WHERE
clause that says
WHERE fclass = 'primary' AND fclass = 'secondary'
then you wouldn't get any results because a road that has a primary
fclass can't also have a secondary
fclass at the same time.
But if on
can refer to other columns aside from fclass and if it can accept logical and comparison operators, then I would think it would be nice to be able to do something like this:
df = DistanceToNearest(
on= ["fclass", "is", "not", "primary", "or", "fclass", "is", "secondary", "and", " distance", "less than", "1000"]
source_table="geospatial.ph_osm.gis_osm_roads_free_1",
feature_name="dist_primary_or_secondary")
WHERE fclass != 'primary' OR fclass = 'secondary' AND distance < 1000
The logical and comparison operators can be reserved keywords for the on
argument. Anything that's not a reserved keyword is assumed to be a field name.
I'm not sure how usable this feature will be for our current feature engineering work and it seems quite complicated. I'm ok with just implementing the OR logical operator for the on
argument for now. What do you think @ljvmiranda921 ?
Parameter on
can now use columns aside from fclass
via the {column}:{filter}
format. So I was thinking of something like:
bridge:T AND fclass:primary
Your idea of passing iterables is fine. To incorporate the operators I think we can follow something similar to SQLAlchemy, an or(*args)
and and(*args)
method: https://stackoverflow.com/questions/7942547/using-or-in-sqlalchemy.
import geomancer.filters as f
DistanceToNearest(
on = f.and("bridge:T", "primary"), # default is fclass if no filter
**kwargs
)
Can also do chaining:
f.or("secondary", f.and("bridge:T", "primary"))
Cool! I like your suggestion much better.
@pberba Here's the issue.
Right now, a user can only input one
on
parameter to any of theSpells
.The
on
parameter is equivalent to aWHERE
clause in SQL.There might be cases wherein multiple
on
parameters are needed for feature engineering and it'd be useful to have that option available out of the box.