Open knaaptime opened 4 months ago
Yeah, this is tricky... I don't mind functions having multiple inputs using something like libpysal.graph._utils._validate_geometry_inputs()
or something--I think that makes a lot of sense.
But, output type should be consistent imho... what about an "as_geom=None
" option that set to True
if the input is a geometry array/GeoSeries?
Yes please. It feels really weird when teaching pointpats that it, out of nowhere, expects arrays. The same applies to mgwr and spreg.
I don't like the from_gdf
API very much. I'd prefer functions to be smart about the input and consume anything, be it array of coords or any array-like of shapely points.
Fully agree with @ljwolf that the output should be consistent, so I don't really understand the proposal there, which makes it controllable but inconsistent (dependent on the input type). as_geometry
keyword or alike could be fine but I would not try to make some smart decisions about the return type. Make it a simple bool, initially defaulting to coords with deprecation and settling with Point later.
Another question is whether we want to return Point or GeoSeries, where I tend to prefer the former.
😅 I meant "output type should be consistent with input type"... I think if we have a function that accepts one of many possible input types, constructs a new object, and returns it, that output type should match the input type.
I think it's generally ok to have options that return the output type of a function, hence making the output type "inconsistent" across runs. In numpy/scipy code, this is often a return_
argument, like numpy.unique(array, return_counts=True)
. If we were to implement it as return_geometry=False
, then we're in a relatively odd situation where the output type and input type are forced out of sync while we're deprecating, and it's not quite clear that we want to fully deprecate sending arrays to get back an array?
Hence, I thought return_geometry=None
allows us to do the right thing more often than not, unless forced to do something specific. We're only supporting two types of input, and we have the code to do this in libpysal.graph
already.
Maybe I have just been using R too much...
I think this is an important discussion, that perhaps should move to lib so that we ensure consistency across packages - if that makes sense? That would help with the surprises @martinfleis hits in pointpats - those surprises reflect the history of the packages - much of the early pysal packages were born pre pandas/geopandas, so it was numpy/numeric centric.
I'm also old enough to remember debates about whether things that we take for granted today (pandas, matplotlib, geopandas) where going to become standards or not. So the philosophy has been conservative (which is probably an understatement when it comes to changing apis ;-.> )
Hence, I thought return_geometry=None allows us to do the right thing more often than not, unless forced to do something specific. We're only supporting two types of input, and we have the code to do this in libpysal.graph already.
i find the return_
pattern pretty familiar, so i think i'd second Levi's proposal. In that case, we basically pass of to the prep function graph uses
Another question is whether we want to return Point or GeoSeries, where I tend to prefer the former.
the Point is probably more sensible (though the first thing im likely to do is wrap it in a geoseries and explore.. :) )
@ljwolf can you think of any example of a reduction (like weighted_mean_center
would be), that returns different objects depending on the input? I know R is doing that but in Python, I can think only of numpy ufuncs which return a scalar in case of a reduction and always the same type of scalar, while for array-like output it usually preserves the type. I think that the logic return 2x1 array of coords if array is an input, otherwise return shapely.Point very strange an unintuitive within our ecosystem.
then we're in a relatively odd situation where the output type and input type are forced out of sync while we're deprecating
I am not sure where this comes from but given this is not a ufunc I don't see this an issue at all. Or to be more precise, I don't see the reason why input and output types shall be in sync for stuff like centrography. If I ignore what happens in R, I don't see a precedent, though that may just be my ignorance.
I think we should pick this discussion during the next PySAL dev call, to be more efficient.
strange and unintuitive within our ecosystem
I don't think it's that foreign, and we do something similar already in pointpats.geometry. But, fine to chat about this at the developer meeting. I would certainly like to avoid duplicating methods with from_gdf
, and hope for a solution that avoids emitting numpy when shapely is provided.
I don't see a reason why input and output types shall be in sync for stuff like centrography.
I think it makes sense, if provided shapely-based geometries, to emit shapely-based geometries, especially if the change is motivated by, as @knaaptime puts it:
might be nice to have something that consumes geodataframes/series and outputs the same.
pointpats predates geopandas and was originally designed around (n,2) arrays of coordinates. It hasnt been updated much over time like the rest of the pysal stack, but today it's much more common to work in geodataframes rather than numpy arrays (even though you can reformat into the structure pointpats expects fairly easily). It might be nice to have something that consumes geodataframes/series and outputs the same.
e.g. instead of (in addition to?) the current
weighted_mean_center
which takes and returns arrays, i end up using something likewould that be of general use? If so,
_from_gdf
set of functions or something?