Open phildias opened 8 months ago
This is an enhancement that's come up before and is probably worth implementing. I expect it shouldn't be too difficult to add this feature.
Should this be limited to the `keys` of the events collection being projected onto, or should this be a general matching parameter that can match any/any number of arbitrary fields between the events collection and the projected data frame, similar to pandas' `merge` feature?
An alternative workaround that I use involves setting `nearest=False` to get all projection hits within the buffer and then filtering after the projection by dropping records where, e.g., `Route_ID != Known_Route_ID`. I expect this should produce an equivalent result to your approach but may be faster. This is roughly the approach I'd suggest for bringing this feature into the `project` method.
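As a rough illustration of that post-projection filter, here is a minimal pandas sketch. The `hits` table is made-up stand-in data for what a `nearest=False` projection might return (one row per point/route pair within the buffer); the `point_id` and `distance` column names are assumptions, not linref's actual output schema.

```python
import pandas as pd

# Hypothetical stand-in for the hits of a projection run with nearest=False:
# one row per (point, candidate route) pair found within the buffer.
hits = pd.DataFrame({
    "point_id":       [1, 2, 3, 3],
    "Route_ID":       ["A", "B", "A", "B"],  # route the hit landed on
    "Known_Route_ID": ["A", "B", "B", "B"],  # route each point is known to be on
    "distance":       [0.5, 0.3, 0.8, 1.2],  # assumed distance column
})

# Drop hits whose matched route disagrees with the known route.
matched = hits[hits["Route_ID"] == hits["Known_Route_ID"]]

# If several hits on the correct route remain, keep the closest one per point.
result = (
    matched.sort_values("distance")
    .drop_duplicates(subset="point_id", keep="first")
    .sort_values("point_id")
    .reset_index(drop=True)
)
print(result)
```

Point 3 has a closer hit on Route A, but only its Route B hit survives the filter.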
I really like the `nearest=False` solution! Much cleaner than the cumbersome workaround.
Regarding the question of sticking to just RouteID or multiple keys, I'm a bit conflicted... I think the rest of linref is built around just using the `keys` as the route identifier, and I can't think of all that many cases where we would need to use any other columns to narrow down the geographic search.
Having said that, I do see your point about how pandas in general does joins and merges... Hmmm...
I think my vote would still go towards only using the RouteID keys to narrow the search. I feel doing so will keep things more aligned with the rest of how the library works.
There have been several times where I've needed to `project` a GeoDataFrame with points onto a reference EventsCollection where I also happen to know which Route ID each point belongs to.
Currently, my workaround is to loop over all the unique Route IDs, filter both sets (the input GeoDataFrame and the reference EventsCollection), run the projection, and then concatenate all the results.
However, it probably wouldn't be too difficult for us to change the `project` method's API to account for this. Specifically, we could add a `keys` parameter (either one column or a set of columns) so that the spatial search only considers records in the reference EventsCollection where the Route ID matches.
Here's an example and my current workaround:
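A minimal, self-contained sketch of the mismatch and of a key-restricted search. It does not use linref; the coordinates and the helpers `point_segment_distance` and `nearest_route` are hypothetical, chosen only so that one point lies nearer to the wrong route.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_route(point, routes, known_route=None):
    """Return the ID of the nearest route, optionally restricted to known_route."""
    candidates = {
        rid: seg for rid, seg in routes.items()
        if known_route is None or rid == known_route
    }
    return min(candidates, key=lambda rid: point_segment_distance(point, *candidates[rid]))

# Made-up geometry: two parallel routes, 2 units apart.
routes = {"A": ((0, 0), (10, 0)), "B": ((0, 2), (10, 2))}
# Point 3 is known to be on Route B but sits slightly closer to Route A.
points = {1: ((2, 0.1), "A"), 2: ((4, 1.9), "B"), 3: ((6, 0.8), "B")}

unconstrained = {pid: nearest_route(xy, routes) for pid, (xy, _) in points.items()}
constrained = {pid: nearest_route(xy, routes, known) for pid, (xy, known) in points.items()}
print(unconstrained)  # Point 3 snaps to Route A
print(constrained)    # Point 3 stays on Route B
```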
As I mention in the comment above, note that Point 3 is supposed to be on Route B, but it gets joined to Route A because the spatial join in the `project` method was done without taking its known Route ID into consideration.
Here is how I typically get around this limitation:
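A rough sketch of that loop, assuming pandas and a hypothetical `project_stub` function standing in for the real projection call. The stub pairs each point with the reference row whose `y` value is closest, which is a deliberate simplification of a spatial projection; all column names are illustrative.

```python
import pandas as pd

def project_stub(points, reference):
    """Hypothetical stand-in for a projection: pair each point with the
    reference row whose y value is closest."""
    pairs = points.merge(reference, how="cross", suffixes=("", "_ref"))
    pairs["distance"] = (pairs["y"] - pairs["y_ref"]).abs()
    nearest = pairs.loc[pairs.groupby("point_id")["distance"].idxmin()]
    return nearest[["point_id", "Known_Route_ID", "Route_ID", "distance"]]

# Made-up inputs: Point 3 is known to be on Route B but lies closer to Route A.
points = pd.DataFrame({
    "point_id": [1, 2, 3],
    "y": [0.1, 1.9, 0.8],
    "Known_Route_ID": ["A", "B", "B"],
})
reference = pd.DataFrame({"Route_ID": ["A", "B"], "y": [0.0, 2.0]})

# Loop over the unique route IDs, filter both inputs, project, then concatenate.
parts = []
for route_id in points["Known_Route_ID"].unique():
    pts = points[points["Known_Route_ID"] == route_id]
    ref = reference[reference["Route_ID"] == route_id]
    parts.append(project_stub(pts, ref))

result = (
    pd.concat(parts, ignore_index=True)
    .sort_values("point_id")
    .reset_index(drop=True)
)
print(result)
```

Because each projection only sees one route at a time, every point can only be matched to the route it is known to be on.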
Do you think we can modify the `project` method to allow for known Route IDs?