Closed ritchie46 closed 2 years ago
Can I take up this issue and work on it, if you are not already working on it?
Be my guest! :)
Just wanted to highlight that sorted-joins do not require exact matches.
There is a large benefit for time-series analysis here. It is often useful to join two dataframes on non-exact timestamp matches.
A simple example would be to work out which person gets on which bus from the two datasets below. Here you want to join on timestamp (non-exact) and bus stop (exact) to find out which passenger boarded which bus.
Bus stops
Timestamp | Bus | Stop |
---|---|---|
14:00 | Bus A | Stop 1 |
14:10 | Bus B | Stop 2 |
14:15 | Bus A | Stop 2 |
14:20 | Bus A | Stop 3 |
Passengers
Timestamp | Passenger | Stop |
---|---|---|
14:02 | John | Stop 3 |
14:09 | Brad | Stop 2 |
I can understand this can be useful, but has this got a name? This isn't exactly a join? Feels like a bucket search or something like that.
out of scope.
Such an operation is often named as_of_join.
Btw. as_of_join is implemented. :)
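For anyone landing here later, the bus example above can be sketched in plain Python. This is only an illustration of the idea, not how the library implements it. The function name `asof_join_forward` and the helper `to_minutes` are made up for this sketch, and it assumes a "forward" match: each passenger takes the first bus that arrives at their stop at or after their own timestamp.

```python
from bisect import bisect_left
from collections import defaultdict

# Data copied from the two tables above: (timestamp, bus, stop) and (timestamp, passenger, stop).
buses = [
    ("14:00", "Bus A", "Stop 1"),
    ("14:10", "Bus B", "Stop 2"),
    ("14:15", "Bus A", "Stop 2"),
    ("14:20", "Bus A", "Stop 3"),
]
passengers = [
    ("14:02", "John", "Stop 3"),
    ("14:09", "Brad", "Stop 2"),
]


def to_minutes(hhmm):
    """Hypothetical helper: convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def asof_join_forward(left, right):
    """Hypothetical sketch: exact match on stop, nearest-later match on timestamp."""
    # Group bus arrivals by stop, each group sorted by arrival time.
    arrivals_by_stop = defaultdict(list)
    for ts, bus, stop in sorted(right, key=lambda row: to_minutes(row[0])):
        arrivals_by_stop[stop].append((to_minutes(ts), ts, bus))

    joined = []
    for ts, name, stop in left:
        arrivals = arrivals_by_stop.get(stop, [])
        times = [arrival[0] for arrival in arrivals]
        # Binary search for the first arrival at or after the passenger's timestamp.
        i = bisect_left(times, to_minutes(ts))
        if i < len(arrivals):
            _, bus_ts, bus = arrivals[i]
            joined.append((name, stop, ts, bus, bus_ts))
        else:
            # No later bus at this stop: keep the row with empty bus columns.
            joined.append((name, stop, ts, None, None))
    return joined


result = asof_join_forward(passengers, buses)
for row in result:
    print(row)
```

The binary search only works because each group of arrivals is sorted by time, which is why this kind of join belongs with the sorted joins discussed in this issue.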
A sort-merge join can be faster than a hash join when the Series are already sorted, and possibly even when they are not.
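As a rough illustration of why sorted inputs help, here is a minimal sort-merge inner join in plain Python. It is a sketch under assumptions, not the library's code: the name `sort_merge_join` is hypothetical, and both inputs are assumed to be lists of `(key, value)` pairs already sorted by key.

```python
def sort_merge_join(left, right):
    """Hypothetical sketch: inner join of two (key, value) lists, both sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1  # left key has no partner yet, advance left
        elif left_key > right_key:
            j += 1  # right key has no partner yet, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Emit every pairing of the two runs.
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = sort_merge_join(left, right)
print(joined)
```

Each side is walked once from start to end and no hash table is built, so with pre-sorted input the only extra work is emitting the matching pairs. If the inputs are not sorted, the cost of sorting them first has to be weighed against building a hash table.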