Closed cedarbaum closed 1 year ago
Hi Sam - awesome to see a PR on the new Go code for this feature! I’m currently traveling with no computer access, and so will engage more deeply when I get back home next week if that’s okay?
My high-level thoughts are, first, that your idea of integrating the geographic search into the GET endpoint is the right way to go. I think fewer endpoints = simpler API.
Second, the concern I have with this approach, which is why I originally created a different endpoint, has to do with performance. If the SQL query has a WHERE clause involving the latitude and longitude, then a full table scan will be needed to resolve the query. However, from looking at the PR, I think you have already factored this in: you use the geographic SQL query only when geographic search is requested. This is awesome.
So basically the endpoint has two modes depending on whether geographic search is requested (using your new FilterByDistance parameter). I’m thinking perhaps we should more explicitly expose this dichotomy in the API? For example, if a user requests a geographic search maybe we should be explicit that the results will be ordered by distance, rather than stop ID? (Or should we not order them by distance?) And perhaps we should disallow pagination, or support a different kind of pagination based on distance?
On Mon, Jan 16, 2023, at 4:43 PM, Sam Cedarbaum wrote:
This is a proposed change to allow fetching stops by geolocation. It is similar to the API that existed in the Python version, with some differences:
- Does not use a separate POST endpoint.
- Does not return distance with each stop.
  - This was done to simplify the generated type structure. Adding a distance to the return type forces a new type to be reconciled with the model's existing Stop type. If consumers need distance, it can easily be recalculated via the returned Latitude and Longitude fields on Stop.
If this proposed API looks good, there are a couple more tasks to complete:
- Add tests
- Add documentation
You can view, comment on, or merge this pull request online at:
https://github.com/jamespfennell/transiter/pull/90
Commit Summary
- 0b0f4a8 https://github.com/jamespfennell/transiter/pull/90/commits/0b0f4a814a0498f837df03675ffd737ac184b5ff Modify list stops query to allow filter/sort by distance
File Changes (21 files https://github.com/jamespfennell/transiter/pull/90/files)
M api/public.proto https://github.com/jamespfennell/transiter/pull/90/files#diff-9537a654ae1c8bd8901bbc79f308e01015420879ee7f7e0ff7245350c6026ce7 (21)
M db/queries/query.sql https://github.com/jamespfennell/transiter/pull/90/files#diff-b946af91d44ffaecac604834584f5b0d939ea53449ad5113a980e52d459b8b34 (54)
M internal/gen/api/admin.pb.go https://github.com/jamespfennell/transiter/pull/90/files#diff-967531640e6324e5878e6886c71261d3c5b625dadb7b5dd354a48608a5ccb912 (6)
M internal/gen/api/public.pb.go https://github.com/jamespfennell/transiter/pull/90/files#diff-c6a2b5aae2f6b6214721aa6b984cc62496d055ac8484dae0d81729069ca98419 (2443)
M internal/gen/api/public_grpc.pb.go https://github.com/jamespfennell/transiter/pull/90/files#diff-5389efcbf00b0a70cdedf135d9f2ee9173483d581490966f5fc4dee714119471 (4)
M internal/gen/db/agency_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-7b55d6410efdec68514bc88dc81485d4e53945270fd6b88ce68762c86d4cd710 (2)
M internal/gen/db/alert_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-e05e3ac444622d076483070f69e9e956856c24a652ce3df70cf92b19a780a16a (2)
M internal/gen/db/db.go https://github.com/jamespfennell/transiter/pull/90/files#diff-7ec4a40d98d6815d9518b4391038fe73dd3a54da3c06f1c5b264528c5b46b48f (2)
M internal/gen/db/feed_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-b3016aab1a148da2f1ec59b41d73baaa6d086fe69de3f744311e09a254001673 (2)
M internal/gen/db/models.go https://github.com/jamespfennell/transiter/pull/90/files#diff-022c0423b7bfc1deae3b7228f1c4ef506f290efc99f254d793987ba987445374 (2)
M internal/gen/db/querier.go https://github.com/jamespfennell/transiter/pull/90/files#diff-4562ba401718c24b1152137e6ddeced5c6a232099a0c006229d746ede48926f3 (3)
M internal/gen/db/query.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-14a281b87a7356206f57ec1d5c644cd88aa1f890eb53190068dbe1235721db0e (114)
M internal/gen/db/route_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-c7dc8b3bafa714ec0e16857c521cf70762364d03ce67637b97c35373e8f2d1a0 (2)
M internal/gen/db/servicemap_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-2087dadbbf1991d6efce47ddcd92f858588f49433f327b713e0de582b9dee82a (2)
M internal/gen/db/stop_headsign_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-5231b427e227a5f8c926613765935c32b1547c151b6f9dea6b81597e25b0354b (2)
M internal/gen/db/stop_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-94be8093c0a5a2d4925f788332ce79df4a64cd018aca60188eb59c7570861904 (2)
M internal/gen/db/system_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-1c12e42c7d624412c8b063be5fa1727e02e25bfecddf3235ff50876f07ef0f7c (2)
M internal/gen/db/transfer_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-7c994d9c47cc2d00768b864937020dbdb9cf2cda3d4f61d476a752e4d738f62f (2)
M internal/gen/db/trip_queries.sql.go https://github.com/jamespfennell/transiter/pull/90/files#diff-cf6d1b44cc60c13dbc2366e794e2b2f295c081d5a4388b94360d5d0349687fbc (2)
M internal/public/endpoints/stop.go https://github.com/jamespfennell/transiter/pull/90/files#diff-39244603893a09ee38d567de3497d5b3b5b0d20d6c779c36ecb03275b61dd00a (36)
M systems/us-ny-subway.yaml https://github.com/jamespfennell/transiter/pull/90/files#diff-63fb887010895512a7970b1363377fc73af7fcfff5aa4a4dfaa742c7bc204b3c (18)
Thanks for looking and for the initial feedback! I've updated the PR based on the suggestions:
- [...] sort_mode to be specified when using the geographical version of list stops. This can be ID (the default) or DISTANCE.
- [...] max_distance is sufficient if one needs to limit results. Further, the performance for even large distances isn't too bad (see below).
No rush to review this; hope you enjoy your vacation!
# 1KM
❯ curltime -X GET "localhost:8080/systems/us-ny-subway/stops?max_distance=1&latitude=$LAT&longitude=$LON&filter_by_distance=true"
time_namelookup: 0.002776s
time_connect: 0.002998s
time_appconnect: 0.000000s
time_pretransfer: 0.003015s
time_redirect: 0.000000s
time_starttransfer: 0.065140s
----------
time_total: 0.065253s
# 5KM
❯ curltime -X GET "localhost:8080/systems/us-ny-subway/stops?max_distance=5&latitude=$LAT&longitude=$LON&filter_by_distance=true"
time_namelookup: 0.003572s
time_connect: 0.003805s
time_appconnect: 0.000000s
time_pretransfer: 0.003832s
time_redirect: 0.000000s
time_starttransfer: 0.300522s
----------
time_total: 0.306107s
# 500KM
❯ curltime -X GET "localhost:8080/systems/us-ny-subway/stops?max_distance=500&latitude=$LAT&longitude=$LON&filter_by_distance=true"
time_namelookup: 0.002956s
time_connect: 0.003194s
time_appconnect: 0.000000s
time_pretransfer: 0.003214s
time_redirect: 0.000000s
time_starttransfer: 0.781933s
----------
time_total: 0.799326s
Just getting around to looking now!
From the PR, it seems the list stops endpoint would have 3 "modes":
1. [...]
2. [...] (ListStopsInSystemGeo_ByDistance SQL query).
3. [...] (ListStopsInSystemGeo_ById SQL query).
I wonder whether we should just not include the last use case (3)? Worst case scenario, a caller could just use (2) and then sort on their end, which is easy because there is no pagination for the geo queries. Supporting use case (3) makes the API a little trickier I think, and also requires an additional SQL query and code to maintain.
If we only had (1) and (2), then we could change your SORT_MODE argument to be a SEARCH_MODE argument (and remove the filter_by_distance argument). As in the current PR, there would be two values (ID and DISTANCE). In the documentation we could then clearly specify which fields are relevant to each mode. For example, first_id would only be used in ID mode and max_distance would only be used in DISTANCE mode. What do you think?
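To make the "which fields belong to which mode" idea concrete, here is a minimal sketch of how a handler might check a request. Everything in it is hypothetical: the SearchMode and ListStopsRequest types, the pointer fields, and the validate function are not the generated API types. It also assumes that irrelevant fields are rejected rather than silently ignored, and that latitude and longitude are required in DISTANCE mode; neither is decided in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// SearchMode is a hypothetical Go mirror of the proposed SEARCH_MODE argument.
type SearchMode int

const (
	// SearchModeID lists stops by ID and is the default.
	SearchModeID SearchMode = iota
	// SearchModeDistance performs the geographic search.
	SearchModeDistance
)

// ListStopsRequest is a hypothetical request shape. Pointer fields let the
// check distinguish "not set" from a zero value.
type ListStopsRequest struct {
	SearchMode  SearchMode
	FirstID     *string
	MaxDistance *float64
	Latitude    *float64
	Longitude   *float64
}

// validate rejects fields that do not belong to the requested mode.
// Rejecting (instead of ignoring) them is an assumption of this sketch.
func validate(req ListStopsRequest) error {
	switch req.SearchMode {
	case SearchModeID:
		if req.MaxDistance != nil || req.Latitude != nil || req.Longitude != nil {
			return errors.New("max_distance, latitude and longitude are only used in DISTANCE mode")
		}
	case SearchModeDistance:
		if req.FirstID != nil {
			return errors.New("first_id is only used in ID mode")
		}
		if req.MaxDistance == nil || req.Latitude == nil || req.Longitude == nil {
			return errors.New("DISTANCE mode requires max_distance, latitude and longitude")
		}
	default:
		return fmt.Errorf("unknown search mode %d", req.SearchMode)
	}
	return nil
}

func main() {
	maxKm, lat, lon := 1.0, 40.7580, -73.9855 // illustrative values
	geo := ListStopsRequest{SearchMode: SearchModeDistance, MaxDistance: &maxKm, Latitude: &lat, Longitude: &lon}
	mixed := ListStopsRequest{SearchMode: SearchModeID, MaxDistance: &maxKm}
	fmt.Println("geo request:", validate(geo))
	fmt.Println("mixed request:", validate(mixed))
}
```

Ignoring the irrelevant fields would also work and is more lenient toward callers; rejecting them just makes the two modes harder to confuse.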
Thank you again for working on this feature!
I think this makes sense and I've updated the PR with the proposed changes. Please let me know if this looks OK and also what your thoughts on testing are (I didn't see any existing tests for the /stops endpoint yet in the Go version). Thanks!
Also, regarding the CI failures, I believe this is due to me using a different version of sqlc than the CI. I was having trouble matching the version exactly, since Homebrew only seems to have v1.16.0. Not sure what the best fix is here, though I am certainly open to trying to match the version directly if that's the easiest way forward.
I think this looks really awesome! Also the implementation looks really clean and simple which is great. Thank you so much for contributing it!
Regarding testing: in the Python version I had a bunch of unit tests for all of the endpoints and all of the SQL queries. But I didn't migrate them to Go, partially out of laziness and partially because with Go's type system you get a lot of coverage for free IMO. (The only exception is that I've been trying to add tests for the really complex SQL queries.) Adding unit tests for this PR would perhaps be nice, but I don't think it's so important?
What would be really nice, though, is an end-to-end test for the geographic search feature. I'm not sure if you've seen them, but there are E2E tests in the tests/endtoend directory. The list stops endpoint is tested in the test_install_system__stops test in test_installsystem.py. The E2E tests use a transit system with some synthetic GTFS data I created in tests/endtoend/data/gtfsstatic/stops.txt. Right now all the stops have silly lat/lon coordinates, but I think we could put in reasonable coordinates, have a test that searches based on some center point, and then verify we get the stops in the right order back based on some manual calculations? Or even multiple center points. This would provide coverage of the API and the SQL query. If you want to give it a shot you could, but I would also be totally happy to write the E2E test if you don't fancy it?
Lastly, regarding setup, I've also encountered similar issues when switching between computers. Perhaps the easiest thing would be to bump the version of sqlc in the go.mod file to 1.16? Then the CI env will match yours.
(I just updated the E2E testing docs in case you're interested in trying to run the tests...the docs may still be confusing though!)
Thanks for the advice with the testing! I was able to add some E2E coverage to the existing test_install_system__stops test.
Also, as suggested, I updated sqlc to match my local setup. I also needed to update buf in the CI Dockerfiles, and now everything seems to match! The CI now fails at the "Login to DockerHub" step, which I believe is expected since contributors cannot access secrets with builds. I think this could be fixed by creating another workflow just for PRs that makes no attempt to publish to Docker Hub (or just conditionally not trying to log in on PR workflows).
Please let me know if there is any further feedback.
Awesome, the new tests look great! Thanks again for contributing this feature. I hope the back and forth wasn't too burdensome. When it comes to changes to the API I think it's good to spend some time polishing it, because backwards compatibility makes it harder to change after the fact. I think the result here is really great.
I think I've fixed the CI for PRs on main, but am just going to merge this now.
Awesome, thanks so much for the discussion and feedback! I think this issue can probably be closed now: https://github.com/jamespfennell/transiter/issues/85