jamespfennell / transiter

Web service for transit data
https://demo.transiter.dev
MIT License
55 stars 6 forks source link

Allow the maxiumum number of returned stops to be configured #102

Closed cedarbaum closed 1 year ago

cedarbaum commented 1 year ago

This change removes the 100 stop limit when using a DISTANCE search with the /stops endpoint. While this may not be performant for large distances, searches for the us-ny-subway of approximately 2/3 miles in Manhattan exceeds this limit, and are reasonably fast.

The main concern I have with this change that it could, in theory, be used adversarially. For example, if the /stops endpoint is directly exposed, large searches could degrade the service. This could be mitigated by either:

  1. Allowing the server to be configured with a hard limit (defaulting to 100) and then letting server operators decide what they want to allow
  2. Keeping a hard coded limit but increasing it
jamespfennell commented 1 year ago

I'm definitely concerned about the adversarial angle. To test this out, I installed the MTA buses system you added and then did a geographic search for all stops:

go install . buses https://raw.githubusercontent.com/jamespfennell/transiter/master/systems/us-ny-nycbus.yaml
time curl "localhost:8080/systems/buses/stops?search_mode=DISTANCE&max_distance=1000000&latitude=40.689484&longitude=-73.994302" > /dev/null

This takes 3 seconds and generates 130 megabytes of data! (Honestly, sort of surprised it's this fast!) Using this endpoint it would be pretty easy to DDOS Tranister I feel. Also FWIW these kinds of requests are hard to cache because they contain a GPS payload.

So I feel the best solution would be (1), in which the limit of 100 is server configurable? Setting the limit -1, say, would disable the limit entirely. We could keep the default of 100, but this would allow operators to use their own judgement, as you say. What do you think?

In terms of implementation, this would require adding a flag to the server subcommand in transiter.go, wiring the value to internal/server/server.go, and then maybe storing the flag value in the internal/public/public.go:Server struct. The stop handler could then read the flag value from there.

cedarbaum commented 1 year ago

Thanks, this approach makes sense! I sent an update with the new option. I didn't end up implementing the -1 => unlimited behavior, since I think it complicates the SQL a bit too much to be worth it (i.e. having to remove the LIMIT clauses conditionally) and the user can effectively get the same behavior by specifying a large number. But maybe there is a slick way to do this in sqlc?

EDIT: to be clear, I am totally OK implementing the -1 behavior, it's just I wasn't sure if there was an easier way than duplicating the existing queries without limit clauses and then conditionally calling the correct one.

jamespfennell commented 1 year ago

Thank you for the updates!! I see the CI failed, but it seems unrelated to this PR so going to merge this now.