jamespfennell / transiter

Web service for transit data
https://demo.transiter.dev
MIT License

Add Gzip support for public HTTP endpoint #101

Closed: cedarbaum closed this pull request 1 year ago

cedarbaum commented 1 year ago

Allow Gzip compression on the public HTTP endpoint when client accepts it. This significantly reduces data transfer - for example, calling /stops for the us-ny-buses system reduces transfer size from 1.67 MB to less than 40 kB.
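For illustration only, here is a minimal sketch of this kind of middleware in Go, the language Transiter's server is written in. It uses just the standard library's `compress/gzip` rather than the extra dependency this PR adds. It is not the PR's code: `withGzip`, `gzipResponseWriter`, `fetch`, and the repeated JSON payload are hypothetical, and `Accept-Encoding` q-values are ignored for simplicity.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// gzipResponseWriter sends the body through a gzip.Writer; headers and
// status codes still go to the wrapped ResponseWriter.
type gzipResponseWriter struct {
	http.ResponseWriter
	gz *gzip.Writer
}

func (w gzipResponseWriter) WriteHeader(status int) {
	// A Content-Length for the uncompressed body would be wrong here.
	w.Header().Del("Content-Length")
	w.ResponseWriter.WriteHeader(status)
}

func (w gzipResponseWriter) Write(b []byte) (int, error) {
	w.Header().Del("Content-Length")
	// Sniff the type from the uncompressed bytes; net/http would otherwise
	// sniff the compressed stream and label it application/x-gzip.
	if w.Header().Get("Content-Type") == "" {
		w.Header().Set("Content-Type", http.DetectContentType(b))
	}
	return w.gz.Write(b)
}

// acceptsGzip reports whether Accept-Encoding lists gzip.
// Simplification: q-values such as "gzip;q=0" are not honored.
func acceptsGzip(r *http.Request) bool {
	for _, part := range strings.Split(r.Header.Get("Accept-Encoding"), ",") {
		name := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if strings.EqualFold(name, "gzip") {
			return true
		}
	}
	return false
}

// withGzip compresses responses only for clients that advertise gzip.
func withGzip(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("Vary", "Accept-Encoding") // body depends on this header
		if !acceptsGzip(r) {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close() // flushes buffered data and writes the gzip footer
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}

// payload stands in for a large JSON response such as /stops.
var payload = strings.Repeat(`{"id":"stop","name":"Example"},`, 2000)

// fetch sends one request through the middleware and returns the
// Content-Encoding, the number of bytes on the wire, and the decoded body.
func fetch(acceptGzip bool) (string, int, []byte) {
	h := withGzip(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, payload)
	}))
	req := httptest.NewRequest(http.MethodGet, "/stops", nil)
	if acceptGzip {
		req.Header.Set("Accept-Encoding", "gzip")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	enc := rec.Result().Header.Get("Content-Encoding")
	body := rec.Body.Bytes()
	wire := len(body)
	if enc == "gzip" {
		zr, err := gzip.NewReader(bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		if body, err = io.ReadAll(zr); err != nil {
			panic(err)
		}
	}
	return enc, wire, body
}

func main() {
	_, plain, _ := fetch(false)
	_, zipped, _ := fetch(true)
	fmt.Printf("identity: %d bytes, gzip: %d bytes\n", plain, zipped)
}
```

Running it prints the uncompressed and gzipped sizes of the stand-in payload. The `Vary: Accept-Encoding` header matters if anything between Transiter and the client caches responses.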

cedarbaum commented 1 year ago

For context, I am running Transiter without a traditional reverse proxy that can perform compression. After giving this some more thought, I suppose it might not be ideal to have this here as "always on", as this moves the work from reverse proxies to the server itself (which can be CPU intensive). Perhaps an option that defaults to false would be better, but I will wait for your thoughts before making the change.

jamespfennell commented 1 year ago

Yeah, I think that outgoing traffic to the internet should be gzipped, and the numbers you have are great proof of that. However, I'm not sure that Transiter should be doing it. The expectation is that Transiter always runs behind a reverse proxy. This expectation exists partially because setting up a reverse proxy is pretty easy nowadays; for example, this is the Caddy reverse proxy configuration for demo.transiter.dev:

demo.transiter.dev {
    encode gzip
    reverse_proxy 127.0.0.1:8010 {
        header_up X-Transiter-Host "https://demo.transiter.dev"
    }
}

The reverse proxy does other things too, like protecting against various kinds of DDoS attacks.

Is it possible to use a reverse proxy in your case? If not, could you describe your use case in more detail?

cedarbaum commented 1 year ago

I am currently running Transiter in an AWS VPC and expose it via API Gateway (which itself uses a VPC private link to call Transiter). I am trying to get something like CloudFront to sit in front of everything (which would handle compression), but am still figuring out the best way to do this. Longer term, this likely needs to be re-architected a bit.

EDIT: for reference, this is more or less my stack: https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-private-integration.html

Having Transiter do the compression is convenient in the current setup because it lets me avoid standing up another endpoint just for the sake of compression (since network isolation, port mapping, etc. are already handled). I do agree this isn't great in general, but would you be OK with adding a --enable-http-compression server option?

I totally understand if this is something you'd rather not add and can close this PR if so. I am already running a forked version of Transiter with this enabled for now to mitigate the issue.

"Fun" fact: I discovered this issue when some of the /stops requests I was making exceeded 10 MB, for which API Gateway returns 413s 😅.

cedarbaum commented 1 year ago

I ended up moving my hosting to DigitalOcean's App Platform, which does provide compression (I finally had to declare stack bankruptcy on the above solution 🙃). Apologies for the back and forth on this. I am still happy to add the option mentioned above and merge if you think it would be useful to others.

jamespfennell commented 1 year ago

Sorry if I nudged you to change your stack!! Good that it worked out though. I do wonder if Amazon API Gateway supports compression...?

Overall I would be inclined not to include this PR, following again the general philosophy that Transiter should not be doing things that are the responsibility of the reverse proxy. In this case specifically, supporting this adds another direct dependency and it's always nice to minimize those. Of course if you were still blocked on this that would be a different story.

This PR does make me wonder if the Docker Compose file in the root of the repo should include a reverse proxy (maybe just Caddy)? It would be nice for documentation purposes.

cedarbaum commented 1 year ago

> Sorry if I nudged you to change your stack!! Good that it worked out though. I do wonder if Amazon API gateway supports compression...?

No worries! It was long overdue and my new setup is much simpler and I think will end up being cheaper as well.

I believe API Gateway does support compression if set up as a REST API, but I was using it as an HTTP API, which has much more limited functionality out of the box. I am sure there is a way to get it working, but unfortunately I couldn't figure it out.

> Overall I would be inclined not to include this PR, following again the general philosophy that Transiter should not be doing things that are the responsibility of the reverse proxy. In this case specifically, supporting this adds another direct dependency and it's always nice to minimize those. Of course if you were still blocked on this that would be a different story.

Makes sense, thanks for the consideration and feedback!