sustainable server setup: collaborate on hosting

derhuerst commented 4 years ago

I run public hafas-rest-api-based APIs at transport.rest, because I want public transport data to be (more) easily accessible.

But keeping things running on servers isn't my passion unfortunately, and I'm not very good at it either. Let's collaborate on this to get a sustainable, reliable and maintainable server setup!

current situation

For now, I've created a single repo for each wrapped HAFAS endpoint, each using hafas-rest-api and *-hafas/hafas-client.

This way, I can easily

version the APIs independently (hence the <version>.<endpoint>.transport.rest domain scheme),
add custom docs per API/endpoint,
point people to the right place when they want to host their own local API.

Right now, all of these run on one VPS, behind a Caddy v1 reverse proxy / load balancer. As discussed in vbb-rest#29, I want others to run additional "mirror instances". This enables zero-downtime updates, and increases availability when an instance's IP got blocked by HAFAS.
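The fallback idea behind the "mirror instances" can be sketched roughly as follows. This is a minimal illustration, not the actual Caddy configuration: the `Upstream` type, the cooldown value and the example URLs are all assumptions made up for this sketch. It prefers the first instance in the list and skips any instance that failed recently (for example because HAFAS blocked its IP).

```typescript
// Hypothetical sketch of fallback selection across mirror instances.
type Upstream = {
  url: string; // base URL of an instance (made-up examples below)
  failedAt?: number; // ms timestamp of the last observed failure, if any
};

// Assumed value: a failed instance is retried after one minute.
const COOLDOWN_MS = 60 * 1000;

function isAvailable(u: Upstream, now: number): boolean {
  return u.failedAt === undefined || now - u.failedAt >= COOLDOWN_MS;
}

// Returns the first available upstream in priority order,
// or undefined if every instance is still cooling down.
function pickUpstream(upstreams: Upstream[], now: number): Upstream | undefined {
  return upstreams.find((u) => isAvailable(u, now));
}

// Record a failure, e.g. a timeout or a blocked IP.
function markFailed(u: Upstream, now: number): void {
  u.failedAt = now;
}

const primary: Upstream = { url: "https://primary.example.org" };
const mirror: Upstream = { url: "https://mirror-1.example.org" };
const upstreams: Upstream[] = [primary, mirror];
```

After `markFailed(primary, now)`, `pickUpstream(upstreams, now)` returns the mirror until the cooldown has passed, which is also what makes zero-downtime updates possible: an instance that is restarting is simply skipped for a while.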

But with this setup, two problems came up:

The individual instances "diverge" (not all use up-to-date versions of hafas-rest-api, *-hafas, & hafas-client), giving inconsistent responses.
I don't want to spend money on 2 VPS just for transport.rest.

But I don't have the motivation & time to update all of these repos manually and write custom docs for each. Outdated, lacking & confusing documentation has confused many people already.

planned improvements

Moving forward, I would like to keep

the domain scheme (<instance>.<version>.<endpoint>.transport.rest),
the load-balancing setup,
individual domains per API for now (IMO it makes development & debugging a lot less confusing),
the individual APIs as separate processes (for easier maintenance, logging & access statistics).
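To make the domain scheme concrete, here is a small sketch that splits a hostname into its parts. It assumes the instance label is optional, so that both `<version>.<endpoint>.transport.rest` and `<instance>.<version>.<endpoint>.transport.rest` are accepted; the function name and the example hostnames in the usage note are illustrative only.

```typescript
// Hypothetical parser for the transport.rest domain scheme.
type ApiHost = { instance?: string; version: string; endpoint: string };

const BASE = ".transport.rest";

function parseApiHost(hostname: string): ApiHost | undefined {
  if (!hostname.endsWith(BASE)) return undefined;
  // Everything in front of ".transport.rest", split into DNS labels.
  const labels = hostname.slice(0, -BASE.length).split(".");
  if (labels.some((label) => label.length === 0)) return undefined;
  if (labels.length === 2) {
    const [version, endpoint] = labels;
    return { version, endpoint };
  }
  if (labels.length === 3) {
    const [instance, version, endpoint] = labels;
    return { instance, version, endpoint };
  }
  // Any other number of labels does not follow the scheme.
  return undefined;
}
```

For example, `parseApiHost("v5.vbb.transport.rest")` yields version `v5` and endpoint `vbb`, while a hostname with an extra leading label such as `a.v5.vbb.transport.rest` additionally yields instance `a`. A proxy could route on these parts.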

I would like to

have at least 1 fallback instance for each API, on a different machine.
have a CI-based deployment, to keep all instances up-to-date.
move all hafas-rest-api-based APIs into one repo for transport.rest, simplifying the setup.

call for help

Do you think we should approach this differently?

Who can run a set of instances (one per endpoint) on their VPS? Each instance needs roughly 100-200mb of RAM & almost no CPU. We would deploy to the VPS from a CI.

Can you recommend (non-over-engineered) tools for deploying Node apps to VPSes?

derhuerst commented 4 years ago

I will mention a few people here, just ignore this thread if you're not interested. @juliuste @rejas @jonathan-reisdorf @jrtberlin @deg0nz @k-nut @mbariola @polarblau @Mike-Zimmermann-mg @deiga @ialokim @n0emis d3d9

derhuerst commented 4 years ago

previous ops-related discussions, in decreasing order of relevance:

mbariola commented 4 years ago

Hi, I would suggest something different if you think it is feasible. Have you considered reaching out to the same organizations that should provide your awesome service in the first place? e.g. BVG?

I'm not good at devops either, the scale of my current projects is usually satisfied by a rinky dinky RPi4 running clients

derhuerst commented 4 years ago

Hi, I would suggest something different if you think it is feasible. Have you considered reaching out to the same organizations that should provide your awesome service in the first place? e.g. BVG?

I have been in touch with VBB every now and then, and they seem to tolerate my activities by now, but don't seem to appreciate what I'm doing. Plus, if they ran this service, AFAIK they'd have to negotiate a different contract with the company that runs their crappy APIs. 🙄

I'm trying to get the open data/API ball rolling with these APIs, creating precedent, so that eventually, they will run this stuff. I will try to get in touch with them once more, though.

mbariola commented 4 years ago

It's a pity. If anything significant changes on my side I'll ping you.

rejas commented 4 years ago

Hi @derhuerst, I can't say much about the domain scheme / endpoints / API issues, I am more of a JavaScript frontend guy who uses stuff like your libraries :-D I do however have an Uberspace account with these resource limits: https://manual.uberspace.de/basics-resources.html Not sure if that would be sufficient?

derhuerst commented 4 years ago

I do however have an Uberspace account with these resource limits: https://manual.uberspace.de/basics-resources.html Not sure if that would be sufficient?

That would be sufficient, but I wouldn't trust credentials for your Uberspace account to the CI of a collective of people. 😉

dancesWithCycles commented 3 years ago

Hi @derhuerst, I am running my own root server at manitu in Saarland. I suppose I have resources available to run an instance. Do you know a good tool to see available resources on Debian 10? How do you do your resource monitoring?

Please feel free to reach out to me if you are still interested in spreading your instances. Cheers!

derhuerst commented 3 years ago

I am running my own root server at manitu in Saarland.

Sounds great! I still have demand for servers to run the *.transport.rest APIs on.

Please feel free to reach out to me if you are still interested in spreading your instances.

Will do!

ForsakenHarmony commented 3 years ago

Can you recommend (non-over-engineered) tools for deploying Node apps to VPSes?

docker-compose?

Jesse-jApps commented 2 years ago

Hi, thanks for providing such a service! Did you ever consider running the API on cloud functions or AWS Lambda? They both have reasonable free tiers. But if you already get millions of requests every month, it'll become very expensive very quickly.

dancesWithCycles commented 2 years ago

Hi everyone, honestly, I did not follow this discussion in an intense manner. However, I still have idle hosting capacity to offer. A kick-off discussion would be required from my side to get an understanding of the deployment processes.

derhuerst commented 2 years ago

Honestly, I did not follow this discussion in an intense manner. However, I still have idle hosting capacity to offer. A kick-off discussion would be required from my side to get an understanding of the deployment processes.

Sure, let's discuss this! Shall I reach out to you so that we can arrange a meeting?

konhi commented 2 years ago

Did you ever consider running the api on cloud functions or aws lambda?

@Jesse-jApps https://poland.transport.rest runs on Cloudflare Workers, which is an excellent solution for middlewares/API wrappers. Free plan gives 100k requests/day, everyone can deploy their own instance. There's also a very humble paid plan. I also heard that other cloud providers may have problems with cold starts, which isn't a thing with Workers.

tripsli commented 2 years ago

As far as I understand the situation (reading through prior issues), there are two distinct problems:

VBB (and potentially other transport agencies) keep blocking the IPs of *.transport.rest instances
There is a need for an affordable hosting solution that allows for CI-based deployment and always runs the latest version

Rather than having multiple servers managed by various individuals running behind a load-balancer, I'd suggest having a single server processing all requests and using a proxying solution to avoid IP bans.

Regarding a hosting solution: Have you considered applying to fosshost or any other organization that provides free hosting for open-source projects? At first glance, the *.transport.rest endpoints meet all of the eligibility criteria and fosshost seems like a good fit, seeing how this project aims to promote open-source and open-data.

Running a single instance rather than a network of instances behind a load balancer would allow you to deploy from a CI pipeline and ensure that the server is always running the latest version.

In order to avoid getting IP banned, you could employ a proxying solution that regularly rotates the IP through which requests are made. I don't think this would add a lot of complexity, load or latency (at least compared to running multiple instances behind a load balancer). There are lots of proxy providers out there, varying wildly in terms of price and reliability. There are also different kinds of proxies (shared with other users or dedicated, with datacenter IPs or residential IPs, rotating with every request or non-rotating ones). For this use case, a small pool of shared datacenter IPs (the cheapest option) that's rotated periodically should suffice in my opinion.
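A periodically rotated pool could be sketched like this. It is a minimal illustration under stated assumptions: the proxy URLs and the rotation window are made up, and the function only decides which proxy is active; actually routing the outgoing HAFAS requests through it depends on the HTTP client in use.

```typescript
// Hypothetical sketch: choose the active proxy from a small pool by time window.
// All processes that share the same pool and window length pick the same proxy,
// so no shared state is needed.
function activeProxy(pool: string[], nowMs: number, windowMs: number): string {
  if (pool.length === 0) throw new Error("proxy pool is empty");
  // Index of the current time window since the Unix epoch.
  const windowIndex = Math.floor(nowMs / windowMs);
  // Cycle through the pool, one proxy per window.
  return pool[windowIndex % pool.length];
}

// Made-up proxy URLs; a real pool would come from the proxy provider.
const proxyPool = [
  "http://proxy-a.example.net:8080",
  "http://proxy-b.example.net:8080",
  "http://proxy-c.example.net:8080",
];

// Assumed rotation interval: 10 minutes.
const ROTATION_WINDOW_MS = 10 * 60 * 1000;
```

Calling `activeProxy(proxyPool, Date.now(), ROTATION_WINDOW_MS)` before each outgoing request returns the proxy for the current window. If one proxy gets blocked, removing it from the pool is enough, although that shifts which proxy is active in a given window.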

The two cheapest ones I'm aware of (without endorsing any):

1. Webshare

Probably the cheapest proxying solution out there. A pool of 100 proxies with 250GB monthly bandwidth starts at $2.99. Rotating all 100 proxies each month costs another $1.50.

2. IP Royal

5 dedicated proxies with datacenter IPs start at $7.50 with unlimited bandwidth.

Running a single server (preferably sponsored by a reputable organization) and routing requests to VBB and other providers through a network of proxies would make for a very resilient solution that doesn't require much maintenance (as proxies could be replaced automatically and the server should "just work" once properly set up with a CI pipeline).

derhuerst commented 2 years ago

Rather than having multiple servers managed by various individuals running behind a load-balancer, I'd suggest having a single server processing all requests and using a proxying solution to avoid IP bans.

This is definitely the more flexible setup. I've built a custom proxying setup a while ago (see also my request to host these proxies), but it's currently not in use.

The HTTP-based load balancing has a significant advantage though: People can use their instance for themselves, and offer it as a fallback just in case my instance is down. Several people have contacted me because they wanted to prevent traffic on my instance and therefore hosted their own anyways, and most of them offered being the fallback.

Luckily, the IP blocks seem to have become rare in the past year (?), at least for the *.transport.rest instances, so I'll close this issue for now. Also, in the past years, I have changed to a less extreme money-saving lifestyle, so I am more willing to afford a separate VPS just for *.transport.rest. Thank you to everyone who offered support!

@tripsli Thanks for bringing the IP proxying services to my attention, I will consider using them when *.transport.rest is blocked again.

derhuerst commented 2 years ago

For my other project – the quest to create GTFS-Realtime feeds by polling HAFAS endpoints (repos, Twitter post) –, I need to do many, many more requests, so using these proxy providers is not an option AFAICT. If someone has a pool of IPv6 proxies at their disposal, please get in touch with me.
