planetary-social / ansible-scripts

Ansible automation scripts used at Planetary
MIT License
Hook nip05api into the nos.social host #66

Closed dcadenas closed 7 months ago

dcadenas commented 9 months ago

Now that the nos_social Ansible script is merged, we need to integrate nip05api as a container in the role's Docker Compose. The API requires Redis, which can be hosted within the setup or offloaded to another host.

  1. nip05api Integration: Add nip05api to docker-compose.yml, setting NODE_ENV to production. AUTH_PUBKEY and ROOT_DOMAIN are fine by default for nos.social but they can be set too for consistency.

  2. Traefik Routing & DNS Setup: Implement routing with traefik.http.routers.nip05api.rule=HostRegexp({subdomain:[a-zA-Z0-9-]+}.nos.social). Ensure DNS configurations route all nos.social subdomains to nip05api, while redirect-service continues to serve requests to the naked domain.

  3. Redis Configuration: Set REDIS_HOST in nip05api to link to a local or remote Redis instance as per requirement.

  4. Hook /metrics to prometheus: Add a scrape job in Prometheus for nip05api to collect metrics from the /metrics endpoint.

  5. Automated Deployment: Establish a process to automatically deploy nip05api when a new Docker image is pushed to the repository.

dcadenas commented 9 months ago

@cooldracula let's wait on this, we are currently thinking about using some other domain for this service.

cooldracula commented 9 months ago

Per an update from @setch-l, nos.social was chosen as the domain, and I think the work on this ticket can continue.

dcadenas commented 8 months ago

@cooldracula to match the Request a deployment checklist I'm adding this:

cooldracula commented 8 months ago

I opened a PR (#72) that updates our traefik and nos_social roles to match these requirements.

I used the scripts to deploy social.ansible.fun, but held off on running them for nos.social, as I want to make sure the redirects work as expected before making changes to this major site.

I'll go through the requirements and how they were done in this script, to check I have it correct:

1.) nip05api Integration: Add nip05api to docker-compose.yml, setting NODE_ENV to production. AUTH_PUBKEY and ROOT_DOMAIN are fine by default for nos.social but they can be set too for consistency.

I created an image from the current nip05api repo, using the existing Dockerfile (though updated to Node 18) and pushed it to my cooldracula registry (will clarify more in task 5).

We use this image in the updated vars in #72. For AUTH_PUBKEY, I used the default value published in nip05api's config.

2.) Traefik Routing & DNS Setup: Implement routing with traefik.http.routers.nip05api.rule=HostRegexp({subdomain:[a-zA-Z0-9-]+}.nos.social). Ensure DNS configurations route all nos.social subdomains to nip05api, while redirect-service continues to serve requests to the naked domain.

This was tricky, but I think I have it. Using social.ansible.fun as an example: If I make a request to the API, e.g.

curl -L -H 'Host: social.ansible.fun' http://social.ansible.fun/.well-known/nostr.json\?name\=zach

It returns a JSON response from the API:

{"error":"Name not found"}
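The same check can be scripted. Below is a minimal Python sketch, assuming the API follows the standard NIP-05 response shape (a "names" object mapping a name to a hex public key). The helper names nip05_url and lookup_pubkey are made up for illustration and are not part of nip05api.

```python
import json
from urllib.parse import urlencode


def nip05_url(domain, name, scheme="http"):
    """Build the well-known NIP-05 lookup URL for a name on a domain."""
    query = urlencode({"name": name})
    return f"{scheme}://{domain}/.well-known/nostr.json?{query}"


def lookup_pubkey(body, name):
    """Return the pubkey registered for `name`, or None if it is absent.

    Assumption: a successful body looks like {"names": {"<name>": "<hex>"}}.
    An error body such as {"error": "Name not found"} has no "names" key,
    so it falls through to None.
    """
    data = json.loads(body)
    return data.get("names", {}).get(name)
```

Passing the body returned by the curl call above to lookup_pubkey(body, "zach") gives None, which matches the "Name not found" error.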

If I visit an arbitrary subdomain, like https://foo.social.ansible.fun, I get the response:

Cannot GET /

Which is an error, but one coming from the API service. I assumed this error made sense as the name resolution may not be implemented, especially for a random name?

If I visit https://social.ansible.fun/metrics, I get the metrics page created by the api service.

If I visit https://social.ansible.fun or https://social.ansible.fun/baz, I get a 404 page from Webflow. This tells me the traffic was directed to the redirect-service correctly, and it is using the redirect rule outlined in the nginx-redirect.conf, but I am not sure what the expected behaviour from Webflow should be in this case.

Lastly, if I go to https://traefik.social.ansible.fun, we get our traefik dashboard (requires authentication, with the name and password stored in the role's vars and encrypted by ansible-vault).

So I am iffy on this section just because I am getting 404 and errors. They are reasonable and coming from the right places, but still wanna confirm the behaviour.
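To make the behaviour I'm asking about easier to confirm, here is a small Python model of the routing as I observed it. It is only a sketch: the real rules live in the Traefik labels in #72, the service names are labels I picked for this example, and the list of API paths is inferred from the requests above rather than read from the config.

```python
import re

BASE = "social.ansible.fun"

# Same character class as the HostRegexp rule in the requirement: one label
# made of letters, digits and hyphens, directly under the base domain.
SUBDOMAIN_RE = re.compile(r"^[a-zA-Z0-9-]+\." + re.escape(BASE) + r"$")

# Assumption: paths on the bare domain that were seen reaching the API.
API_PATHS = {"/.well-known/nostr.json", "/metrics"}


def route(host, path):
    """Return which service answered for a host and path (query string excluded)."""
    if host == "traefik." + BASE:
        return "traefik-dashboard"
    if SUBDOMAIN_RE.match(host):
        return "nip05api"
    if host == BASE:
        return "nip05api" if path in API_PATHS else "redirect-service"
    return None  # not a host this deployment handles
```

For example, route("foo.social.ansible.fun", "/") gives "nip05api", while route("social.ansible.fun", "/baz") gives "redirect-service".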

3.) Redis Configuration: Set REDIS_HOST in nip05api to link to a local or remote Redis instance as per requirement.

I added redis to the docker compose for nos_social, with a persistent docker volume. I think a local deployment makes sense while we design a more standardized architecture for DBs, caches, and backups: having it be local both simplifies the deployment and reduces network lag for the cache.

4.) Hook /metrics to prometheus: Add a scrape job in Prometheus for nip05api to collect metrics from the /metrics endpoint.

There are two different metrics endpoints created with this deployment that we can easily hook up to Prometheus. I did not do it for social.ansible.fun as it is a dev server.

Our scripts install node-exporter, available at http://social.ansible.fun:9100/metrics. This has a custom health check added that pings the application's metrics endpoint. If it doesn't return 200, then there is some issue with our application or routing.
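The idea behind that health check can be sketched in Python. This is not the script from the role, just an illustration. It assumes the result is exposed as a gauge in the Prometheus text format (for instance through node-exporter's textfile collector), and the metric name nip05api_metrics_up is hypothetical.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def probe_status(url, timeout=5.0):
    """Return the HTTP status code for `url`, or None if nothing answered."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def render_metric(status, name="nip05api_metrics_up"):
    """Render a gauge in Prometheus text format: 1 only for a 200 response."""
    value = 1 if status == 200 else 0
    return (
        f"# HELP {name} 1 if the application metrics endpoint returned 200.\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )
```

A cron job or timer could then write render_metric(probe_status("https://social.ansible.fun/metrics")) to a file that node-exporter picks up.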

We also have the application-specific metrics at https://social.ansible.fun/metrics

5.) Automated Deployment: Establish a process to automatically deploy nip05api when a new Docker image is pushed to the repository.

I added a PR to the nip05api repo (https://github.com/planetary-social/nip05api/pull/4) with a workflow for building and pushing docker images (along with a simple nix dev environment). As the nip05api repo is private, I set the workflow to push to my cooldracula repo. We will want to adjust the settings of the repo to have packages be public, or make the whole repo public, so we can publish to ghcr.io instead.

There is a service on the server to pull any image updates and restart the services when a new image is pulled.
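For anyone unfamiliar with this kind of updater, the logic is roughly the following. This Python sketch is an assumption about the general pattern, not the actual service installed by the role: it compares the local image ID before and after a docker pull, and only asks Docker Compose to recreate the containers when the ID changed. The function names and the runner parameter are made up for the example.

```python
import subprocess


def run(cmd):
    """Run a command and return its trimmed stdout, raising if it fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def image_id(image, runner=run):
    """Return the local ID of `image` (assumes it has been pulled before)."""
    return runner(["docker", "image", "inspect", "--format", "{{.Id}}", image])


def update_if_changed(image, compose_file, runner=run):
    """Pull `image`; recreate the compose services only if a new image arrived.

    Returns True when the services were restarted, False otherwise.
    """
    before = image_id(image, runner)
    runner(["docker", "pull", image])
    after = image_id(image, runner)
    if before == after:
        return False
    # `up -d` recreates containers whose image has changed.
    runner(["docker", "compose", "-f", compose_file, "up", "-d"])
    return True
```

The runner argument exists only so the decision logic can be exercised without Docker installed. On the server it would be left at its default.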

Lastly, I updated the new-server-vars to use a droplet with 8 GB of RAM. It is a sizable and relatively pricey droplet with that much RAM ($48/month), but our overall server costs are fairly low.

If the PRs referenced in this ticket look good and the redirections for social.ansible.fun are working as generally expected, then I could run the scripts for nos.social itself.

cooldracula commented 8 months ago

I've deployed the service for nos.social, with the redirect to webflow for the bare domain.

I added the metrics for the application to prometheus, with the name nos_social, as well as the node_exporter metrics for the machine itself.

DNS seems to have propagated correctly and the bare domain is redirecting appropriately.

I think this general ticket can be closed, but new ones may open as we figure out the finer details of the api?