hyphacoop / api.distributed.press

https://distributed.press
GNU Affero General Public License v3.0
77 stars 8 forks source link

API, datkey and dnslink hostname not being created #15

Closed ASoTNetworks closed 1 year ago

ASoTNetworks commented 3 years ago

When adding the new project two.compost.digital the backend didn't create api.two.compost.digital resulting in Let's Encrypt failing.

It also didn't create datkey and dnslink records.

YurkoWasHere commented 3 years ago

TL;DR: You must first publish content to the site to create the entries.

Pinning service which is responsible for creating these will not trigger until the project's www folder is created.

This folder is only created when you push content for the first time

benhylau commented 3 years ago

Ahh thanks for filing this. It's not Distributed Press, it's bc the config file has a mismatched domain name we forgot to change, so it never succeeded uploading the project config, and therefore everything that comes after doesn't happen.

https://github.com/hyphacoop/two.compost.digital/issues/22

If we set up the project correctly, this would not have happened. These two GH issues should cancel out each other ;)

@ASoTNetworks pls confirm and close both if this sounds right.

YurkoWasHere commented 3 years ago

I dont think hyphacoop/two.compost.digital#22 is related

datkey and dnslink will only be created if the /www folder exists https://github.com/hyphacoop/api.distributed.press/blob/main/pinning-services/index.js#L103 https://github.com/hyphacoop/api.distributed.press/blob/main/pinning-services/index.js#L138

WWW gets created only during a publish extraction https://github.com/hyphacoop/api.distributed.press/blob/main/api/index.js#L149 https://github.com/hyphacoop/api.distributed.press/blob/main/api/index.js#L157

YurkoWasHere commented 3 years ago

PR that will create and empty project and WWW folder during API key creation

Another possible location to create the www folder would be during configuration. Although it may be to late for Lets Encrypt

benhylau commented 3 years ago

But if the folder is empty, isn't correct behaviour to not publish anything rather than try to publish an empty folder? Similarly if dirApi is absent, isn't it correct to not sync the folder, or publish DNS records?

If someone configures an account, but never proceeds to publish anything, why should we do all the downstream stuff? Like getDatSeed, etc. and risk odd behaviour if some protocol cannot sync an empty folder?

In other words, why is this a bug?

YurkoWasHere commented 3 years ago

The underlying issue is lets encrypt fails. Seemingly because the DNS entries where not created.

The other solution would be to deal with the lets encrypt part of the equation but i'm actually not sure where in the code that's dealt with (its outside of the API scope)

benhylau commented 3 years ago

Yea this should be resolved from the letsencrypt side bc it needs to handle missing directories. @ASoTNetworks it that on the NGINX side?

YurkoWasHere commented 3 years ago

If someone configures an account, but never proceeds to publish anything, why should we do all the downstream stuff? Like getDatSeed, etc. and risk odd behaviour if some protocol cannot sync an empty folder?

This is also why i was leaning towards a "create on configure" instead on API key. And maybe throw a placeholder page before the first publish, even if its a method of confirming to the user that its setup.

But this may not solve the lets encrypt issue.

benhylau commented 3 years ago

I am unwilling to do this bc some protocols track change history, and we'd be essentially posting a file that the user never published intentionally. So the issue here is on the letsencrypt side needing to do that check first, before running the cmd. This is not a bug here.

benhylau commented 3 years ago

You have to fix it here https://github.com/hyphacoop/ansibles/blob/381983927b1c83614730dde0daf3557d84b3e5e7/distributed-press/srv1.distributed.press/roles/install-api/files/press-publish

YurkoWasHere commented 3 years ago

A patch for the issue should be a simple replacement of

https://github.com/hyphacoop/ansibles/blob/381983927b1c83614730dde0daf3557d84b3e5e7/distributed-press/srv1.distributed.press/roles/install-api/files/press-publish#L14

with

if [ -d "/home/press/.distributed-press/data/projects/$project/www/" ] && 
[ -d "/home/press/.distributed-press/data/projects/$project/api/" ] && 
[ ! -L "/etc/nginx/sites-enabled/$project" ]

Furthermore /home/press/.distributed-press/ should not be hardcoded Instead it should be read from the ~/.distributed-press/config.json file

    const fileConfig = fs.readFileSync(confFile);
    conf = JSON.parse(fileConfig);
    dataDir = conf['dataDirectory'];
YurkoWasHere commented 3 years ago

staging=0 This should be a variable some too. It is in ansible but im not sure if it gets pushed into the env anywhere or if this is currently a manual thing.

benhylau commented 3 years ago

We need to assume that one or more of www, api, or neither, is present. Therefore we have 4 cases of:

certbot certonly --webroot -w /var/www/html/ -d $project -d api.$project --rsa-key-size 4096

Where we need to request one of:

-d $project -d api.$project
-d $project
-d api.$project
(or skip)

Also the order of operation is folder -> DNS -> cert so the fact that folder is present does not guarantee DNS is entry available, so certbot could still fail when folder is present. So I think the best path is to move this script to nodejs and make it part of the app, as @YurkoWasHere suggested in chat.

ASoTNetworks commented 3 years ago

The script /usr/local/sbin/press-publish that configures NGINX and obtain Let's Encrypt runs every 5 minutes but when there is a site that have their domain not pointed to our server will cause it to reach Let's Encrypt's rate limit.

The following error is observed when running the script manually:

An unexpected error occurred:
There were too many requests of a given type :: Error creating new order :: too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/
Please see the logfiles in /var/log/letsencrypt for more details.
FAILED to get cert for dweb.libre.ws
Project: dweb.manyver.se
NGINX webroot: /home/press/.distributed-press/data/projects/dweb.manyver.se/www/
Getting cert for dweb.manyver.se
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Obtaining a new certificate
An unexpected error occurred:
There were too many requests of a given type :: Error creating new order :: too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/
Please see the logfiles in /var/log/letsencrypt for more details.
FAILED to get cert for dweb.manyver.se
Project: filecoinfoundationweb.hypha.coop
NGINX webroot: /home/press/.distributed-press/data/projects/filecoinfoundationweb.hypha.coop/www/
Getting cert for filecoinfoundationweb.hypha.coop
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Obtaining a new certificate
An unexpected error occurred:
There were too many requests of a given type :: Error creating new order :: too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/
Please see the logfiles in /var/log/letsencrypt for more details.
FAILED to get cert for filecoinfoundationweb.hypha.coop
All done

dweb.libre.ws is not pointed at our server and caused it to reach the rate limit.

To solve this in the future we need to check if the hostname is pointed at our server before attempting to get Let's Encrypt certificate.

YurkoWasHere commented 3 years ago

I think this is the result of an async race condition error. 3 processes are running

At minimum as @ASoTNetworks mentioned, press-publish should test the domain to confirm the correct ip address is resolved before trying to create the file (or putting a place hold in)

jackyzha0 commented 1 year ago

Closing (relevant to v0)