jupyterhub / the-littlest-jupyterhub

Simple JupyterHub distribution for 1-100 users on a single server
https://tljh.jupyter.org
BSD 3-Clause "New" or "Revised" License

Installing https support via Let's Encrypt appears broken (instructions problematic) #115

Closed JuanCab closed 3 years ago

JuanCab commented 6 years ago

On a freshly installed jupyterhub that is visible to the outside world, I followed the Let's Encrypt instructions on the Enabling HTTPS document page. I confirmed sudo -E tljh-config show returns the expected content compared to what is in the documentation.

Problem 1) When I do sudo -E tljh-config reload proxy, nothing happens. In fact, I realized that the connection hangs if you are doing this through the terminal on the jupyterhub. This is not surprising since it is shutting down http and turning on https. However, there is no warning in the documentation that this will happen.

Problem 2) When I try to go to the https connection, it is active, but the certificate is NOT being recognized as "verified by a third party." (in Chrome, this is NET::ERR_CERT_AUTHORITY_INVALID) It does appear to be created since its name is "TRAEFIK DEFAULT CERT".

The documentation should be updated to fix Problem 1, and I would appreciate any hints as to how to 'redo' the proxy connection properly. I did try re-running sudo -E tljh-config reload proxy from ssh, and it returned Proxy reload with new configuration complete but didn't fix the issue.

We reverted to a snapshot of the VM taken before HTTPS was activated and tried the instructions from an SSH terminal. The result was the same, except that sudo -E tljh-config reload proxy run over SSH returned Proxy reload with new configuration complete (since the terminal was not running over the HTTP session). The certificate is still not recognized as verified by a third party. Is there something more we need to do?

JuanCab commented 6 years ago

Failure is due to lack of a proper DNS entry for our server (no "A" entry specifically). Working on it, but I am closing this problem for now.
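A quick way to check for this kind of problem before reloading the proxy is to confirm that the domain resolves over IPv4 to the server's public address. Here is a minimal Python sketch of that check. The function names and the example addresses are made up for illustration, and getaddrinfo also follows CNAMEs and /etc/hosts, so this only approximates "has an A record".

```python
import socket


def ipv4_addresses(domain, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses the name resolves to (empty list if none)."""
    try:
        infos = resolver(domain, 80, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # No IPv4 answer at all, e.g. a missing A record.
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def dns_points_at(domain, server_ip, resolver=socket.getaddrinfo):
    """True if one of the domain's IPv4 addresses is the server's public IP."""
    return server_ip in ipv4_addresses(domain, resolver)
```

For example, dns_points_at("mydomain.me", "203.0.113.10") should be True before Let's Encrypt is asked for a certificate. Both values are placeholders. The resolver parameter exists only so the check can be exercised without real DNS.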

JuanCab commented 6 years ago

Actually, this fixed Problem 2, but Problem 1 (the confusing behavior when running the commands from a terminal connected over HTTP) still exists.

ajhenley commented 5 years ago

I tried the same from ssh all the way with the same result.

New Ubuntu 18.04 install:

sudo apt update
sudo apt upgrade

I followed the "your own server" instructions to the letter (https://the-littlest-jupyterhub.readthedocs.io/en/latest/install/custom-server.html), then followed the HTTPS instructions and got this:

$ sudo tljh-config reload proxy
Proxy reload with new configuration complete

but

HTTPS still doesn't work.

parthjoshi2007 commented 5 years ago

I am facing the same issue. The hub is served with an invalid HTTPS certificate, and there is no negotiation with Let's Encrypt whatsoever. For now, I'm setting up Let's Encrypt with certbot (https://certbot.eff.org/lets-encrypt/ubuntubionic-other), getting the certificate and key separately, and using the manual HTTPS setup for TLJH.

lucas-mior commented 5 years ago

Same issue here, I'll try fixing it as @parthjoshi2007 did. Did you install and set up Certbot after the TLJH installation?

yuvipanda commented 5 years ago

Heya! I just merged #328, seen in http://tljh.jupyter.org/en/latest/howto/admin/https.html. There's a short 'troubleshooting' section too. Would love to see the logs from traefik here, so we can help figure out what's going on.

tomliptrot commented 5 years ago

Hi,

I am getting the same issue. I follow the instructions but then get an invalid hub certificate. @yuvipanda Here are my traefik logs: logs.txt

tomliptrot commented 5 years ago

This might be part of the problem: 'Unable to obtain ACME certificate for domains \"jupyter.ortom.co.uk\" : unable to generate a certificate for the domains [jupyter.ortom.co.uk]: acme: Error -> One or more domains had a problem:\n[jupyter.ortom.co.uk] acme: Error 400 - urn:ietf:params:acme:error:connection - Fetching http://jupyter.ortom.co.uk/.well-known/acme-challenge/ntPU29uuqFL-B7fvSWildcV8sk5FlONSHD4FPpoSQYg: Timeout during connect (likely firewall problem)\n'

tomliptrot commented 5 years ago

But this bit is odd too:

Jun 05 14:31:03 ip-172-31-38-191 traefik[11827]: time="2019-06-05T14:31:03Z" level=info msg="Starting provider *acme.Provider{\"Email\":\"tom@ortom.co.uk\",\"ACMELogging\":false,\"CAServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"Storage\":\"acme.json\",\"EntryPoint\":\"https\",\"KeyType\":\"\",\"OnHostRule\":false,\"OnDemand\":false,\"DNSChallenge\":null,\"HTTPChallenge\":{\"EntryPoint\":\"http\"},\"TLSChallenge\":null,\"Domains\":[{\"Main\":\"j\",\"SANs\":null},{\"Main\":\"u\",\"SANs\":null},{\"Main\":\"p\",\"SANs\":null},{\"Main\":\"y\",\"SANs\":null},{\"Main\":\"t\",\"SANs\":null},{\"Main\":\"e\",\"SANs\":null},{\"Main\":\"r\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\"r\",\"SANs\":null},{\"Main\":\"t\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\"m\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"c\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"u\",\"SANs\":null},{\"Main\":\"k\",\"SANs\":null}],\"Store\":{}}"
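The Domains list in that log has one entry per character of jupyter.ortom.co.uk. That is the pattern you get when a value that should be a list of domains is stored as a single string and then iterated, for instance if the setting was written with set rather than add-item. This is a guess at the cause and is not confirmed anywhere in this thread. The Python sketch below only reproduces the symptom; both helper names are hypothetical and are not TLJH or traefik code.

```python
def acme_domains(domains):
    """Build entries shaped like the Domains list in the traefik log.

    Iterating a plain string yields its characters, which is exactly
    the one-letter-per-entry pattern seen above.
    """
    return [{"Main": d, "SANs": None} for d in domains]


def as_domain_list(value):
    """Wrap a lone string in a list so it is treated as one domain."""
    if isinstance(value, str):
        return [value]
    return list(value)


# A string stored where a list was expected:
broken = acme_domains("jupyter.ortom.co.uk")
# The same value normalised to a one-element list first:
fixed = acme_domains(as_domain_list("jupyter.ortom.co.uk"))
```

If sudo tljh-config show prints the domain as a scalar rather than as a list item, that would be consistent with this explanation.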

efedorov-dart commented 5 years ago

Facing the same issue.

Jun 13 15:30:53 paytonstudio traefik[20277]: time="2019-06-13T15:30:53Z" level=error msg="Unable to obtain ACME certificate for domains \"studyworthy.xyz\" : unable to generate a certificate for the domains [studyworthy.xyz]: acme: Error -> One or more domains had a problem:\n[studyworthy.xyz] acme: Error 400 - urn:ietf:params:acme:error:connection - Fetching http://studyworthy.xyz/.well-known/acme-challenge/xxxxxxx: Timeout during connect (likely firewall problem)\n"
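The traefik messages posted in this thread all carry an ACME error type after urn:ietf:params:acme:error:, and that type narrows down where to look. Here is a small Python sketch that pulls it out of a log line. The hint texts are my own summary of the causes people reported in this thread, not official Let's Encrypt wording, and diagnose is a made-up name.

```python
import re

# Matches the error type in e.g. "urn:ietf:params:acme:error:connection".
ACME_ERROR = re.compile(r"urn:ietf:params:acme:error:(\w+)")

# Hints summarising causes reported in this thread (not exhaustive).
HINTS = {
    "connection": "port 80 was not reachable from outside: check firewall, "
                  "port forwarding and the DNS A record",
    "unauthorized": "something answered on port 80 but not with the challenge: "
                    "check web forwarding, reverse proxies and IPv4/IPv6 records",
}


def diagnose(log_line):
    """Return (error_type, hint) for a traefik ACME error line, or None."""
    match = ACME_ERROR.search(log_line)
    if match is None:
        return None
    kind = match.group(1)
    return kind, HINTS.get(kind, "no hint recorded for this error type")
```

Lines can be fed in from sudo journalctl -u traefik, one at a time.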

gantheaume commented 5 years ago

It looks like I have the same error: even if it's a 503, it seems Let's Encrypt needs the "domain.bar/.well-known/acme-challenge/" folder to be reachable, and it can't reach it.

This article seems to hint at this: https://nixcp.com/lets-encrypt-the-client-lacks-sufficient-authorization-invalid-response/ (see towards the end). No idea how this would be feasible with tljh.

Here's my "anonymised" error (can provide more if needed):

Jun 27 19:07:30 foo traefik[17773]: time="2019-06-27T19:07:30+02:00" level=error msg="Unable to obtain ACME certificate for domains \"foo.bar\" : unable to generate a certificate for the domains [foo.bar]: acme: Error -> One or more domains had a problem:\n[foo.bar] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Invalid response from http://foo.bar/.well-known/acme-challenge/EhJX[35moreCaracters]3oSI [ip.v4.XX.XX]: \"<!DOCTYPE html>\\n<html>\\n <head>\\n <title>503 Backend fetch failed</title>\\n </head>\\n <body>\\n <h1>Error 503 Backend fetch f\"\n"

So I'll do like @parthjoshi2007 and set it up with certbot for now.

gantheaume commented 5 years ago

Ok, so to me the error is "clear" :

From: https://certbot.eff.org/docs/using.html#webroot

The webroot plugin works by creating a temporary file for each of your requested domains in ${webroot-path}/.well-known/acme-challenge. Then the Let’s Encrypt validation server makes HTTP requests to validate that the DNS for each requested domain resolves to the server running certbot. An example request made to your web server would look like:

66.133.109.36 - - [05/Jan/2016:20:11:24 -0500] "GET /.well-known/acme-challenge/HGr8U1IeTW4kY_Z6UIyaakzOkyQgPr_7ArlLgtZE8SX HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Note that to use the webroot plugin, your server must be configured to serve files from hidden directories. If /.well-known is treated specially by your webserver configuration, you might need to modify the configuration to ensure that files inside /.well-known/acme-challenge are served by the webserver.

And tljh doesn't let the validation server reach these files, so the challenge visibly fails.
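To make the mechanism concrete, here is a toy Python sketch of the exchange described in the quoted certbot text: a server answers GET requests under /.well-known/acme-challenge/ with an expected body, and a client fetches it the way a validation server would. It is a simplified illustration only. It is not how traefik or certbot implement the challenge, and the token and body values are invented.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def make_challenge_server(responses, host="127.0.0.1", port=0):
    """Create a server that answers challenge URLs; responses maps token -> body."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            token = None
            if self.path.startswith(CHALLENGE_PREFIX):
                token = self.path[len(CHALLENGE_PREFIX):]
            body = responses.get(token)
            if body is None:
                # Unknown token or a path outside the challenge directory.
                self.send_error(404)
                return
            data = body.encode("ascii")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, *args):
            pass  # keep the example quiet

    return HTTPServer((host, port), Handler)


def fetch_challenge(base_url, token, timeout=5.0):
    """Fetch a challenge URL directly, ignoring any proxy set in the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + CHALLENGE_PREFIX + token, timeout=timeout) as resp:
        return resp.status, resp.read().decode("ascii")
```

If anything else answers on port 80 for the domain (a registrar's web forwarding page, a cache returning a 503), the fetch returns the wrong body, which matches the "Invalid response" errors above.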

Thinking about it, I hadn't set up the DNS properly: I had set up permanent web forwarding instead of an A record (foo.bar to the server IP) and a CNAME record (www.foo.bar to foo.bar). Explanations and setup instructions if you're on Gandi: https://docs.gandi.net/en/domain_names/common_operations/link_domain_to_website.html Another good explanation: https://support.dnsimple.com/articles/a-record/

Had I done this properly from the start, it might have worked with tljh's default Let's Encrypt setup. When I find time, I'll test ;) (I guess it also works much better for certificate renewal).

EDIT: This was indeed the problem, see my next post

Meanwhile, I finally got certbot to work ( https://certbot.eff.org/lets-encrypt/ubuntubionic-other ) after quite a bit of trial and error, so I'm posting what I would have been happy to stumble upon myself. It's just what I did on my server; there may be a shorter and simpler way, but confirming that would require testing I don't have time for.

I was running as root; otherwise all of this needs extra sudo's.

First, I undid everything I had set up during my previous attempts to set up HTTPS (you never know):

tljh-config unset https.enabled
tljh-config unset OR remove-item AnyOtherStuffTested
tljh-config reload

Then I tried the standalone certbot:

sudo certbot certonly --standalone --preferred-challenges http -d foo.bar -d www.foo.bar

But I got this error: Problem binding to port 80: Could not bind to IPv4 or IPv6.

So I started with:

ufw allow 80

But it still didn't work.

Actually, the reason it wasn't working is that tljh still had its frontend running on my address.

So to see if I could stop it, I tried (note: my jupyter instances/servers were all already shut down, no idea if that matters):

systemctl | grep running
systemctl stop jupyterhub.service

Still not enough. By running

ss -tlnp | grep -E ":(80|443)"

I saw that traefik was still squatting on the ports, so:

systemctl stop traefik.service

And yay! Finally sudo certbot certonly --standalone --preferred-challenges http -d foo.bar -d www.foo.bar worked :)
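The "Problem binding to port 80" error simply means another process (here traefik) was already listening. The same check as the ss command can be sketched in Python by trying to bind the port. The helper name is made up, and binding ports below 1024 normally needs root, so a permission error is re-raised instead of being reported as "in use".

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if nothing is bound to host:port, False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            # Anything else (e.g. PermissionError without root) is a
            # different problem and should not be hidden.
            raise
    return True
```

Running port_is_free(80) and port_is_free(443) as root before starting standalone certbot would have shown that traefik still held the ports.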

So

systemctl start traefik.service
systemctl start jupyterhub.service

I could finally load my key and certificate by following the instructions in the second part of the tutorial: http://tljh.jupyter.org/en/latest/howto/admin/https.html

Now the problem, I guess, is that for certificate renewal I'll have to shut down the server again, so I'll definitely try the proper way again later.

gantheaume commented 5 years ago

Ok, so still fulfilling my noob role in this story, I ended up totally messing up my install. So I restarted from zero, and this time tested the proper tljh way of setting up a certificate. And guess what, it worked! So the issue was me not setting up the DNS records properly, confirmed.

By the way, having a look at sudo systemctl status traefik.service can help identify things a bit, if there is some network problem (I found it useful).

ajhenley commented 5 years ago

I have literally done the install dozens of times and it never worked. Which instructions did you follow?


gantheaume commented 5 years ago

I have literally done the install dozens of times and it never worked. Which instructions did you follow?

Sorry for my late answer, I'm quite busy at the moment. Here is precisely everything I did, starting from a clean Ubuntu Server 18.04 install:

If your user doesn't have sudo rights:

su
usermod -a -G sudo yourusername
exit

From now on, everything is run from the normal user "yourusername":

sudo apt-get update
sudo apt-get upgrade  ## Enter on all dialogs if there are some
sudo dpkg-reconfigure locales ## to have locales set up properly and stop getting LC errors; I chose en_US UTF-8
sudo apt-get install linux-headers-generic ethtool libc-dev linux-libc-dev python3-dev
sudo reboot

Now all is ready, we can do:

sudo ls ## just to have the sudo password entered
curl https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/bootstrap/bootstrap.py | sudo -E python3 - --admin myfirstadminuser ## that's precisely the command of the install instructions in the manual: http://tljh.jupyter.org/en/latest/install/custom-server.html

Then, get things going; I don't know if it's all needed:

export PATH=/opt/tljh/user/bin:${PATH}
nano ~/.bashrc && source ~/.bashrc  ## Added the export path from above; source: http://tljh.jupyter.org/en/latest/howto/env/user-environment.html
sudo env PATH=${PATH} conda update -n base conda ## do not forget the "env"; it's actually missing from the tutorial page above, I'll think about editing it.

At last, the normal SSL procedure from this page: http://tljh.jupyter.org/en/latest/howto/admin/https.html

sudo tljh-config set https.enabled true
sudo tljh-config set https.letsencrypt.email email@example.com ## more precisely, my email is hosted on mydomain.me, but I don't think it's important
sudo tljh-config add-item https.letsencrypt.domains mydomain.me
sudo tljh-config add-item https.letsencrypt.domains www.mydomain.me
sudo tljh-config show

When all is good: sudo tljh-config reload proxy

Now if you configured the DNS records properly (see my previous long post), all should go fine, and going to "mydomain.me" should bring you directly on the login secured with https ;)

Good luck testing ;)

Note that I already had a working HTTPS setup on the same domain using the universal Let's Encrypt procedure (my long post above), but I then wiped everything and started with a new Ubuntu install, so it should not affect anything. Second, all this was part of quite a bit of trial and error, so you're welcome to suggest improvements!

(By the way, it seems that the only reliable way of installing extra Python modules is to run sudo -E pip install module in the Jupyter notebook terminal online, after first doing a sudo -E pip install --upgrade pip. I didn't manage to get working modules installed any other way, for example through ssh. When I have time I'll dig into this, as it's another issue. Linked help page that details the steps: http://tljh.jupyter.org/en/latest/howto/env/user-environment.html)

ajhenley commented 5 years ago

Thanks so much...

asvinp commented 4 years ago

Not sure if it'll help anyone else but basically, had to port forward the HTTPS port 443 on my router. Had only done it for 80. ( ¬_¬)

hoenie-ams commented 4 years ago

@gantheaume's tip to use sudo systemctl status traefik.service helped me figure out my issue. SSL was working fine, but then the certificate expired. The problem was the firewall I set up after the initial installation. Looks like port 80 is needed for the renewal of the certificates...
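Since a renewal failure like this only shows up once the certificate has already expired, it can help to check now and then how many days are left. Here is a rough Python sketch using only the standard library. The function names are invented, and fetch_not_after raises a verification error if the served certificate is not trusted (for example the TRAEFIK DEFAULT CERT mentioned at the top), which is a useful signal in itself.

```python
import socket
import ssl


def days_until_expiry(not_after, now_epoch):
    """Days between now_epoch and a certificate's notAfter string.

    not_after uses the format returned by getpeercert(),
    e.g. "Jan 31 00:00:00 2020 GMT". Negative means already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now_epoch) / 86400.0


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the notAfter field of the certificate served at host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage would be something like days_until_expiry(fetch_not_after("mydomain.me"), time.time()), with mydomain.me standing in for your own domain and time imported from the standard library.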

dschofield commented 4 years ago

error msg="Unable to obtain ACME certificate for domains \"a_domain.com\" : unable to generate a certificate for the domains [a_domain.com]: acme: Error -> One or more domains had a problem. [a_domain.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://a_domain.com/.well-known/acme-challenge/ra01JKbw3Wv194BDVhjSeK_nkbFA-UVYqnhv08LUoM [2606:4700:3037::681b:a340]

Port 80 must be open for HTTP traffic over IPv4. I had mine restricted to IPv6 (by mistake) and allowing IPv4 traffic on 80 resolved it.
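A way to test this particular condition is to force the connection over IPv4 only, since a generic connection test may succeed over IPv6 and hide the problem. Here is a minimal Python sketch with a made-up function name. It has to be run from a machine outside your network to say anything about what the validation server sees.

```python
import socket


def reachable_over_ipv4(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds using IPv4 only."""
    try:
        # Restricting the lookup to AF_INET ignores any AAAA (IPv6) records.
        candidates = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name has no IPv4 address at all
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as conn:
                conn.settimeout(timeout)
                conn.connect(sockaddr)
                return True
        except OSError:
            # Refused, filtered or timed out; try the next address if any.
            continue
    return False
```

reachable_over_ipv4("a_domain.com", 80) returning False from an outside host would point at the firewall or port forwarding rather than at TLJH.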

buggythepirate commented 4 years ago

Piggybacking a bit on @gantheaume solution...

I ended up here after installing TLJH on an Azure virtual machine. For me Let's Encrypt did not work at first either. sudo journalctl -u traefik showed either timeouts or server misbehaving in the ACME error message. My problem was caused by setting up the DNS records AFTER running the install process. Configuring Let's Encrypt and reloading the proxy with sudo tljh-config reload proxy did not fix the problem.

My fix: Make sure your configuration is correct and then restart your virtual machine. Afterwards everything worked smoothly.

So here's the proper way to do it for future reference:

  1. First set up the DNS records.
  2. Then run:

sudo tljh-config set https.enabled true
sudo tljh-config set https.letsencrypt.email email@example.com ## more precisely, my email is hosted on mydomain.me, but I don't think it's important
sudo tljh-config add-item https.letsencrypt.domains mydomain.me
sudo tljh-config add-item https.letsencrypt.domains www.mydomain.me
sudo tljh-config show
sudo tljh-config reload proxy

consideRatio commented 3 years ago

This issue covered a lot of debugging related to failure to setup HTTPS.

I think what was missing from the documentation was perhaps notes on:

Since this issue is long and hard to follow at this point, and since I consider it to be resolved by better documentation, I'm closing this and opening a new one that references these documentation improvements as its action point and points back to this issue as its origin.