Closed JuanCab closed 3 years ago
Failure is due to lack of a proper DNS entry for our server (no "A" entry specifically). Working on it, but I am closing this problem for now.
Actually, this fixed Problem 2; Problem 1 (the confusing issue of running the commands within an HTTP-connected terminal) still exists.
I tried the same from ssh all the way with the same result.
New Ubuntu 18.04 install:
sudo apt update
sudo apt upgrade
Followed the "your own server" instructions to the letter (https://the-littlest-jupyterhub.readthedocs.io/en/latest/install/custom-server.html), then followed the HTTPS instructions and got this:
$ sudo tljh-config reload proxy
Proxy reload with new configuration complete
but HTTPS still doesn't work.
I am facing the same issue. The hub is served with an invalid HTTPS certificate. No negotiation with Let's Encrypt whatsoever. For now, I'm setting up Let's Encrypt with certbot (https://certbot.eff.org/lets-encrypt/ubuntubionic-other), getting the certificate and key separately, and using the manual HTTPS setup for TLJH.
Same issue here, I'll try fixing it as @parthjoshi2007 did. Did you install and set up Certbot after the TLJH installation?
Heya! I just merged #328, seen in http://tljh.jupyter.org/en/latest/howto/admin/https.html. There's a short 'troubleshooting' section too. Would love to see the logs from traefik here, so we can help figure out what's going on.
Hi,
I am getting the same issue. I followed the instructions but then got an invalid hub certificate. @yuvipanda Here are my traefik logs: logs.txt
This might be part of the problem: 'Unable to obtain ACME certificate for domains \"jupyter.ortom.co.uk\" : unable to generate a certificate for the domains [jupyter.ortom.co.uk]: acme: Error -> One or more domains had a problem:\n[jupyter.ortom.co.uk] acme: Error 400 - urn:ietf:params:acme:error:connection - Fetching http://jupyter.ortom.co.uk/.well-known/acme-challenge/ntPU29uuqFL-B7fvSWildcV8sk5FlONSHD4FPpoSQYg: Timeout during connect (likely firewall problem)\n'
But this bit is odd too
Jun 05 14:31:03 ip-172-31-38-191 traefik[11827]: time="2019-06-05T14:31:03Z" level=info msg="Starting provider *acme.Provider{\"Email\":\"tom@ortom.co.uk\",\"ACMELogging\":false,\"CAServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"Storage\":\"acme.json\",\"EntryPoint\":\"https\",\"KeyType\":\"\",\"OnHostRule\":false,\"OnDemand\":false,\"DNSChallenge\":null,\"HTTPChallenge\":{\"EntryPoint\":\"http\"},\"TLSChallenge\":null,\"Domains\":[{\"Main\":\"j\",\"SANs\":null},{\"Main\":\"u\",\"SANs\":null},{\"Main\":\"p\",\"SANs\":null},{\"Main\":\"y\",\"SANs\":null},{\"Main\":\"t\",\"SANs\":null},{\"Main\":\"e\",\"SANs\":null},{\"Main\":\"r\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\"r\",\"SANs\":null},{\"Main\":\"t\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\"m\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"c\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"u\",\"SANs\":null},{\"Main\":\"k\",\"SANs\":null}],\"Store\":{}}"
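A guess about the odd part, from the log alone: the Domains list has one entry per character of jupyter.ortom.co.uk, which is what you would expect if the configured value were a single string rather than a list (for example, if it had been written with set instead of add-item). Nothing in this thread confirms that. If it is the cause, a sketch like the one below, which clears the key and re-adds each domain with add-item, should leave a proper list. reset_letsencrypt_domains is a made-up helper name, and jupyter.example.org in the usage line is a placeholder.

```shell
# Hypothetical helper (not part of TLJH): rebuild https.letsencrypt.domains
# as a list, one add-item call per domain passed as an argument.
reset_letsencrypt_domains() {
    # Drop whatever is currently stored under the key (string or list)
    sudo tljh-config unset https.letsencrypt.domains
    for domain in "$@"; do
        sudo tljh-config add-item https.letsencrypt.domains "$domain"
    done
    # Print the resulting config so the list can be checked by eye
    sudo tljh-config show
    sudo tljh-config reload proxy
}

# Usage (placeholder domain):
#   reset_letsencrypt_domains jupyter.example.org
```

After the reload, the "Starting provider" line in the traefik log should list whole domain names under Domains rather than single characters.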
Facing the same issue.
Jun 13 15:30:53 paytonstudio traefik[20277]: time="2019-06-13T15:30:53Z" level=error msg="Unable to obtain ACME certificate for domains \"studyworthy.xyz\" : unable to generate a certificate for the domains [studyworthy.xyz]: acme: Error -> One or more domains had a problem:\n[studyworthy.xyz] acme: Error 400 - urn:ietf:params:acme:error:connection - Fetching http://studyworthy.xyz/.well-known/acme-challenge/xxxxxxx: Timeout during connect (likely firewall problem)\n"
It looks like I have the same error: even if it's a 503, it seems Let's Encrypt needs the "domain.bar/.well-known/acme-challenge/" folder to be reachable, and it can't reach it.
This article seems to hint at this: https://nixcp.com/lets-encrypt-the-client-lacks-sufficient-authorization-invalid-response/ (see towards the end). No idea how this would be feasible with TLJH.
Here's my "anonymised" error (can provide more if needed):
Jun 27 19:07:30 foo traefik[17773]: time="2019-06-27T19:07:30+02:00" level=error msg="Unable to obtain ACME certificate for domains \"foo.bar\" : unable to generate a certificate for the domains [foo.bar]: acme: Error -> One or more domains had a problem:\n[foo.bar] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Invalid response from http://foo.bar/.well-known/acme-challenge/EhJX[35moreCaracters]3oSI [ip.v4.XX.XX]: \"<!DOCTYPE html>\\n<html>\\n <head>\\n <title>503 Backend fetch failed</title>\\n </head>\\n <body>\\n <h1>Error 503 Backend fetch f\"\n"
So I'll do as @parthjoshi2007 did and set it up with certbot for now.
Ok, so to me the error is "clear" :
From: https://certbot.eff.org/docs/using.html#webroot
The webroot plugin works by creating a temporary file for each of your requested domains in ${webroot-path}/.well-known/acme-challenge. Then the Let’s Encrypt validation server makes HTTP requests to validate that the DNS for each requested domain resolves to the server running certbot. An example request made to your web server would look like:
66.133.109.36 - - [05/Jan/2016:20:11:24 -0500] "GET /.well-known/acme-challenge/HGr8U1IeTW4kY_Z6UIyaakzOkyQgPr_7ArlLgtZE8SX HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
Note that to use the webroot plugin, your server must be configured to serve files from hidden directories. If /.well-known is treated specially by your webserver configuration, you might need to modify the configuration to ensure that files inside /.well-known/acme-challenge are served by the webserver.
And TLJH doesn't seem to let these files be reached, so the challenge fails.
Thinking about it, I hadn't set up DNS properly: I had set up permanent web forwarding instead of an A record (foo.bar to the server's IP) and a CNAME record (www.foo.bar to foo.bar). Explanations, and setup instructions if you're on Gandi: https://docs.gandi.net/en/domain_names/common_operations/link_domain_to_website.html Another good explanation: https://support.dnsimple.com/articles/a-record/
Now, if I had done this properly from the start, it might have worked with TLJH's default Let's Encrypt setup. When I find time, I'll test ;) (I guess it also works much better for certificate renewal).
EDIT: This was indeed the problem, see my next post
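For anyone wanting to check this before touching TLJH, here is a rough sketch that asks DNS for the A record and compares it with the server's public IP. It assumes dig is installed (on Ubuntu it comes with the dnsutils package); check_a_record is a hypothetical helper name, and foo.bar and 203.0.113.10 are placeholders.

```shell
# Hypothetical helper: succeed only if DOMAIN has an A record equal to EXPECTED_IP.
check_a_record() {
    domain="$1"
    expected_ip="$2"
    # dig +short prints one answer per line, or nothing if there is no A record
    resolved="$(dig +short A "$domain")"
    if printf '%s\n' "$resolved" | grep -qxF "$expected_ip"; then
        echo "OK: $domain has an A record pointing to $expected_ip"
    else
        echo "MISSING: $domain does not resolve to $expected_ip (got: ${resolved:-nothing})"
        return 1
    fi
}

# Usage (placeholder values):
#   check_a_record foo.bar 203.0.113.10
```

With web forwarding in place of an A record, the lookup would typically return the registrar's forwarding server rather than your own IP, so the check fails.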
Meanwhile, I finally got certbot to work ( https://certbot.eff.org/lets-encrypt/ubuntubionic-other ) after quite a bit of trial and error, so I'm going to post what I would have been happy to stumble upon myself. However, it's just what I did on my server; there may be a shorter and simpler way, but being sure would require a bit of testing that I don't have time to do.
I was running as root; otherwise all of this needs sudo.
First, I undid everything I had set up during my previous attempts to set up HTTPS (you never know):
tljh-config unset https.enabled
## likewise, run "tljh-config unset" or "tljh-config remove-item" for any other setting you changed while testing
tljh-config reload
Then I tried the standalone certbot:
sudo certbot certonly --standalone --preferred-challenges http -d foo.bar -d www.foo.bar
But I had this error:
Problem binding to port 80: Could not bind to IPv4 or IPv6.
So I started with:
ufw allow 80
But it didn't work yet.
Actually, the reason it wasn't working is that TLJH still had its frontend running on my address.
So to see if I could stop it, I tried (note: my jupyter instances/servers were all already shut down, no idea if that's important):
systemctl | grep running
systemctl stop jupyterhub.service
Still not enough; by running:
ss -tlnp | grep -E ":(80|443)"
I saw that I still had traefik squatting the ports; so:
systemctl stop traefik.service
And yay! Finally
sudo certbot certonly --standalone --preferred-challenges http -d foo.bar -d www.foo.bar
worked :)
So
systemctl start traefik.service
systemctl start jupyterhub.service
I could finally load my key and certificate following the instructions in the second part of the tutorial: http://tljh.jupyter.org/en/latest/howto/admin/https.html
Now the problem, I guess, is that for certificate renewal I'll have to shut down the server again, so I'll definitely try the proper way again later.
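On the renewal worry: certbot has --pre-hook and --post-hook options for renew, so the stop/start of traefik can be scripted around the renewal instead of done by hand. A minimal sketch, assuming the certificate was obtained with the standalone method as above and that TLJH's manual HTTPS settings point at the files certbot renews; renew_with_traefik_stopped is a hypothetical helper name.

```shell
# Hypothetical helper: free port 80 only while certbot is actually renewing.
renew_with_traefik_stopped() {
    # "$@" passes extra certbot flags through, e.g. --dry-run for a rehearsal
    sudo certbot renew \
        --pre-hook "systemctl stop traefik.service" \
        --post-hook "systemctl start traefik.service" \
        "$@"
}

# Usage: rehearse first, then run for real
#   renew_with_traefik_stopped --dry-run
#   renew_with_traefik_stopped
```

The hub is still briefly unreachable while traefik is stopped, so this only reduces the manual work; it does not remove the downtime.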
Ok, so still fulfilling my noob role in this story, I ended up totally messing up my install. So I restarted from zero, and this time tested the proper tljh way of setting up a certificate. And guess what, it worked! So the issue was me not setting up the DNS records properly, confirmed.
By the way, having a look at sudo systemctl status traefik.service can help identify things a bit, if there is some network problem (I found it useful).
I have literally done the install dozens of times and it never worked. Which instructions did you follow?
I have literally done the install dozens of times and it never worked. Which instructions did you follow?
Sorry for my late answer, I'm quite busy at the moment. Here is precisely everything I did, from a clean Ubuntu Server 18.04 install:
If your user doesn't have sudo rights:
su
usermod -a -G sudo yourusername
exit
From now on, everything is run from the normal user "yourusername":
sudo apt-get update
sudo apt-get upgrade ## Enter on all dialogs if there are some
sudo dpkg-reconfigure locales ## to have locales set up properly and stop getting LC errors; I chose en_US UTF-8
sudo apt-get install linux-headers-generic ethtool libc-dev linux-libc-dev python3-dev
sudo reboot
Now all is ready, we can do:
sudo ls ## just to have the sudo password entered
curl https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/bootstrap/bootstrap.py | sudo -E python3 - --admin myfirstadminuser ## that's precisely the command of the install instructions in the manual: http://tljh.jupyter.org/en/latest/install/custom-server.html
Then, get things going; I don't know if it's all needed:
export PATH=/opt/tljh/user/bin:${PATH}
nano ~/.bashrc && source ~/.bashrc ## Added the export path from above; source: http://tljh.jupyter.org/en/latest/howto/env/user-environment.html
sudo env PATH=${PATH} conda update -n base conda ## do not forget the "env"; it's actually missing from the tutorial page above, I'll think about editing it.
At last, the normal SSL procedure from this page: http://tljh.jupyter.org/en/latest/howto/admin/https.html
sudo tljh-config set https.enabled true
sudo tljh-config set https.letsencrypt.email email@example.com ## more precisely, my email is hosted on mydomain.me, but I don't think it's important
sudo tljh-config add-item https.letsencrypt.domains mydomain.me
sudo tljh-config add-item https.letsencrypt.domains www.mydomain.me
sudo tljh-config show
When all is good:
sudo tljh-config reload proxy
Now if you configured the DNS records properly (see my previous long post), all should go fine, and going to "mydomain.me" should bring you directly to the login page, secured with HTTPS ;)
Good luck testing ;)
Note that I already had a working HTTPS setup on the same domain using the universal Let's Encrypt procedure (my long post above), but I then wiped everything and started with a new Ubuntu install, so it should not affect anything. Second, all this was part of quite a bit of trial and error, so you're welcome to suggest improvements!
(By the way, it seems that the only reliable way of installing extra Python modules is to run sudo -E pip install module in the online Jupyter notebook terminal, after first running sudo -E pip install --upgrade pip there. I didn't manage to install working modules any other way, for example through ssh. When I have time I'll dig into this, as it's another issue. Linked help page, which details the steps: http://tljh.jupyter.org/en/latest/howto/env/user-environment.html)
Thanks so much...
Not sure if it'll help anyone else but basically, had to port forward the HTTPS port 443 on my router. Had only done it for 80. ( ¬_¬)
@gantheaume's tip to use sudo systemctl status traefik.service helped me figure out my issue. SSL was working fine but then the certificate expired. The problem was the firewall I set up after the initial installation. Looks like port 80 is needed for the renewal of the certificates...
error msg="Unable to obtain ACME certificate for domains \"a_domain.com\" : unable to generate a certificate for the domains [a_domain.com]: acme: Error -> One or more domains had a problem. [a_domain.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://a_domain.com/.well-known/acme-challenge/ra01JKbw3Wv194BDVhjSeK_nkbFA-UVYqnhv08LUoM [2606:4700:3037::681b:a340]
Port 80 must be open for HTTP traffic over IPv4. I had mine restricted to IPv6 (by mistake) and allowing IPv4 traffic on 80 resolved it.
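Pulling the last few comments together: both 80 and 443 need to be reachable, and 80 over IPv4. A rough sketch, assuming ufw is the host firewall as in the earlier post; it does not cover router port forwarding or cloud security groups, which have to be opened separately. open_web_ports and check_http_over_ipv4 are hypothetical helper names, and mydomain.me is a placeholder.

```shell
# Hypothetical helper: open HTTP and HTTPS in ufw, then show the resulting rules.
open_web_ports() {
    sudo ufw allow 80/tcp
    sudo ufw allow 443/tcp
    sudo ufw status verbose
}

# Hypothetical helper: from a machine OUTSIDE your network, request the site
# over IPv4 only (-4) and print just the HTTP status code.
check_http_over_ipv4() {
    curl -4 -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "http://$1/"
}

# Usage:
#   open_web_ports
#   check_http_over_ipv4 mydomain.me
```

Any status code at all means port 80 is reachable over IPv4; a timeout or connection error points back at the firewall or port forwarding.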
Piggybacking a bit on @gantheaume's solution...
I ended up here after installing TLJH on an Azure virtual machine. For me Let's Encrypt did not work either at first. Running
sudo journalctl -u traefik
showed either timeouts or "server misbehaving" in the ACME error message. My problem was caused by setting up the DNS records AFTER running the install process. Configuring Let's Encrypt and reloading the proxy with
sudo tljh-config reload proxy
did not fix the problem.
My fix: Make sure your configuration is correct and then restart your virtual machine. Afterwards everything worked smoothly.
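The ACME errors quoted in this thread come in two flavours, and telling them apart narrows things down quickly. Below is a small sketch that reads traefik log lines on stdin and prints a hint per error. The hints are only a summary of what people reported above, not an official mapping, and acme_hint is a hypothetical helper name.

```shell
# Hypothetical helper: read log lines on stdin, print a hint for each ACME error.
acme_hint() {
    while IFS= read -r line; do
        case "$line" in
            *"acme:error:connection"*)
                # Seen above as "Timeout during connect (likely firewall problem)"
                echo "connection: Let's Encrypt could not reach port 80 (check firewall, port forwarding, IPv4 access)" ;;
            *"acme:error:unauthorized"*)
                # Seen above as "Invalid response from http://..."
                echo "unauthorized: the challenge URL returned an unexpected response (check DNS records, web forwarding, anything in front of the server)" ;;
        esac
    done
}

# Usage:
#   sudo journalctl -u traefik --no-pager | acme_hint
```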
So here's the proper way to do it for future reference:
sudo tljh-config set https.enabled true
sudo tljh-config set https.letsencrypt.email email@example.com ## more precisely, my email is hosted on mydomain.me, but I don't think it's important
sudo tljh-config add-item https.letsencrypt.domains mydomain.me
sudo tljh-config add-item https.letsencrypt.domains www.mydomain.me
sudo tljh-config show
sudo tljh-config reload proxy
This issue covered a lot of debugging related to failures to set up HTTPS.
I think what was missing from the documentation was perhaps notes on:
Since this issue is long and hard to follow at this point, and since I consider it to be resolved by better documentation, I'm closing this and opening a new one that references these documentation improvements as its action point and points back to this issue as its origin.
On a freshly installed jupyterhub that is visible to the outside world, I followed the Let's Encrypt instructions on the Enabling HTTPS document page. I confirmed
sudo -E tljh-config show
returns the expected content compared to what is in the documentation.
Problem 1) When I do
sudo -E tljh-config reload proxy
nothing happens. In fact, I realized that the connection hangs if you are doing this through the terminal on the jupyterhub. This is not surprising since it is shutting down http and turning on https. However, there is no warning in the documentation that this will happen.
Problem 2) When I try to go to the https connection, it is active, but the certificate is NOT being recognized as "verified by a third party" (in Chrome, this is NET::ERR_CERT_AUTHORITY_INVALID). It does appear to be created since its name is "TRAEFIK DEFAULT CERT".
The documentation should be updated to fix Problem 1, and I would appreciate any hints as to how to 'redo' the proxy connection properly. I did try re-running
sudo -E tljh-config reload proxy
from ssh, and it returned
Proxy reload with new configuration complete
but didn't fix the issue.
We did revert to a snapshot of the VM from before activation of HTTPS and try the instructions from an SSH terminal. The result was the same except that
sudo -E tljh-config reload proxy
from ssh returned
Proxy reload with new configuration complete
(since the http session terminal was not used), but the certificate is still not recognized as a third party verified certificate. Is there something more we need to do?
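To see which certificate is actually being served without going through the browser warning, something like the sketch below can help. It assumes openssl is installed on the machine you run it from; show_cert_issuer is a hypothetical helper name and hub.example.org is a placeholder.

```shell
# Hypothetical helper: print issuer, subject and expiry of the certificate served on port 443.
show_cert_issuer() {
    host="$1"
    # s_client fetches the certificate; </dev/null closes the session right after the handshake
    openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -issuer -subject -enddate
}

# Usage (placeholder host):
#   show_cert_issuer hub.example.org
```

If the output still mentions "TRAEFIK DEFAULT CERT", traefik has not obtained a Let's Encrypt certificate yet, and the traefik logs are the next place to look.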