turnkeylinux / tracker

TurnKey Linux Tracker
https://www.turnkeylinux.org

Let's Encrypt DNS-01 Failure Traceback - LXC TK Core 18.0-1 #1876

Closed zndrr closed 11 months ago

zndrr commented 1 year ago

Hi Team,

Have been attempting to retrieve a cert via LE in confconsole and was met with the following traceback. Copied from the confconsole UI, so please excuse the malformatting.

Hopefully I provide enough info and don't require too much back-and-forth:

Steps: confconsole > Lets encrypt > Get certificate > Accept TOS > dns-01

Traceback (most recent call last):
  File "/usr/bin/confconsole", line 719, in loop
    new_dialog = method()
                 ^^^^^^^^
  File "/usr/lib/confconsole/plugin.py", line 121, in run
    ret: Optional[str] = self.module.run()  
                         ^^^^^^^^^^^^^^^^^  
  File "/usr/lib/confconsole/plugins.d/Lets_Encrypt/get_certificate.py", line 153, in run  
    config = dns_01.load_config()  
             ^^^^^^^^^^^^^^^^^^  
AttributeError: 'NoneType' object has no attribute 'load_config'

I'm kind of average at digging, but it loosely seems that some configuration files might be missing ...
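For what it's worth, the error in the traceback above is exactly what Python raises when a method is called through a name bound to None - here, the dns_01 plugin reference was never set. A minimal, self-contained illustration (not confconsole's actual code):

```python
# Minimal illustration of the failure mode in the traceback above (not
# confconsole's actual code): calling a method through a name bound to None
# raises the same AttributeError.
dns_01 = None  # stand-in for a plugin module reference that was never set

try:
    dns_01.load_config()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'load_config'
```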

Particulars:

root@pve1a:~# pveversion
  pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-12-pve)

CT Template:
  debian-12-turnkey-core_18.0-1_amd64.tar.gz

root@tkcore ~# turnkey-version
  turnkey-core-18.0-bookworm-amd64

root@tkcore ~# apt show confconsole | grep Vers
  Version: 2.1.1

- LXC IP and domain are static. Privileged container.
- Domain is hosted on Cloudflare.
- Nameservers were the gateway and 1.1.1.1; also tried only the latter, and 8.8.8.8. Resolution was fine from the start, however.
- Have retrieved many certs via other methods (PVE, OPNsense, certbot, Traefik, etc.)
- Tried reinstalling confconsole via apt as well as manually installing latest deb from GH (though it's older).
- Rebuilt again from template.

If any other info is wanted, please let me know.

Attached the apt output (probably not useful) and the files mentioned in the trace: apt_show_confconsole.txt trace_files.zip

zndrr commented 1 year ago

The HTTP-01 method allows specifying domains and does go through some motions; it definitely progresses further. Naturally this method isn't suitable and predictably fails.

Same outcome on the Bookstack 18.0-1 template with DNS-01; the traceback is identical.

JedMeister commented 1 year ago

Thanks so much for your excellent bug report.

I'll have to dig in and have a look. FWIW I haven't explicitly tested in an LXC container but AFAIK it should "just work", obviously that's not the case...

I've got a bit of a backlog of things that I need to follow up on, so it might be a day or 2 before I circle back to this. Assuming that I can reproduce the issue, I'll aim to at least post a workaround, if not a proper fixed package.

zndrr commented 1 year ago

@JedMeister thank you for the response. I'm happy to wait; not urgent.

I just tested TKL Core ISO in an attempt to help out.

Same traceback I'm afraid. Same hangups as the two LXC templates. tkcore_iso_dns01_trace

Hope that helps.

JedMeister commented 1 year ago

Thanks for your followup. On the plus side, I can reproduce the issue (simply trying to use DNS-01 gives me that same issue). I haven't yet had a closer look but I already have some ideas (I suspect it's as simple as the default conf file not being copied on install).

So hopefully I'll have a workaround for you really soon, and a proper fix (which I'll upload to the apt archive) soon after.

OnGle commented 1 year ago

Seems like it was caused by a bit of an oversight on my behalf, both in the plugin docs and the implementation, which led to undefined behaviour with this particular plugin (depending on the order in which plugins were loaded).
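The load-order problem described above can be sketched as follows. This is a hypothetical illustration, not the real confconsole plugin loader: if one plugin resolves another by name before that plugin has been registered, the lookup quietly yields None, and any later use crashes.

```python
# Hypothetical sketch (not the real confconsole plugin loader): resolving a
# plugin by name before it has been registered quietly yields None, so
# behaviour depends entirely on the order in which plugins are loaded.
registry = {}

def register(name):
    registry[name] = object()  # stand-in for an imported plugin module

def resolve(name):
    return registry.get(name)  # None if `name` has not been registered yet

dns_01 = resolve("dns_01")  # looked up before the plugin was registered
register("dns_01")          # too late - the reference above is already None

print(dns_01 is None)  # True; any attribute access now raises AttributeError
```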

JedMeister commented 1 year ago

Ok @zndrr - apologies that this has taken a few days to get some action on, but I think we should be good now...

I've now tested locally and the fix(es) applied by @OnGle appear to have done the job. Although please note that I haven't actually tested getting a cert via the DNS-01 method. As such, there is a slim chance that there are still issues (sorry I don't have creds handy to do the full set up). Regardless, I'm pretty confident that this should fix the issue.

I've just uploaded it to the bookworm TurnKey apt repo. So this should do the trick:

apt update
apt install confconsole

If you continue to have issues, please post back.

zndrr commented 1 year ago

Thanks heaps for the turnaround. I have taken another dive and unfortunately run into successive issues. There were traces, but I was quite disorganised in both method and capture.

It certainly got further of course.

I attached a few traces that I dumped to an IDE (hopefully sanitised sufficiently). Apologies that the repro steps aren't alongside.

- trace1-ui.txt
- trace2-ui.txt
- trace3-ui-with-cli-inputs.txt - throwing things at the wall to see what sticks XD
- trace4-ui.txt - domain instead of sub.domain (ruling out delegation?)
- traceback_lexicon-cli.txt - for this Lexicon CLI trace, I was performing the actions noted at the top of the file: list and create. List threw a trace and then a valid list; create, just a trace.

It may take me some spare time to eliminate user error... and to complicate things more, I likely won't have time to poke again until the weekend following :(

One thing in particular that wasn't super clear to me was the input expected here. I followed the URL and tried a few things but ultimately wasn't met with success.

tkcore_dns-lexicon_entry

My general assumption was that I could use a Cloudflare API token with zone edit privilege which would create the acme challenge record for validation.

Their documentation wasn't super transparent on API global key vs Token and programmatic input assumptions; had to do some digging through Issues to clarify. Became a bit of a rabbit hole XD

Appreciate the product, so hopefully I can help resolve this in whatever manner you need. Happy to follow direction too, where time allows.

root@tkcore ~# apt show confconsole
Package: confconsole
Version: 2.1.2
<omit>
zndrr commented 1 year ago

Ah sorry, just a quick revelation RE: lexicon CLI:

Fresh trace of sub.domain TXT create/delete: lexicon-create-delete-cli.txt

The output below the traces is, I'm assuming, the outcome heh

RESULT
------
True

JedMeister commented 1 year ago

Thanks heaps for the turnaround.

You're welcome. Although it probably would have been better if I'd fully tested it and actually ensured that the issue(s) were fixed...

I have taken another dive and unfortunately run in to successive issues. There were traces, but I was quite disorganised in both method and capture.

No problem. What you've provided is still useful. It demonstrates the issue(s) and also suggests that we need better documentation...

I attached a few traces that I dumped to an IDE (hopefully sanitised sufficiently). Apologies that the repro steps aren't alongside.

No problem, your stacktraces are still very useful.

It may take me some spare time to eliminate user error... and to complicate things more, I likely won't have time to poke again until the weekend following :(

Considering your second post, it appears that there have been some changes in the Debian python lexicon library (bullseye vs bookworm) that we didn't acknowledge/consider. It's possibly irrelevant now, but FWIW what might tighten up the feedback loop for you (and avoid any remaining Confconsole bugs - perhaps isolating bugs elsewhere) would be to use our dehydrated-wrapper script directly. It's not in the $PATH so you'll need to call it via its full path: /usr/lib/confconsole/plugins.d/Lets_Encrypt/dehydrated-wrapper. I.e.:

root@bookworm-test ~# /usr/lib/confconsole/plugins.d/Lets_Encrypt/dehydrated-wrapper -h

E.g.:

/usr/lib/confconsole/plugins.d/Lets_Encrypt/dehydrated-wrapper -r -c dns-01 -p cloudflare

Having said that, as demonstrated by your direct use of lexicon, I suspect that there would still be issues...

One thing in particular that wasn't super clear to me was the input expected here. I followed the URL and tried a few things but ultimately wasn't met with success.

TBH, looking closer, you are absolutely correct! We definitely need to do a better job of documenting this.

FWIW support for DNS-01 challenges was contributed by a user. I was pretty pumped about it (previously we only supported HTTP-01 challenges). I tested it (on TurnKey v17.x) at the time and it "just worked" so I was happy to merge. Although IIRC I only tested it in detail using AWS Route53 (which I was already using, had some understanding of and had handy).

However, following your feedback, a little testing and looking back over it all with a fresh (and perhaps more critical) eye, it's clear that I should have tested more deeply on v18.0. I see a lot of shortcomings in the code, but even more so in the documentation... I don't have the time or resources to rewrite the code right now, but obviously there isn't much value in providing something this broken...

My general assumption was that I could use a Cloudflare API token with zone edit privilege which would create the acme challenge record for validation.

I don't recall exactly, but my memory suggests it was essentially that easy when I tested (albeit via Route53 rather than Cloudflare). But I suspect that there was some curse of knowledge going on, and/or perhaps there are some bugs in the newer Debian package?

Their documentation wasn't super transparent on API global key vs Token and programmatic input assumptions; had to do some digging through Issues to clarify. Became a bit of a rabbit hole XD

Definitely not very "turnkey"...

Appreciate the product, so hopefully I can help resolve this in whatever manner you need. Happy to follow direction too where time allows.

And your appreciation is appreciated :rofl:

My plan is to at least test, document and, as needed, debug usage with Route53. I imagine it won't be immediately transferable, but at least it will demonstrate a "working" config. Hopefully it will also allow me to get an idea of what is actually going wrong here...

Ah sorry just quick revelation RE: lexicon CLI:

  • create did actually generate the record as specified; just found it in my CF dashboard
  • delete on CLI also deleted it, despite the ominous traceback
  • subdomain record creation/deletion also okay

Thanks for posting the additional info. One thing it highlights is that the core issue is actually with the lexicon library (or perhaps just the way we're trying to use it).

I aim to prioritise working on this, but it's Friday here (for you too, assuming you're in NZ - I'm in AU) and I have an existing commitment, so whilst I'll try to at least post an update, I may not have much progress until early next week.

JedMeister commented 1 year ago

Ok, so I have made some progress, although I still haven't got as far as confirming that it works completely... I'm not 100% sure if it's relevant to you, but I strongly suspect so - even if it just avoids the scary looking stacktrace...

Beyond the issue(s) you reported, it appears that the lexicon package in Debian is missing some dependencies for my use case (route53). The workaround is to install via pip. However, just to add to the pain, in Debian Bookworm, the way pip works has changed quite a bit and a venv (virtual environment) is required.

Below, I'll detail what I've done. It seems to be working, although I haven't done much testing. I'll continue my testing and post back when I know more and/or have a proper fix.

To get to where I am, first remove the Debian lexicon package and its Debian dependencies (which are no longer required). Please note that under normal circumstances this should be safe and not remove anything you need. However, please double-check the packages to be removed, and if anything looks important, bail out and instead rerun without the --autoremove switch (strictly speaking we only need to remove the lexicon package itself).

apt purge --autoremove lexicon

FWIW here are the packages that it removed when I ran it:

lexicon* libyaml-0-2* python3-bs4* python3-filelock* python3-importlib-metadata* python3-lexicon*
python3-more-itertools* python3-requests-file* python3-soupsieve* python3-tldextract* python3-yaml*
python3-zipp*

Now create a venv, install lexicon via pip, and create a symlink (in /usr/local/bin) so it's in your $PATH (without all the other venv bin files). The venv also includes the tldextract tool; I'm not 100% sure whether that is required, but I've created a symlink for it too so it can be run via the CLI. These instructions assume that you're running as root (as most TurnKey users would be); if not, run sudo su - first:

mkdir -p /usr/local/src/venv
python3 -m venv /usr/local/src/venv/lexicon
/usr/local/src/venv/lexicon/bin/pip install dns-lexicon[full]
ln -s /usr/local/src/venv/lexicon/bin/lexicon /usr/local/bin/lexicon
ln -s /usr/local/src/venv/lexicon/bin/tldextract /usr/local/bin/tldextract
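The same venv-plus-symlink pattern can be sketched from Python with the stdlib venv module. This is purely illustrative (throwaway paths, made-up link name); the real steps above target /usr/local/src/venv/lexicon and symlink into /usr/local/bin:

```python
# Illustrative sketch of the venv-plus-symlink pattern from the shell steps
# above, using a throwaway directory and a hypothetical link name.
import tempfile
import venv
from pathlib import Path

prefix = Path(tempfile.mkdtemp())
env_dir = prefix / "venv" / "lexicon"
venv.create(env_dir, with_pip=False)  # with_pip=True would also bootstrap pip

# console scripts that pip installs land in the venv's bin/ directory; the
# `ln -s` commands above expose just the ones you want on $PATH
bin_dir = prefix / "bin"
bin_dir.mkdir()
(bin_dir / "lexicon-demo").symlink_to(env_dir / "bin" / "python3")

print((bin_dir / "lexicon-demo").exists())  # True
```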

That should at least allow you to run lexicon via CLI without spitting out stacktraces. As I noted, I'll keep testing and post back once I know more. In the meantime, if you get a chance to have a play with this, please let me know how you go.

zndrr commented 1 year ago

@JedMeister cheers. I have run your steps as indicated and can confirm the actions are okay and clean; list, create, delete. One small deviation was a venv prerequisite, which I installed first (marked below).

mkdir -p /usr/local/src/venv
--> apt install python3.11-venv
python3 -m venv /usr/local/src/venv/lexicon
/usr/local/src/venv/lexicon/bin/pip install dns-lexicon[full]
ln -s /usr/local/src/venv/lexicon/bin/lexicon /usr/local/bin/lexicon
ln -s /usr/local/src/venv/lexicon/bin/tldextract /usr/local/bin/tldextract

Notably though, I did a second run of the steps on a fresh, updated LXC before reporting. Lexicon wasn't installed there, so purge had nothing to remove.

One other notable specific to Lexicon is what I previously alluded to, with the env vars and use of a CF API Token:

LEXICON_CLOUDFLARE_USERNAME="you@email.com" <----- don't define this when using API token
LEXICON_CLOUDFLARE_TOKEN="super-secret-token"

Ironically I forgot, but defining both results in traces which... probably shouldn't happen: lexicon-cf_un-token_trace.txt
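The "one or the other" rule above can be captured in a small pre-flight check. This is a hypothetical helper, not part of lexicon or confconsole (the function name is made up): an API token is used on its own, while the legacy Global API key pairs with the account email.

```python
# Hypothetical helper (not part of lexicon): per the note above, a Cloudflare
# API token is used on its own, while the legacy Global API key pairs with an
# email address - setting both leads to trouble.
def cloudflare_auth_mode(env: dict) -> str:
    user = env.get("LEXICON_CLOUDFLARE_USERNAME")
    token = env.get("LEXICON_CLOUDFLARE_TOKEN")
    if user and token:
        raise ValueError("set username + Global API key OR an API token, not both")
    return "api-token" if token else "global-key"

print(cloudflare_auth_mode({"LEXICON_CLOUDFLARE_TOKEN": "super-secret-token"}))
# -> api-token
```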

Noted for transparency in case fellow CF'ers come across this. The CLI has arguments for user/token as well, so that method might be preferable to the first code block in the Lexicon user_guide - which I first resorted to on problem discovery. The app help is a little better here:

root@tkcore ~# lexicon cloudflare -h

<omit>
  --auth-username AUTH_USERNAME
                        specify email address for authentication (for Global API key
                        only)
  --auth-token AUTH_TOKEN
                        specify token for authentication (Global API key or API
                        token)
<omit>

For the environment and any steps you want me to perform, I have a fresh apt updated Core LXC. i.e. can go ham.

But if you have specific requirements, then I can look to accommodate those as well. Thanks for the continued effort.

JedMeister commented 1 year ago

Thanks for your additional info. TBH, I haven't completely digested it, but I will circle back shortly and reread it.

In the meantime, I have opened a PR with my changes: https://github.com/turnkeylinux/confconsole/pull/82

Please feel free to test as is if you want, but so far I have done zero testing of the code, so it almost certainly will have some typos and other minor bugs (that often cause major crashes or lack of functionality). I'll push any updates I make to that PR.

As soon as it seems to be working ok for me, I'll build a deb package and post here. If you're able to do some testing for your use case, that'd be awesome. Speak more soon.

JedMeister commented 1 year ago

I think I might have jumped the gun a bit there. FWIW, I haven't even tried running it yet and I'm seeing heaps of bugs. That's mostly ok (and even somewhat expected). Although I've hit an issue that I hadn't expected. I'm not 100% sure exactly how I'll deal with it yet, but wanted to give you a heads up...

JedMeister commented 1 year ago

Ok, so I'm almost certain that there are still issues and improvements that can be made, but at least it's performing as I'd expect so far.

Also, I think it might be useful to include some default config. I'm not sure what that might look like exactly, but if you have any ideas, please share. We should probably also create a new doc page on the website to point to the specific and relevant bits of the lexicon docs, and/or update the confconsole docs on the website...

In the meantime, please try the package I've attached to this post. Please note I had to append .txt to upload here (instructions below include renaming it). To download and install:

wget https://github.com/turnkeylinux/tracker/files/13483643/confconsole_2.1.2%2B7%2Bg0c3a7f0_all.deb.txt
mv confconsole_2.1.2+7+g0c3a7f0_all.deb.txt confconsole_2.1.2+7+g0c3a7f0_all.deb
apt install ./confconsole_2.1.2+7+g0c3a7f0_all.deb

Alternatively, you could just download the specific files from my PR (but this is easier). Either way, hopefully it should be a little closer. Please share any issues you hit (hopefully you don't hit too many...).

confconsole_2.1.2+7+g0c3a7f0_all.deb.txt (broken - don't download)

zndrr commented 1 year ago

Appreciated. Gave it a quick whirl under two scenarios:

- stock, lexicon wouldn't install from confconsole, instead throwing a trace and circling back
- stock+lexicon installed as instructed above. got further but possibly self config input issues meaning cert issuance problematic. There were traces, but IIRC similar as those experienced prior at that step. No sign of TXT record creation despite logging suggesting creation (ie not indicative and/or mishandled by lexicon)

Will have to revisit and repro the steps in an orderly manner. What config did you use to populate the lexicon page in confconsole? For this part I still don't quite understand how it is fed, vs the lexicon CLI syntax in the terminal.

As for the config suggestions, that could be hard I guess, given the many flavours of provider. For CF though, I need Token, Zone and Domain. Zone == Domain in my case (unless I'm using subdomains of course; then they differ).

I believe they recommend an API token as opposed to the legacy Global API key. So could narrow scope by supporting only an API token. Not sure how that stacks up to the likes of R53 etc; would hope for some level of normalisation in base requirements.

Perhaps a disclaimer for tested providers and methods? I guess then you'd have Lexicon support concerns? Answering questions with questions now bahahaha. Has fallen a bit outside of my insight, but you've probably been thinking about a lot of things

PS I didn't consider domain wildcard certs as focus was on function. Not sure if that would need specific handling?

JedMeister commented 1 year ago

Argh... I knew it wouldn't be perfect, but it sounds like it's not as close as I had hoped.

  • stock, lexicon wouldn't install from confconsole, instead throwing a trace and circling back

My bad there. I wrote the code and tested the components as I went. I also tested it (after manually removing components) but your feedback suggests that I overlooked something - because I didn't actually do a full proper test on a clean install. And as the saying goes, untested code is buggy code... I'll focus on fixing that as my first port of call.

  • stock+lexicon installed as instructed above. got further but possibly self config input issues meaning cert issuance problematic. There were traces, but IIRC similar as those experienced prior at that step. No sign of TXT record creation despite logging suggesting creation (ie not indicative and/or mishandled by lexicon)

Damn!

Will have to revisit and repro steps in orderly manner.

Agreed.

What config did you use to populate the lexicon page in confconsole? For this part I still don't quite understand how this is fed vs the lexicon CLI syntax in terminal

Unfortunately, I don't recall. I'll go through it again from scratch and take some notes this time. I've since destroyed the VM I was working in, but will use that opportunity to fix the initial install (I'm sure it's something minor I overlooked).

As for the config suggestions, that could be hard I guess, given the many flavours of provider. For CF though, I need Token, Zone and Domain. Zone == Domain in my case (unless I'm using subdomains of course; then they differ).

Well seeing as it's only us working on this right now, my inclination is to get your use case (and mine) working. Then we can give those examples. That will be something for others to work with and if/when I get support requests from others, I can address those cases as they arise.

I believe they recommend an API token as opposed to the legacy Global API key. So could narrow scope by supporting only an API token.

TBH, I'm not sure what the difference is. Although reading between the lines I'm guessing that the API token means you don't need user ID? Just the token?

Not sure how that stacks up to the likes of R53 etc; would hope for some level of normalisation in base requirements.

For AWS you need a user or a role with the required permissions (I assume that's probably fairly consistent). FWIW a "role" in AWS is a special sort of user which is intended for programmatic access. A user requires an "AWS Access Key" (essentially an API user string) and an "AWS Secret Access Key" (essentially an API key). I'm less familiar with roles but AFAIK they only require an "AWS Secret Access Token" (sounds parallel to your CF "API token").

Perhaps a disclaimer for tested providers and methods? I guess then you'd have Lexicon support concerns? Answering questions with questions now bahahaha. Has fallen a bit outside of my insight, but you've probably been thinking about a lot of things

Haha. :grin: Yeah, perhaps a note to post here on GitHub?

PS I didn't consider domain wildcard certs as focus was on function. Not sure if that would need specific handling?

By my understanding, in the case of DNS-01 challenges they are considered a "normal" domain - so it should make no material difference. Famous last words though, so I'll check with AWS so that I at least have some concrete experience to share.

Anyway, I'll aim to fix the initial install and at a minimum share my AWS Route53 config (obviously not the secrets, but the general context).

It's Thursday afternoon here and I'm way behind schedule with everything... So whilst I hope to have some progress today, I may not post back until tomorrow, or perhaps even early next week. I think that's best as I'm pretty sure me rushing was part of the issue in it still being buggier than it should be...

Thanks again for your assistance and speak more soon.

JedMeister commented 1 year ago

Ok I think I've fixed the lexicon install bugs. I won't bother uploading the bugfixed package yet though, until I've at least got it to work from start to finish, even if only with AWS Route53.

JedMeister commented 12 months ago

Ok, so this should be MUCH closer! I can't guarantee that it's bug free as I haven't tested it exhaustively. However I did just install this build (which I'll attach again below) where I manually removed all config files first, then I got a cert (via Confconsole) using DNS-01 challenge via Route53! :tada:

Hopefully it will work just as smoothly for you. And apologies on my previous failed attempts...

FWIW, here is the output (when it's actually getting the cert):

[2023-12-01 05:56:36] dehydrated-wrapper: INFO: started
# INFO: Using main config file /etc/dehydrated/confconsole.config
+ Account already registered!
[2023-12-01 05:56:40] dehydrated-wrapper: INFO: found apache2 listening on port 443
[2023-12-01 05:56:40] dehydrated-wrapper: INFO: running dehydrated
# INFO: Using main config file /etc/dehydrated/confconsole.config
Processing test.jeremydavis.org
 + Creating new directory /var/lib/dehydrated/certs/test.jeremydavis.org ...
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for test.jeremydavis.org
 + 1 pending challenge(s)
 + Deploying challenge tokens...
[2023-12-01 05:56:50] confconsole.hook.sh: INFO: Deploying challenge for test.jeremydavis.org.
[2023-12-01 05:56:50] confconsole.hook.sh: INFO: Creating a TXT challenge-record with route53.
RESULT
------
True
 + Responding to challenge for test.jeremydavis.org authorization...
 + Challenge is valid!
 + Cleaning challenge tokens...
[2023-12-01 05:57:33] confconsole.hook.sh: INFO: Clean challenge for test.jeremydavis.org.
RESULT
------
True
 + Requesting certificate...
 + Checking certificate...
 + Done!
 + Creating fullchain.pem...
[2023-12-01 05:57:44] confconsole.hook.sh: SUCCESS: Cert request successful. Writing relevant files for test.jeremydavis.org.
[2023-12-01 05:57:44] confconsole.hook.sh: INFO: fullchain: /var/lib/dehydrated/certs/test.jeremydavis.org/fullchain.pem
[2023-12-01 05:57:44] confconsole.hook.sh: INFO: keyfile: /var/lib/dehydrated/certs/test.jeremydavis.org/privkey.pem
[2023-12-01 05:57:44] confconsole.hook.sh: SUCCESS: Files written/created for test.jeremydavis.org: /usr/local/share/ca-certificates/cert.crt - /etc/ssl/private/cert.key - /etc/ssl/private/cert.pem.
 + Done!
[2023-12-01 05:57:44] dehydrated-wrapper: INFO: dehydrated complete
[2023-12-01 05:57:44] dehydrated-wrapper: INFO: Cleaning backup cert & key
[2023-12-01 05:57:44] dehydrated-wrapper: INFO: (Re)starting apache2
[2023-12-01 05:57:45] dehydrated-wrapper: INFO: (Re)starting webmin.service
[2023-12-01 05:57:48] dehydrated-wrapper: INFO: dehydrated-wrapper completed successfully.

Also, reading back through our discussions, I apologise that I didn't pick up on your confusion/lack of clarity re the lexicon config previously. I think it was a case of curse of knowledge. As it's a '.yml' file (format: KEY: VALUE), it didn't occur to me that you might have been adding entries like bash vars (e.g. KEY=VALUE). That might explain part of your frustration and pain previously.
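That KEY: VALUE vs KEY=VALUE mix-up is easy to demonstrate. The parser below is hand-rolled purely for illustration (lexicon uses a real YAML library): a bash-style key=value line simply never produces a usable option.

```python
# Illustration only - lexicon uses a real YAML library. The point: YAML wants
# `key: value`, so a bash-style `key=value` line yields no option at all.
def parse_flat_yaml(text: str) -> dict:
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
    return options

print(parse_flat_yaml("auth_token: SECRET"))  # {'auth_token': 'SECRET'}
print(parse_flat_yaml("auth_token=SECRET"))   # {} - bash-style line is ignored
```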

FWIW, I've added some additional info to the website doc page (note to self: I still need to sync that with the docs in the code repo). As well as adding a much better example config for both cloudflare and route53. The generic example config file leaves a lot to be desired, but I'm not really sure what more I can do with it?

I would love to hear how you go, in particular I'd appreciate your feedback on:

Please share anything else you find or think about it.

Fingers crossed that it works as well for you as it did for me. FYI here's a copy of my lexicon config - `/etc/dehydrated/lexicon_route53.yml` (with the secrets replaced):

# Configure according to lexicon documentation https://dns-lexicon.readthedocs.io/
#
# uncomment relevant lines and replace example value(s)
# AWS Route53 example:
auth_access_key: ROUTE53_IAMS_USER_KEY
auth_access_secret: ROUTE53_IAMS_USER_SECRET
private_zone: False # generally you'll always want public zone
zone_id: ROUTE53_ZONE

The top line (and the line under it) is being added somewhere (pretty sure it's in dns_01.py) and doesn't need to be - so that's at least one thing that still needs attention. Otherwise, it "just works"! :grin:

Oh, and the deb:

confconsole_2.1.2+14+g8c379d8_all.deb.txt

wget https://github.com/turnkeylinux/tracker/files/13483643/confconsole_2.1.2%2B14%2Bg8c379d8_all.deb.txt
mv confconsole_2.1.2+14+g8c379d8_all.deb.txt confconsole_2.1.2+14+g8c379d8_all.deb
apt install ./confconsole_2.1.2+14+g8c379d8_all.deb

DragRedSim commented 12 months ago
# Configure according to lexicon documentation https://dns-lexicon.readthedocs.io/
#
# uncomment relevant lines and replace example value(s)
# AWS Route53 example:
auth_access_key: ROUTE53_IAMS_USER_KEY
auth_access_secret: ROUTE53_IAMS_USER_SECRET
private_zone: False # generally you'll always want public zone
zone_id: ROUTE53_ZONE

The top line (and the line under it) is being added somewhere (pretty sure it's in dns_01.py) and doesn't need to be (so at least one thing that still needs attention). Otherwise, it "just works"! 😁

https://github.com/turnkeylinux/confconsole/blob/master/plugins.d/Lets_Encrypt/dns_01.py#L44C27-L44C27

That's not the main reason I'm chiming in, though; I'm just trying to get some clarity about the flow. As I understand it, the lexicon config generated by the confconsole plugin is, in fact, a global config, so it's on the user to add the correct plugin configuration. For example, since I use CF as my DNS provider, my config looks like:

cloudflare:
  auth_token: AUTH_TOKEN

I don't see anywhere in the code that would rename the file to include the name of the DNS provider before it is passed to lexicon. I bring this up because it can result in Let's Encrypt issuing challenges whose answers lexicon may not know how to pass through to the DNS servers.

EDIT: actually, now that I look at it, the lexicon.yml file is never copied out of the /usr/share/confconsole/letsencrypt folder; it is only referenced by exporting the $SHARE folder as the value of LEXICON_CONFIG_DIR, which Lexicon uses to find the updated config. It would be nice if there were an easy way to run a staging request, rather than a main-server one, from the plugin, to help iron out any difficulties before trying to get the proper certs.

EDIT 2: at this point, I'm working through some testing on my current setup, and taking some notes as I go.

zndrr commented 12 months ago

@JedMeister Quick note to say that cert issuance was successful in part through the UI (using unscoped token). Will collate notes and edit post, but quick summary is I still had to install Lexicon outside of confconsole as a pre-req.

Confirmed the cert is okay with webmin, by changing the keyfile line in /etc/webmin/miniserv.conf and restarting its process: keyfile=/var/lib/dehydrated/certs/sub.domain.nz/cert.pem Edit: in light of my revelation RE: wildcard certs in the addendum below, this config edit may not have been necessary.

Per @DragRedSim, I do concur that being able to choose between the staging and prod platforms for LE would be an additional cherry. But that can come later I suppose, if at all; ideal for troubleshooting, but not realistically required with a working platform.

Interim EDIT - Dehydrated Wildcard handling

Quick update (not the outputs promised) - there is specific dehydrated handling for wildcards. Per https://github.com/dehydrated-io/dehydrated/blob/master/docs/domains_txt.md#wildcards
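Per the linked dehydrated doc, a wildcard entry in domains.txt must carry an alias, and the alias (not the "*." name) becomes the certificate directory under /var/lib/dehydrated/certs/ - matching the star.sub.domain.nz path in the exhibits below. A sketch using the domain from the exhibits:

```
# /etc/dehydrated/domains.txt
# wildcard entries need an alias; the alias names the cert directory
*.sub.domain.nz > star.sub.domain.nz
```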

Exhibit A

[2023-12-02 01:07:37] dehydrated-wrapper: INFO: running dehydrated
# INFO: Using main config file /etc/dehydrated/confconsole.config
Processing *.sub.domain.nz
ERROR: Please define a valid alias for your *.sub.domain.nz wildcard-certificate. See domains.txt-documentation for more details.
[2023-12-02 01:07:37] dehydrated-wrapper: WARNING: Something went wrong, restoring original cert, key and combined files.

Exhibit B tkcore_dns_domainentry Exhibit C

[2023-12-02 01:14:13] dehydrated-wrapper: INFO: started
# INFO: Using main config file /etc/dehydrated/confconsole.config
+ Account already registered!
[2023-12-02 01:14:14] dehydrated-wrapper: INFO: No process found listening on port 443; continuing
[2023-12-02 01:14:14] dehydrated-wrapper: INFO: running dehydrated
# INFO: Using main config file /etc/dehydrated/confconsole.config
Processing *.sub.domain.nz
 + Creating new directory /var/lib/dehydrated/certs/star.sub.domain.nz ...
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for sub.domain.nz
 + 1 pending challenge(s)
 + Deploying challenge tokens...
[2023-12-02 01:14:18] confconsole.hook.sh: INFO: Deploying challenge for sub.domain.nz.
[2023-12-02 01:14:18] confconsole.hook.sh: INFO: Creating a TXT challenge-record with cloudflare.
RESULT
------
True

Exhibit D

┌────────────────────────────┐
│                            │
│ Error in >: Domain may not │
│ have less than 2 segments  │
│                            │
│ Would you like to ignore   │
│ and overwrite data?        │
├────────────────────────────┤
│     < Yes >   < No  >      │
└────────────────────────────┘

EDIT 2 - Process with outputs

Attached are two files:

Hopefully that helps. I feel we're almost there though! Great effort so far

EDIT 3 - Documentation

Looking through again, it seems informative enough to me! Great work. I don't know your audience though and I consider myself at least modestly knowledgeable lol.

EDIT 4 - UI

For the added config in the confconsole dehydrated box, do you think it would be okay to shorten the example placeholder lengths? auth_token: >> YOUR_CF_UNSCOPED_API_TOKEN << For text entry in confconsole, keyboard shortcuts and pasting with newline don't seem to work (understandably) - so it's a modest effort to replace with values, resorting to backspace or delete. modest being the operative word; akin to a nitpick.

I'm not sure you can normalise the auth vars to more global interop ones? eg auth_token auth_username auth_secret etc. If so, could conceivably cut down on hassle for you between providers. Bearing in mind that this is without parsing through any code, so please ignore if it's passed as-is to the inheriting tools.

Again though, I don't know your audience. I wouldn't put a huge deal of stock in to this feedback bias. Nor does it personally detract me from usage.

FYI also - not once have I been frustrated nor considered this painful. Happy to help. I'm also invested in moving some of my containers to TK18 to utilise this feature add instead of my own jank. Waiting patiently for openldap on TK18 too :D

JedMeister commented 12 months ago

@DragRedSim - welcome to the fray! :grin: Another tester is always welcome so thanks for joining in.

The top line (and the line under it) is being added somewhere (pretty sure it's in dns_01.py) and doesn't need to be (so at least one thing that still needs attention). Otherwise, it "just works"! 😁

https://github.com/turnkeylinux/confconsole/blob/master/plugins.d/Lets_Encrypt/dns_01.py#L44C27-L44C27

Thanks, saved me a grep! :+1:

That's not the main reason I'm chiming in, though; just trying to get some clarity about the flow. As I understand it, the lexicon config generated by the confconsole plugin is, in fact, a global config; so it's on the user to add the correct plugin configuration. For example, since I use CF as my DNS provider, my config looks like:

cloudflare:
  auth_token: AUTH_TOKEN

FWIW my intention was/is to create the provider specific lexicon conf - i.e. lexicon_PROVIDER.yaml e.g. lexicon_cloudflare.yaml. So far only route53 and cloudflare are supported, other providers will just get a generic example conf (which will be renamed to the specific provider). We can add additional provider specific examples as needed.

I don't see anywhere in the code that would rename the file to include the name of the DNS provider before it is passed to lexicon-dns. I bring this up because it can cause challenges to be issued by Let's Encrypt that lexicon may not know how to pass the answer through to the DNS servers.

Just to make sure we're on the same page, the code is in my bookworm-le-dns-fixes branch via this (as yet unmerged) PR.

But now you mention it, I think you are right! I just found the code that I was thinking of, and I can't see where it's doing that either :rofl: Over the weekend (while reflecting - I was miles away from my computer), I realised that I hadn't handled the general case (i.e. if the provider isn't cloudflare or route53). But following your note, I don't think I was handling any provider. Doh!

EDIT: actually, now that I look at it, the lexicon.yml file is never copied out of the /usr/share/confconsole/letsencrypt folder; it is only referenced by exporting the $SHARE folder as the value of LEXICON_CONFIG_DIR, which Lexicon uses to find the updated config.

Yeah, I'll have another look...

It would be nice if there was an easy way to run a staging request, rather than a main-server one, from the plugin, to help iron out any difficulties before trying to get the proper certs.

Whilst I somewhat agree with @zndrr:

I do concur that being able to choose between staging and prod platforms for LE would be an additional cherry. But that can come later I suppose, if at all; ideal for tshoot but not realistically required with a working platform.

FWIW I have been thinking that allowing staging could be a good idea (and that's why there is a staging config in the share dir). My only reservation is that because the resulting cert will still give a security warning, I suspect that some more newbish users (a large slice of our user base) might get confused by that and think that it's failed (even though the "get certificate" will have succeeded). Having said that, we could note that the cert will still raise warnings, but that if the process (of getting the cert) is successful, then rerunning "for real" should "just work".

Bottom line, I think it would be a really nice feature. Still, I don't think that I'll do that just yet as I really need to get back to the v18.0 release ASAP. So getting this working is my priority. Once it works reliably, then I'll probably leave it at that for now.
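For reference, pointing dehydrated at the staging environment is a one-line config change (the `CA` variable as per dehydrated's example config; the staging directory URL is the one Let's Encrypt publishes for ACME v2):

```shell
# Sketch: dehydrated config line to use the Let's Encrypt staging CA.
# Certs issued this way will NOT be browser-trusted - staging is only
# for ironing out the issuance process itself.
CA="https://acme-staging-v02.api.letsencrypt.org/directory"
```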

EDIT 2: at this point, I'm working through some testing on my current setup, and taking some notes as I go.

Thanks. I won't respond directly to your points currently, but thanks for sharing.

JedMeister commented 12 months ago

@zndrr

@JedMeister Quick note to say that cert issuance was successful in part through the UI (using unscoped token). Will collate notes and edit post, but quick summary is I still had to install Lexicon outside of confconsole as a pre-req.

I'm sure that part worked for me, but perhaps there was something I missed? I'll start my day with a clean v18.0 Core container and after firstboot, I'll take a snapshot (and revert to that snapshot before retesting), so I can guarantee a clean, reproducible env. Further thinking, perhaps we might actually be better installing lexicon via pipx - although I'm not 100% sure...

Interim EDIT - Dehydrated Wildcard handling

I won't dig into your specifics for now, but I really appreciate this additional info. I did test a wildcard initially and it failed. So I realised that something more/specific must be required, but instead just focused on getting a "normal" sub domain working first (with intention to circle back to wildcard certs once that was working reliably). Thanks for doing the legwork for me! :grin:

EDIT 2 - Process with outputs

Thanks!

EDIT 3 - Documentation

Thanks for your feedback. It may still require some work, but your feedback suggests that it's "near enough" for now. Users can ask if it still doesn't give enough info - and I'll update accordingly.

EDIT 4 - UI

For the added config in the confconsole dehydrated box, do you think it would be okay to shorten the example placeholder lengths? auth_token: >> YOUR_CF_UNSCOPED_API_TOKEN << For text entry in confconsole, keyboard shortcuts and pasting with newline don't seem to work (understandably) - so it's a modest effort to replace with values, resorting to backspace or delete. modest being the operative word; akin to a nitpick.

Thanks again for your feedback. I'm fine shortening them, although my thought is that we need to balance "ease of use" (in context of physical actions required; typing/deleting/etc) against "ease of understanding" (working out what needs to be done). I suspect for more experienced users, the former is more important, whereas newer users would probably require more of the latter... So whilst I'm definitely sympathetic to your "nit pick", I'm not 100% sure of the best path. And as this step will only need to be done once (or once in a while at most), a little extra effort (i.e. having to hit delete/backspace a few more times) here may not be too high a price to pay to make it easier for the rest? I'm unsure, but for now I'm going to just leave it be and focus on getting it working.

Happy to revisit this - especially if you have some concrete suggestions of shorter example values that still convey the info.

I'm not sure you can normalise the auth vars to more global interop ones? eg auth_token auth_username auth_secret etc. If so, could conceivably cut down on hassle for you between providers. Bearing in mind that this is without parsing through any code, so please ignore if it's passed as-is to the inheriting tools.

I didn't look very far, but it seems like there are a huge range of provider keys. As you likely would have noticed, even just Cloudflare has 3 different options - requiring 3 different combos of 3 different keys (auth_token being the only one common to all 3 config options). Then Route53 doesn't use auth_token at all. So it seems giving generic examples is very problematic.
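For illustration, the three Cloudflare combinations might look roughly like this in a lexicon YAML config (key names are my assumption based on lexicon's Cloudflare provider options; all values are placeholders):

```yaml
# Option 1: account email + global API key (assumed key names)
cloudflare:
  auth_username: you@example.com
  auth_token: YOUR_GLOBAL_API_KEY

# Option 2: a scoped API token on its own
# cloudflare:
#   auth_token: YOUR_API_TOKEN

# Option 3: a zone-scoped token plus an explicit zone id
# cloudflare:
#   auth_token: YOUR_ZONE_SCOPED_TOKEN
#   zone_id: YOUR_ZONE_ID
```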

We could parse the docs relevant to each provider to provide a better example dynamically, but that will add complexity (and knowing me, more bugs... :grin:) so whilst it would be cool, it feels like a bit of a side track right now. As I noted above, I'm ok with providing 2 good examples (of what I imagine are probably the 2 most popular DNS providers) and a crappy generic one - with further guidance via the docs.

FYI also - not once have I been frustrated nor considered this painful. Happy to help. I'm also invested in moving some of my containers to TK18 to utilise this feature add instead of my own jank. Waiting patiently for openldap on TK18 too :D

Thanks for this. I get quite self conscious about these sorts of issues as in my opinion, these sorts of "show stopper" bugs shouldn't occur in stable TurnKey releases. I accept that things slip through (and we don't have any automated testing - which would help reduce bugs heaps) but I'm acutely aware of all the bugs (at least those brought to my attention). Beyond the "embarrassment" that TurnKey isn't as good as I tend to think it is (and definitely want it to be), I'm also acutely aware of how few resources we have and that anything we say "yes" to is saying "no" to a million other things we could be doing...

Anyway, thanks for your encouragement and patience. And I'll be sure to add OpenLDAP to the priority queue for the next batch of appliance updates. No promises, but hopefully if we can get this particular issue (i.e. DNS-01 LE cert via Confconsole) under control early this week, I'll be able to get at least a batch of 10 appliances ready to publish early next week. :crossed_fingers:

JedMeister commented 12 months ago

Ok FWIW, I've reproduced the initial install issue and developed a fix. Thanks for your persistence regarding that @zndrr.

Still more to do, but getting there...

Also just a minor point re one of your attached files @zndrr (attempt 1) - apt show PKG will give package details regardless of whether the package is installed or not. To check if it's installed try apt policy PKG instead. E.g. let me demonstrate by running those 2 commands back-to-back on a server I have handy:

root@tkldev ~# apt show lexicon
Package: lexicon
Version: 3.11.7-1
Priority: optional
Section: python
Maintainer: Ana Custura <ana@netstat.org.uk>
Installed-Size: 31.7 kB
Depends: python3:any, python3-lexicon (= 3.11.7-1)
Homepage: https://github.com/AnalogJ/lexicon
Download-Size: 10.9 kB
APT-Sources: http://deb.debian.org/debian bookworm/main amd64 Packages
Description: CLI for manipulating DNS records on various DNS providers (Python 3)
 Lexicon provides a way to manipulate DNS records on multiple DNS
 providers in a standardized way. Lexicon was designed to be used in
 automation, specifically letsencrypt.
 .
 This package installs the tool for Python 3.
root@tkldev ~# apt policy lexicon
lexicon:
  Installed: (none)
  Candidate: 3.11.7-1
  Version table:
     3.11.7-1 500
        500 http://deb.debian.org/debian bookworm/main amd64 Packages

I'll have a new package soon. Hopefully today!?

JedMeister commented 12 months ago

Ok @zndrr & @DragRedSim, I think we're getting closer now...

An install of my latest build (below as per previous uploads) on a clean v18.0 core appears to be working (although as I note below, I haven't yet looked at the wildcard stuff).

Also, hopefully I haven't missed any glaring bugs this time as I've developed a much more rigorous testing/development regime (it's a PITA TBH, but it is more reliable). FWIW I have installed Core v18.0, then run apt upgrade on it. I then took a snapshot of it. Then prior to testing each new build, I rewound to that snapshot first. So I could ensure that I had a clean env (previously I was just manually cleaning up and must have been missing bits).

As hinted, I still haven't addressed the wildcard cert issue you highlight @zndrr, so it's probably still not near enough for your desired usage, but at least the initial setup should now be working. Hopefully I should have time to start looking at the wildcard stuff tomorrow.

Beyond bugfixes, I also added a bit of feedback for the install step (nothing appeared to be happening - even though it was). Although it's just echoing straight to the terminal, so it's a bit ugly. Still I think it's better than nothing.

If there are any questions/points either of you have raised that I haven't explicitly answered/responded to, please feel free to bump me on those. And as always, any bugs, issues and/or feedback re improvements are more than welcome.


confconsole_2.1.2+15+g9c7aa60_all.deb.txt

wget https://github.com/turnkeylinux/tracker/files/13555018/confconsole_2.1.2%2B15%2Bg9c7aa60_all.deb.txt
mv confconsole_2.1.2+15+g9c7aa60_all.deb.txt confconsole_2.1.2+15+g9c7aa60_all.deb
apt install ./confconsole_2.1.2+15+g9c7aa60_all.deb
zndrr commented 12 months ago

Bloody marvellous! Seems like it's golden now. At least from a fresh run.

I did the same RE: testing: Provision, Snapshot, Apt update, Snapshot etc etc. You get good after a few runs.

As hinted, I still haven't addressed the wildcard cert issue you highlight @zndrr, so it's probably still not near enough for your desired usage, but at least the initial setup should now be working. Hopefully I should have time to start looking at the wildcard stuff tomorrow.

As for this part, the aliasing does work at face value in confconsole for wildcards, per my Exhibit B above: *.sub.domain.nz > star_sub_domain_nz. So the wildcard issuance in the UI is not a problem at all. The only problem was my Exhibit D above when you went back through the menus on Get certificate; it didn't handle that well -- interpretation of word number or ">" perhaps? -- I'd say that navigation isn't common though.

I guess you could do a few things for wildcard:

  1. Interpret the asterisk and alias accordingly as I did manually. Guess you'd want to indicate that with feedback or something.
  2. Add some form of hint.
  3. Fail gracefully with some sort of useful error before it hands off to Dehydrated.

I don't know what other scenarios to account for.

Haven't tried host cert vs wildcard, but that worked prior when lexicon was installed via terminal. Nor have I tried a fresh issuance of wildcard with the confconsole glitch... nor multiple domains with the aliases. These things get convoluted fast don't they...

In any case, good work. Turnaround was pretty quick IMO.

PS. This I would certainly personally consider polished enough to use. One issuance and the renew hook; basically set and forget.

JedMeister commented 11 months ago

Bloody marvellous! Seems like it's golden now. At least from a fresh run.

Woohoo! :tada:

As hinted, I still haven't addressed the wildcard cert issue you highlight @zndrr, so it's probably still not near enough for your desired usage, but at least the initial setup should now be working. Hopefully I should have time to start looking at the wildcard stuff tomorrow.

As for this part, the aliasing does work at face value in confconsole for wildcards, per my Exhibit B above: *.sub.domain.nz > star_sub_domain_nz. So the wildcard issuance in the UI is not a problem at all. The only problem was my Exhibit D above when you went back through the menus on Get certificate; it didn't handle that well -- interpretation of word number or ">" perhaps? -- I'd say that navigation isn't common though.

Ok, well that's what I get for not reading properly... Doh! Thanks for the clarification.

I guess you could do a few things for wildcard:

  1. Interpret the asterisk and alias accordingly as I did manually. Guess you'd want to indicate that with feedback or something.
  2. Add some form of hint.
  3. Fail gracefully with some sort of useful error before it hands off to Dehydrated.

Good suggestions. My preference would be to parse the line, looking for a '<' and interpret everything after as an alias (FWIW reading the Dehydrated docs I noticed that aliases are legit whether using a wildcard or not). It would actually be pretty easy to check if wildcard domains have a valid alias and if not, we could create one on the fly. Then it would work whether the user just entered a raw wildcard domain (e.g. *.example.com) or wildcard domain with an alias.

I was going to look at that, but I've done a bit more on the TurnKey doc page and added a new wildcard section. I'm thinking for now, that might just do... It's not ideal really and there is lots of room for improvement, but I think that's probably good enough for now. As you may have noticed, I've included a link to the relevant Dehydrated docs on our doc page.

In any case, good work. Turnaround was pretty quick IMO.

Thanks. And thanks too for all your assistance, testing and feedback. It's been a pleasure working with you! :grin:

PS. This I would certainly personally consider polished enough to use. One issuance and the renew hook; basically set and forget.

Great feedback thanks.

Following your assessment and my browsing of the code changes that I've done since your OP, I think that you're right and we're close enough to call it enough for now. There are improvements to be made and rough edges that could be smoothed, but the basic intended functionality is there, with limited chance of users hitting a stacktrace. Plus, I really do need to get back to the release (OpenLDAP in particular :wink: ).

So I've pushed everything I've got (mostly what you've already tested, with some minor tweaks and tidying). I'll also ask @OnGle to do a code review ASAP too.

Assuming that all goes well, I'll merge my PR, rebuild, and upload to the apt repo. Then we should be all good for now. :grin:

Thanks again.

zndrr commented 11 months ago

Good suggestions. My preference would be to parse the line, looking for a '<' and interpret everything after as an alias (FWIW reading the Dehydrated docs I noticed that aliases are legit whether using a wildcard or not). It would actually be pretty easy to check if wildcard domains have a valid alias and if not, we could create one on the fly. Then it would work whether the user just entered a raw wildcard domain (e.g. *.example.com) or wildcard domain with an alias.

That is quite a rational approach, so I'm absolutely happy with that. Good find RE: aliasing for standard host issues; it does seem like more of a determinant of the dir/file naming convention. Can't wait.

Just one quick correction in case you haven't caught it: the aliasing uses the greater-than '>' rather than the less-than '<'. EDIT: Disregard - your documentation has the correct symbol.

JedMeister commented 11 months ago

FWIW last thing Friday I did a little more testing and discovered that if we want the alias to "stick" it does actually need to be processed by confconsole. :cry:

Whilst it appeared to save the alias ok the first time, on subsequent loads the current domains handling was silently stripping the alias (and it would be saved like that if you completed the process).

So that suggests 2 things. Firstly, that confconsole's processing/validation of the domains may not be happening at exactly the right time(s). And secondly, that we need to actually handle the alias.

I made a start on Friday, but ran out of day.

I think the best way to go is to make it explicitly handle aliases. I started going down the track of allowing that to be explicitly noted in Confconsole, but after thinking a bit more, I think that the easiest and most reliable way to go is to just auto-generate an alias. That may be a little annoying for more experienced users who want a specific alias. But I imagine for most, it will be irrelevant - so long as it "just works". And it already notes that it's only trying to account for fairly straightforward, common config - for more advanced config, additional manual work must be done and the dehydrated wrapper script used directly.
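The auto-generate approach could be sketched roughly like this (a hypothetical helper, not the actual confconsole code; it assumes dehydrated's domains.txt `domain > alias` syntax and the `star_` naming used earlier in this thread):

```python
def ensure_alias(line: str) -> str:
    """Return a domains.txt line, auto-generating an alias for wildcards.

    dehydrated's domains.txt uses lines like
    "*.example.com > star_example_com" to give a wildcard cert an alias.
    """
    line = line.strip()
    if '>' in line:
        # The user already supplied an alias - leave it alone.
        return line
    if line.startswith('*.'):
        # Auto-generate: "*.sub.domain.nz" -> "star_sub_domain_nz"
        alias = 'star_' + line[2:].replace('.', '_')
        return f'{line} > {alias}'
    # Non-wildcard domains don't need an alias.
    return line


print(ensure_alias('*.sub.domain.nz'))  # *.sub.domain.nz > star_sub_domain_nz
```

This way it works whether the user enters a raw wildcard domain, a wildcard with an explicit alias, or a plain domain.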

So it seems I'm not yet done :(

I think I got close on Friday, but I'll need to revisit. I'll circle back to this ASAP, but I do have a couple of high priority tasks I need to take care of.

JedMeister commented 11 months ago

Ok, so finally got back to this and I'm pretty sure that it's good to go. I'll attach here just in case you want to test it out (I'm waiting on a final code review from a colleague before I merge, rebuild and push to the bookworm repo).

Also @zndrr, IIRC you wanted a LXC OpenLDAP container, right? The build code has been updated, but I'm not 100% sure if I'll get the next batch of appliances done this year or not. Regardless, I'll see what I can manage for you. Would a v18.0 "pre-release" build be acceptable? (If so, I'll build you a container and upload it somewhere).


confconsole_2.1.2+33+g063631e_all.deb.txt

wget https://github.com/turnkeylinux/tracker/files/13555018/confconsole_2.1.2%2B33%2Bg063631e_all.deb.txt
mv confconsole_2.1.2+33+g063631e_all.deb.txt confconsole_2.1.2+33+g063631e_all.deb
apt install ./confconsole_2.1.2+33+g063631e_all.deb
zndrr commented 11 months ago

@JedMeister Thanks for the offer RE: OpenLDAP, but just having it on your agenda is good enough for me. I am happy to wait for the release version since migrations can be a PITA.

Also great work on resolving this. Haven't yet tested, but am optimistic given the last time I checked. :)

JedMeister commented 11 months ago

Ok, so the fixed package (v2.1.3) for v18.0 has been built and uploaded. It can now be installed via apt (see output from a local TKLDev below).

Yeah it should be good. I've had 2 code reviews from colleagues and I have addressed most of the issues they raised (there was one legacy issue/suggestion I'm going to hold off on, but have added it to the tracker for future improvement).

So I'm feeling quite confident that it's now fully functional.

Re OpenLDAP, ok - no worries. It builds successfully and passes the smoke tests, so fingers crossed that should also be good to go. I'm still hoping that I might get a batch of apps ready for release by this weekend, but TBH, it's going to be a sprint and I'm not completely confident that I have enough week left. If I don't get it done by knock off time tomorrow, it won't be until early next year.


root@tkldev ~# apt update
Get:1 http://security.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Hit:2 http://deb.debian.org/debian bookworm InRelease                      
Ign:3 http://archive.turnkeylinux.org/debian bookworm-security InRelease
Get:4 http://security.debian.org/debian-security bookworm-security/main amd64 Packages [128 kB]
Ign:5 http://archive.turnkeylinux.org/debian bookworm InRelease
Hit:6 http://archive.turnkeylinux.org/debian bookworm-security Release
Get:8 http://archive.turnkeylinux.org/debian bookworm Release [5659 B]
Get:9 http://archive.turnkeylinux.org/debian bookworm Release.gpg [833 B]
Get:10 http://archive.turnkeylinux.org/debian bookworm/main amd64 Packages [36.8 kB]
Fetched 219 kB in 3s (86.5 kB/s)   
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
60 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@tkldev ~# apt policy confconsole
confconsole:
  Installed: 2.1.1
  Candidate: 2.1.3
  Version table:
     2.1.3 999
        999 http://archive.turnkeylinux.org/debian bookworm/main amd64 Packages
 *** 2.1.1 100
        100 /var/lib/dpkg/status
root@tkldev ~# apt install confconsole
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Recommended packages:
  di-live
The following packages will be upgraded:
  confconsole
1 upgraded, 0 newly installed, 0 to remove and 59 not upgraded.
Need to get 267 kB of archives.
After this operation, 13.3 kB of additional disk space will be used.
Get:1 http://archive.turnkeylinux.org/debian bookworm/main amd64 confconsole all 2.1.3 [267 kB]
Fetched 267 kB in 3s (103 kB/s)       
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 60928 files and directories currently installed.)
Preparing to unpack .../confconsole_2.1.3_all.deb ...
Unpacking confconsole (2.1.3) over (2.1.1) ...
Setting up confconsole (2.1.3) ...
add-water.service is a disabled or a static unit not running, not starting it.
Enumerating objects: 1347, done.
Counting objects: 100% (1347/1347), done.
Delta compression using up to 2 threads
Compressing objects: 100% (816/816), done.
Writing objects: 100% (1347/1347), done.
Total 1347 (delta 77), reused 1347 (delta 77), pack-reused 0
JedMeister commented 10 months ago

@zndrr - Just a quick follow up to let you know that OpenLDAP v18.0 has been published (also available via the Proxmox UI if you're wanting a LXC container). I haven't announced it yet, but it should be ready to go. Please don't hesitate to open a new issue if you have any problems with it.

zndrr commented 10 months ago

@zndrr - Just a quick follow up to let you know that OpenLDAP v18.0 has been published (also available via the Proxmox UI if you're wanting a LXC container). I haven't announced it yet, but it should be ready to go. Please don't hesitate to open a new issue if you have any problems with it.

Thanks for the heads-up! I've grabbed the LXC image and will probably look to migrate the old over the next few weeks. Happy to raise any issues if they pop up.

EDIT: Just noticed a fresh issue, #1895 quoted just above. I'll endeavour to be vigilant in my DNS procedure when I get around to spinning up the fresh OpenLDAP server and contribute any findings if issues occur.

zndrr commented 10 months ago

@JedMeister heya, same outcome with the failures.

I didn't do any scientific discovery, but a combination of throw-things-at-the-wall steps resulted in success:

A) Performing this /usr/lib/confconsole/plugins.d/Lets_Encrypt/dehydrated-wrapper -r -c dns-01 -p cloudflare

Installed:
  1. ran dehydrated-wrapper; complained about venv
  2. installed python3-venv; complained about bind (dnsutils)
  3. installed dnsutils; complained about config (registering example.com)
  4. manually created /etc/dehydrated/lexicon_cloudflare.yml and edited /etc/dehydrated/{confconsole.config|confconsole.domains.txt}; complained about LEXICON_CONFIG_DIR var not set (makes sense)

Reflecting on 3-4, likely an issue of orderly steps to generate config (eg via confconsole)

B) performed these steps, per https://www.turnkeylinux.org/comment/56166

rm -rf /usr/local/src/venv/lexicon
mkdir -p /usr/local/src/venv
python3 -m venv /usr/local/src/venv/lexicon
/usr/local/src/venv/lexicon/bin/pip install dns-lexicon[full]
ln -s /usr/local/src/venv/lexicon/bin/lexicon /usr/local/bin/lexicon
ln -s /usr/local/src/venv/lexicon/bin/tldextract /usr/local/bin/tldextract

Post this step, I ran through the confconsole okay.

With the first round of manual steps there (A), I'm not confident it made a difference once I attempted (B). I backed up my LXC after the fresh deployment and package upgrades. Lemme try the second lot of steps from there.

PS. Which issue do you want this discovery to resume in; this one or the newer one?

JedMeister commented 10 months ago

Thanks for confirming @zndrr, and providing some more info. What a PITA!

The thing that frustrates me the most is that I actually explicitly tested it in the new OpenLDAP build just prior to pushing it to the mirror. As I'm sure I've posted somewhere already, my suspicion is a race condition somewhere. I tested in a KVM VM and my server is a low-power one with relatively low clock speed, so my guess is that with the lower overhead of LXC and a faster CPU things happen a bit faster - raising this issue. Something like that anyway...

I'll dig into this again early next week. I'll try with the LXC build on something with a bit more grunt and hopefully I can reproduce it - if I can reproduce then I can fix it!

PS. Which issue do you want this discovery to resume in; this one or the newer one?

Doesn't really matter. I'll see your posts either way.

JedMeister commented 10 months ago

Ok @zndrr. I'm pretty sure I've found the issue (I'm really sure - but I'm also aware I've said that before). TBH, I'm still not sure why I didn't hit this when I was testing. The only thing I can think of is that I somehow tested the wrong server? I.e. I tested on a system that had python3-venv installed already. TBH this whole debacle is quite embarrassing, but it is what it is...

FWIW the specific commit that fixes this is https://github.com/turnkeylinux/confconsole/commit/04996b2c41a481b664d9f51cf06438031703f24d. In case it's not obvious, the issue was that I was checking to see if python3-venv was installed using the command dpkg-query -W python3-venv. That command does indeed note whether the package is installed or not (and which version it is if installed) - in the text of the response. But I wasn't parsing the text, I was just checking the return code and somehow it escaped me that that particular command always returns a zero exit code! :man_facepalming: Checking the status of the package via dpkg -s python3-venv does what I had intended...
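The two checks can be contrasted with a small sketch (my illustration, not the actual confconsole code; package name arbitrary):

```shell
#!/bin/sh
# Sketch: checking whether a Debian package is installed by exit code.
# "dpkg-query -W PKG" prints status text, but (as noted above) its exit
# code couldn't be relied on here; "dpkg -s PKG" exits non-zero when the
# package isn't installed, so it works in an if-test.
pkg_installed() {
    dpkg -s "$1" >/dev/null 2>&1
}

if pkg_installed python3-venv; then
    echo "python3-venv is installed"
else
    echo "python3-venv is NOT installed - installing..."
fi
```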

zndrr commented 10 months ago

Heya @JedMeister apologies, got a bit bogged down and forgot to reply earlier.

Immediately following my last post, I couldn't get attempts A and B working in isolation; so it was definitely a combination of the two. The venv presence from attempt A might have been key though, as you point out. Order of steps mattered.

But hey, I'm the same. Instead of thoroughly documenting every hiccup, I just did what the computer said and carried on lol. Doesn't help much in these scenarios sorry.

I'll try to take another stab this weekend. If you don't publish anything, I'll just retro-update from that commit or manually edit the file if I'm superass lazy.

Cheers; transparent as always :)

JedMeister commented 10 months ago

All good @zndrr :grin:

For fear of a repeat embarrassment, I did a bit more testing today and caught a few other minor issues I'd previously missed. And I did some more tweaks aimed at handling the previous failure as best I can. I can't account for every possibility, but IMO if someone has tried the most recent version (which fails), they should be able to install the new version and it should "just work" (without any manual intervention required).

I'm normally knocked off by now, but this week has been a mess so I'm planning to do a final bit of testing tonight so I can get a final code review from a colleague tomorrow. :crossed_fingers:

zndrr commented 9 months ago

@JedMeister don't sweat it man. Appreciate the persistence and transparency (think I said this prior).

I also apologise for not being overly responsive; just been super duper busy. Though that also means no time to migrate stuff from TK17 to TK18; silver lining I suppose.

If you need me to test anything and have some detailed steps, then would be happy to put some time aside when able. Proactive discovery/troubleshooting I unfortunately can't quite commit to atm.

JedMeister commented 9 months ago

Thanks mate. I actually pushed to the apt repo late on Friday arvo (I thought I'd posted to note that, but apparently not...).

I'm fairly sure that I got it this time... But I would appreciate your feedback/confirmation. BTW, the new version is v2.1.4. I.e. (from an older v18.0 app that still had v2.1.2 - the original broken version):

confconsole:
  Installed: 2.1.2
  Candidate: 2.1.4
  Version table:
     2.1.4 999
        999 http://archive.turnkeylinux.org/debian bookworm/main amd64 Packages
 *** 2.1.2 100
        100 /var/lib/dpkg/status
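As an aside (nothing confconsole-specific, just plain `apt policy` output parsing), the `Installed` vs `Candidate` lines above are the quick way to tell whether a fixed package is pending. A minimal shell sketch, using the output above as canned input rather than a live apt call:

```shell
#!/bin/sh
# Parse apt-policy-style output to detect a pending upgrade.
# Canned input here so the sketch is self-contained; on a real box
# you would use: policy=$(apt policy confconsole)
policy=$(cat <<'EOF'
confconsole:
  Installed: 2.1.2
  Candidate: 2.1.4
EOF
)

# Pull the version strings out of the Installed/Candidate lines.
installed=$(printf '%s\n' "$policy" | awk '/Installed:/ {print $2}')
candidate=$(printf '%s\n' "$policy" | awk '/Candidate:/ {print $2}')

if [ "$installed" != "$candidate" ]; then
  echo "upgrade available: $installed -> $candidate"
fi
```

On a real container you'd then run `apt update && apt install confconsole` to pull in the candidate version.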
zndrr commented 9 months ago

Can confirm that works. Fresh openldap18 LXC with apt packages up-to-date, incl. confconsole 2.1.4. This was entirely through confconsole.

There might be some UX polish desired, but that's just FYI; IMO it takes a backseat to function, and I personally don't mind since it's more or less a one-and-done affair.

JedMeister commented 9 months ago

Thanks @zndrr - glad we finally got there...

Re UX polish, if you have specific suggestions, please feel free to open a fresh issue to note your thoughts. Although I doubt I'll be doing anything with it at least until we've finished the v18.0 release (it's really dragging, but we're getting there...).

icf20 commented 5 months ago

I got this on a Nextcloud LXC

apt show confconsole
Package: confconsole
Version: 2.1.5
Priority: optional
JedMeister commented 5 months ago

Hi @icf20 can you please confirm exactly what message/error you are seeing and at which step?

TBH, I doubt that it was this exact same issue - although if it threw a stacktrace, then on face value, it may have looked very similar.

Also, if you got to the DNS provider list, which DNS provider are you using? Are you using Cloudflare or AWS Route53? If not, did you make sure that you provided all the specific info that your DNS provider requires? FYI we only provide specific example config for Cloudflare & Route53. If you aren't using one of those, you'll just get generic config info that needs to be updated to whatever is specifically required.
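For a rough idea of the shape of provider config, lexicon (the library used for dns-01 here) can read provider credentials from a YAML config file. A minimal sketch for Cloudflare; note that the file location and the token value are assumptions for illustration only, and other providers need their own keys per the lexicon provider docs:

```yaml
# Hypothetical example only -- where lexicon looks for its config
# depends on how it's invoked, and confconsole may manage this for you.
# Cloudflare provider section: an API token with DNS edit permission.
cloudflare:
  auth_token: "YOUR_CLOUDFLARE_API_TOKEN"
```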

If you didn't get that far or it's not that, then perhaps it's something specific to Nextcloud? I don't have a Nextcloud server handy, but if you can give me a bit more info, then I'll fire one up and see if I can reproduce the issue you hit.

FWIW I've just tested a local gitea server that had not been configured before, and every step was successful: it installed lexicon into the venv OK, then after adding my AWS Route53 details it "just worked":

[2024-06-11 03:01:08] dehydrated-wrapper: INFO: started
[2024-06-11 03:01:08] dehydrated-wrapper: WARNING: /etc/cron.daily/confconsole-dehydrated not found; copying default from /usr/share/confconsole/letsencrypt/dehydrated-confconsole.cron
# INFO: Using main config file /etc/dehydrated/confconsole.config
+ Generating account key...
+ Registering account key with ACME server...
+ Fetching account URL...
+ Done!
[2024-06-11 03:01:22] dehydrated-wrapper: INFO: found nginx listening on port 443
[2024-06-11 03:01:22] dehydrated-wrapper: INFO: running dehydrated
# INFO: Using main config file /etc/dehydrated/confconsole.config
 + Creating chain cache directory /var/lib/dehydrated/chains
Processing git.jeremydavis.org
 + Creating new directory /var/lib/dehydrated/certs/git.jeremydavis.org ...
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for git.jeremydavis.org
 + 1 pending challenge(s)
 + Deploying challenge tokens...
[2024-06-11 03:01:31] confconsole.hook.sh: INFO: Deploying challenge for git.jeremydavis.org.
[2024-06-11 03:01:31] confconsole.hook.sh: INFO: Creating a TXT challenge-record with route53.
RESULT
------
True
 + Responding to challenge for git.jeremydavis.org authorization...
 + Challenge is valid!
 + Cleaning challenge tokens...
[2024-06-11 03:02:12] confconsole.hook.sh: INFO: Clean challenge for git.jeremydavis.org.
RESULT
------
True
 + Requesting certificate...
 + Checking certificate...
 + Done!
 + Creating fullchain.pem...
[2024-06-11 03:02:21] confconsole.hook.sh: SUCCESS: Cert request successful. Writing relevant files for git.jeremydavis.org.
[2024-06-11 03:02:21] confconsole.hook.sh: INFO: fullchain: /var/lib/dehydrated/certs/git.jeremydavis.org/fullchain.pem
[2024-06-11 03:02:21] confconsole.hook.sh: INFO: keyfile: /var/lib/dehydrated/certs/git.jeremydavis.org/privkey.pem
[2024-06-11 03:02:21] confconsole.hook.sh: SUCCESS: Files written/created for git.jeremydavis.org: /usr/local/share/ca-certificates/cert.crt - /etc/ssl/private/cert.key - /etc/ssl/private/cert.pem.
 + Done!
[2024-06-11 03:02:21] dehydrated-wrapper: INFO: dehydrated complete
[2024-06-11 03:02:21] dehydrated-wrapper: INFO: Cleaning backup cert & key
[2024-06-11 03:02:21] dehydrated-wrapper: INFO: (Re)starting nginx
[2024-06-11 03:02:21] dehydrated-wrapper: INFO: (Re)starting webmin.service
[2024-06-11 03:02:26] dehydrated-wrapper: INFO: dehydrated-wrapper completed successfully.

Regardless of what issue you are hitting, throwing a stacktrace is a bug. If something has gone wrong, it should give a meaningful error message! If that's the case though, we'll open a fresh issue specifically for that.