samba-in-kubernetes / samba-container

Build Samba Container Images / Kubernetes & Container Runtime Example Files
GNU General Public License v3.0

Test suites regularly failing: test-ad-server-kubernetes on default,opensuse,amd64 #157

Closed phlogistonjohn closed 1 year ago

phlogistonjohn commented 1 year ago

Over the last week or so the test suite has been failing with regularity. One consistent failure case is the suite test-ad-server-kubernetes running with the build parameters default, opensuse, amd64.

I kicked off a rerun recently and the same failure is exhibited. This needs investigation.

Examples:
https://github.com/samba-in-kubernetes/samba-container/actions/runs/6438421335
https://github.com/samba-in-kubernetes/samba-container/actions/runs/6444959862
https://github.com/samba-in-kubernetes/samba-container/actions/runs/6477174091
https://github.com/samba-in-kubernetes/samba-container/actions/runs/6490445394
https://github.com/samba-in-kubernetes/samba-container/actions/runs/6521431790

phlogistonjohn commented 1 year ago

Ran the container locally (with: podman -r run --privileged --rm -it -v samba2:/var/lib/samba:z samba-ad-server:default-opensuse-amd64 ) and saw the following:

Repacking database from v1 to v2 format (first record DC=DomainDnsZones,DC=domain1.sink.test,CN=MicrosoftDNS,DC=DomainDnsZones,DC=domain1,DC=sink,DC=test)
Repacking database from v1 to v2 format (first record DC=6b60168a-6657-4f9c-8d40-636bfc779969,DC=_msdcs.domain1.sink.test,CN=MicrosoftDNS,DC=ForestDnsZones,DC=domain1,DC=sink,DC=test)
INFO 2023-10-23 18:16:33,391 pid:2 /usr/lib64/python3.11/site-packages/samba/provision/__init__.py #2027: Setting up sam.ldb rootDSE marking as synchronized
INFO 2023-10-23 18:16:33,398 pid:2 /usr/lib64/python3.11/site-packages/samba/provision/__init__.py #2032: Fixing provision GUIDs
Temporarily overriding 'dsdb:schema update allowed' setting
ERROR(<class 'ModuleNotFoundError'>): uncaught exception - No module named 'markdown'
  File "/usr/lib64/python3.11/site-packages/samba/netcmd/__init__.py", line 279, in _run
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/samba/netcmd/domain/provision.py", line 343, in run
    result = provision(self.logger,
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/samba/provision/__init__.py", line 2399, in provision
    raise e
  File "/usr/lib64/python3.11/site-packages/samba/provision/__init__.py", line 2389, in provision
    forest = ForestUpdate(samdb, fix=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/samba/forest_update.py", line 212, in __init__
    from samba.ms_forest_updates_markdown import read_ms_markdown
  File "/usr/lib64/python3.11/site-packages/samba/ms_forest_updates_markdown.py", line 27, in <module>
    import markdown
Traceback (most recent call last):
  File "/usr/bin/samba-dc-container", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/sambacc/commands/dcmain.py", line 49, in main
    cfunc(CommandContext(cli))
  File "/usr/lib/python3.11/site-packages/sambacc/commands/addc.py", line 162, in run
    _prep_provision(ctx)
  File "/usr/lib/python3.11/site-packages/sambacc/commands/addc.py", line 84, in _prep_provision
    addc.provision(
  File "/usr/lib/python3.11/site-packages/sambacc/addc.py", line 39, in provision
    subprocess.check_call(
  File "/usr/lib64/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['samba-tool', 'domain', 'provision', '--option=netbios name=dc1', '--use-rfc2307', '--dns-backend=SAMBA_INTERNAL', '--server-role=dc', '--realm=DOMAIN1.SINK.TEST', '--domain=DOMAIN1', '--adminpass=Passw0rd']' returned non-zero exit status 255.

phlogistonjohn commented 1 year ago

samba-ad-server:default-opensuse-amd64:

1f9b415950b5:/ # rpm -q samba-ad-dc
samba-ad-dc-4.19.2+git.322.7e9201cef5-1.1.x86_64

For comparison: samba-ad-server:default-fedora-amd64 functions correctly with:

[root@92cfef8f3e76 /]# rpm -q samba-dc
samba-dc-4.18.8-1.fc38.x86_64

For comparison: samba-ad-server:nightly-fedora-amd64 functions correctly with:

[root@d03a41925b57 /]# rpm -q samba-dc
samba-dc-20231019.223919.4c291514a9e-1.fc38.x86_64
[root@d03a41925b57 /]# samba-tool -V
samba-tool: missing subcommand

4.20.0pre1-UNKNOWN
[root@d03a41925b57 /]# python3 -c 'import markdown'

[root@d03a41925b57 /]# rpm -q --whatrequires python3-markdown
python3-samba-dc-20231019.223919.4c291514a9e-1.fc38.x86_64

So it's possible the opensuse package has a bug where it should require the python3-markdown equivalent.
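
The manual `python3 -c 'import markdown'` check above can be turned into a small pre-flight test that runs inside an image before provisioning is attempted. This is a minimal sketch, not part of sambacc or samba: the function name `missing_modules` is hypothetical, and the only module name taken from the thread is `markdown`, which the traceback reports as missing.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find.

    Only top-level names are supported here; find_spec() on a dotted name
    imports the parent package first, which this sketch avoids.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "markdown" is the module that samba/ms_forest_updates_markdown.py
    # imports in the traceback; the list itself is an assumption.
    missing = missing_modules(["markdown"])
    if missing:
        print("missing Python modules:", ", ".join(missing))
    else:
        print("all required Python modules are importable")
```

Running this in both the opensuse and fedora images would show the difference without going through a full `samba-tool domain provision` run.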

@dmulder, should I file this as a bug somewhere at opensuse? If so where? In the meantime we may be able to work around it by updating the containerfile. I'll look into this later.

dmulder commented 1 year ago

Ah, I'll fix it. Thanks for pointing this out.

dmulder commented 1 year ago

https://bugzilla.suse.com/show_bug.cgi?id=1216519

phlogistonjohn commented 1 year ago

I attempted to implement a workaround by installing python3-Markdown explicitly. Unfortunately, the samba command fails to start correctly and I'm not clear on how to debug it. Initially, I thought it was ignoring the -F and/or -i options to keep the parent process from forking... but it continued to simply exit afterwards.

I compared this to the nightly build of samba in the fedora container (using 4.20.0pre1-UNKNOWN) and that one runs as expected.

I found the following in /var/log/samba/log.samba:

[2023/11/02 18:27:55.671998,  0] ../../source4/samba/server.c:633(binary_smbd_main)
  samba version 4.19.2-git.322.7e9201cef5SUSE-oS16.9-x86_64 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2023
[2023/11/02 18:27:55.672294,  0] ../../lib/util/become_daemon.c:150(daemon_status)
  daemon_status: daemon 'samba' : Starting process...
[2023/11/02 18:27:55.777097,  0] ../../source4/samba/server.c:908(binary_smbd_main)
  binary_smbd_main: samba: using 'prefork' process model
[2023/11/02 18:27:55.857622,  0] ../../source4/lib/tls/tlscert.c:67(tls_cert_generate)
  Attempting to autogenerate TLS self-signed keys for https for hostname 'DC1.domain1.sink.test'
: /usr/sbin/krb5kdc: /usr/sbin/krb5kdc: symbol lookup error: /usr/sbin/krb5kdc: undefined symbol: k5_buf_cstring, version krb5support_0_MIT
: The MIT KDC daemon died with exit status 127
: task_server_terminate: task_server_terminate: [mitkdc child process exited]
[2023/11/02 18:27:56.001567,  0] ../../source4/samba/server.c:403(samba_terminate)
  samba_terminate: samba_terminate of samba 26: mitkdc child process exited
[2023/11/02 18:27:56.647107,  0] ../../source4/lib/tls/tlscert.c:154(tls_cert_generate)
  TLS self-signed keys generated OK
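
The lines that matter in this log are easy to miss among the routine startup messages. As a rough sketch, a few regular expressions can pull out the lines that suggest a helper process died. The patterns below are assumptions chosen from this one excerpt, not a list of messages that samba is known to emit, and `suspicious_lines` is a hypothetical helper name.

```python
import re

# Patterns picked from the log excerpt above; they are illustrative only.
FATAL_PATTERNS = [
    re.compile(r"symbol lookup error"),
    re.compile(r"died with exit status \d+"),
    re.compile(r"child process exited"),
]


def suspicious_lines(log_text):
    """Return the stripped log lines that match any of FATAL_PATTERNS."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in FATAL_PATTERNS)
    ]
```

It could be applied to the contents of /var/log/samba/log.samba, for example with `suspicious_lines(open("/var/log/samba/log.samba").read())`.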

Because we gate pushing builds to quay.io behind all test jobs succeeding, we haven't pushed an update in a month; Oct. 4 was the last successful CI run. I'm sorely tempted to temporarily disable opensuse ad dc builds in our CI until we can work through this issue.

Any thoughts? CC: @dmulder @obnoxxx

dmulder commented 1 year ago

The markdown fix is on its way in. I'm fine with you disabling the tests for now. I'll look into this other issue.

phlogistonjohn commented 1 year ago

Argh. Some days I feel very silly. Immediately after I posted that, I realized that the base image I used was cached from some time ago. I forced the podman build to use a new base image (it's a little tricky because I have to pull docker.io/opensuse/tumbleweed and then manually alias that to opensuse/tumbleweed).

With the updated base image, the samba server starts normally. :-o
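
One way to reduce the risk of building on a stale base image is to ask podman to re-pull the base and skip cached layers on every local test build. The sketch below only assembles the command line: `build_command` is a hypothetical helper, the context directory is a placeholder, and the tag is taken from the `podman run` command earlier in the thread. Whether `--pull=always` is sufficient depends on how the Containerfile names its base image and on the local short-name configuration, so the manual pull-and-alias step described above may still be needed.

```python
import shlex


def build_command(tag, context_dir=".", containerfile=None):
    """Assemble a podman build command that avoids stale cached images.

    --pull=always asks podman to re-pull the base image, and --no-cache
    skips previously built layers.
    """
    cmd = ["podman", "build", "--pull=always", "--no-cache", "--tag", tag]
    if containerfile is not None:
        cmd += ["--file", containerfile]
    cmd.append(context_dir)
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(build_command("samba-ad-server:default-opensuse-amd64")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the command looks right.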

phlogistonjohn commented 1 year ago

Base image caching always bites me when I'm testing locally.

dmulder commented 1 year ago

Oh, awesome. Ok. Sorry about the delay getting the markdown fix into our image.

phlogistonjohn commented 1 year ago

I created #158 to prove to myself my earlier debugging was misguided. Happily the CI succeeded with python3-Markdown added to the packages list. I'm OK with waiting a few more days for the fix to land in the opensuse package repo(s), but if you think it'll take more than a week I'll probably clean up and try to merge the workaround and then we can just remove the workaround later once the package is fixed. Sound OK?

dmulder commented 1 year ago

I'll leave it up to you whether you merge the temporary fix. It won't hurt anything to have it in there. The fix is in testing right now, and will probably merge in a few days.

phlogistonjohn commented 1 year ago

The CI suite has run successfully on Nov 4th & 5th. Thanks so much for working on this!