nashcom / domino-startscript

Domino Start Script
https://nashcom.github.io/domino-startscript/
Apache License 2.0

Not using config in working directory #11

Closed rowlk closed 2 months ago

rowlk commented 2 months ago

I have been using the container script for some time but decided I needed to move some services to new Rocky 9 hosts. All simple stuff: cloned the latest start scripts, ran the install, copied over the .dominoctl directory and attached the persistent volumes. But when initialising the new containers, whilst the console reports reading the config from the working directory's .dominoctl, the containers actually get configured from /etc/sysconfig. Strangely, if I then try to run, say, dominoctl console, it does read the local files and errors because the container doesn't exist.

Hopefully there is an obvious explanation here and I have just missed a step along the way.

Daniel-Nashed commented 2 months ago

the order of checks for configuration is:

I don't know where you put the files and what you are exactly expecting.

the order described above has some background.

By default, with no config, you are using /etc/sysconfig. It is only writable by root, which can be good or bad. If you are running as root and you only have one instance, this is a good config.

It sounds like you are using the local configuration in the current directory. This should work if the files have the right owner and the user can read them. Without more detailed information this write-up is all I can do.

maybe this already gives you an idea what to check next?

rowlk commented 2 months ago

All files are in the correct place. As we run multiple domino servers per host machine we have always used the local configuration.

I reverted to the old script (Version 1.1.0 04.06.2022) which works. Can't see anything obvious in the script though. Both correctly identify the required configuration path (the new includes the full path whereas the old is the relative path but that's not significant). However the new then loads from ~/.dominoctl if present or if not then /etc/sysconfig ignoring the local config.

Note that "dominoctl config" does load the correct domino_container config.

Daniel-Nashed commented 2 months ago

There are no changes I am aware of. I just have tested a local config and it works for me as expected.

Also the cfg and the run use the same way to find the configuration. The used configuration is also printed when the script runs.

It is difficult to help if you only describe what you think is happening, without any traces or concrete information. I can't help much based on assumptions.

You could add tracing to the script to see if and where it gets the wrong config. But you should see it already from the output of the script as shown below.
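One way to add tracing without editing the script is to run it under bash -x and keep the trace in a file. The sketch below is generic bash and not part of dominoctl: trace_run and the trace file path /tmp/dominoctl.trace are hypothetical names chosen for illustration, and it assumes dominoctl is a bash script reachable via PATH.

```shell
#!/bin/bash
# Generic bash tracing helper; trace_run is a hypothetical name, not part of dominoctl.
trace_run()
{
  local trace_file="$1"
  shift

  # bash -x writes every expanded command to stderr, which is captured in the trace file.
  # PS4 adds file and line number to each trace line
  # (bash ignores an inherited PS4 when it runs as root and falls back to "+ ").
  PS4='+ ${BASH_SOURCE}:${LINENO}: ' bash -x "$@" 2> "$trace_file"
}

# Only trace dominoctl if it is installed on this machine.
if command -v dominoctl > /dev/null 2>&1; then
  trace_run /tmp/dominoctl.trace "$(command -v dominoctl)" version
  # Show every traced line that touches a container config file.
  grep -n 'domino_container' /tmp/dominoctl.trace
fi
```

The trace shows each assignment with its expanded value, so the point where a config path is chosen should be visible without changing the script itself.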

Version Info showing also the configured directory

dominoctl version

-----------------------------------------
/local/test/.dominoctl/domino_container

dominoctl
---------
Nash!Com Domino Container Script Version 1.4.0
(Running on docker Version 27.0.3)

Stop the existing container, create one with new config

dominoctl stop

-----------------------------------------
/etc/sysconfig/domino_container
/etc/sysconfig/env_container

dominoctl init local

dominoctl start

-----------------------------------------
/local/test/.dominoctl/domino_container

8c2444c927d9ce45be025d97b3a1ee1305d39635f68b721024e6dd29fc1a298c

rowlk commented 2 months ago

Sorry I thought I had explained. Let me try again.

This container was created using dominoctl start and script Version 1.1.0 04.06.2022:

CONTAINER ID  IMAGE                           COMMAND  CREATED            STATUS                      PORTS  NAMES
437eb63eca96  localhost/hclcom/domino:latest           About an hour ago  Up About an hour (healthy)         domino-wvh

This correctly represents the desired configuration per the config in the local .dominoctl

Now with script version Version 1.4.0 20.12.2023 and without changing the working directory we can run the same command

dominoctl start


/local/tmx-hs-wvh-a/.dominoctl/domino_container
/local/tmx-hs-wvh-a/.dominoctl/env_container

Starting systemd domino_container.service

CONTAINER ID  IMAGE                           COMMAND  CREATED            STATUS                      PORTS  NAMES
437eb63eca96  localhost/hclcom/domino:latest           About an hour ago  Up About an hour (healthy)         domino-wvh
e6312bf2f373  localhost/hclcom/domino:latest           5 minutes ago      Up 5 minutes (healthy)             home-domino-wvh

So even though we have initiated from the working directory AND the script identifies the correct config, a new container has been created based on the config in the ~/.dominoctl directory (identified by the container name, for testing).

We can demonstrate in more detail here. In the following two cases we will run commands from the working directory, the first with the new script in place and the second with the old script:

==================== New Script

# dominoctl version

/local/tmx-hs-wvh-a/.dominoctl/domino_container
/local/tmx-hs-wvh-a/.dominoctl/env_container

dominoctl

Nash!Com Domino Container Script Version 1.4.0
(Running on podman Version 4.9.4-rhel)

TMX-ROCKY-9:/local/tmx-hs-wvh-a

dominoctl inspect

/local/tmx-hs-wvh-a/.dominoctl/domino_container
/local/tmx-hs-wvh-a/.dominoctl/env_container

Runtime : podman 4.9.4-rhel
Status : running
Health :
Started : 17.08.2024 14:08:59

Name : home-domino-wvh
Current Image : localhost/hclcom/domino:latest
New Image : hclcom/domino:latest
Version : 14.0FP1
Domino Ver : 14.0FP1
Add-ons : verse=3.2.2
BuildTime : 16.08.2024 14:37:36

Hostname : tmx-hs-wvh-a.tormatrix.com
Volumes : /local/tmx-hs-wvh-a/local
Mounts : /local

Container ID : e6312bf2f373
Image-ID : f759351414d6

Image Size : 1596 MB
NetworkMode : host
Driver : overlay
Restart Policy : on-failure:3

================== Repeat with old script

# dominoctl version

.dominoctl/domino_container
.dominoctl/env_container

dominoctl

Nash!Com Domino Container Script Version 1.2.0
(Running on Podman Version 4.9.4-rhel)

TMX-ROCKY-9:/local/tmx-hs-wvh-a

dominoctl inspect

.dominoctl/domino_container
.dominoctl/env_container

Runtime : Podman 4.9.4-rhel
Status : running
Health :
Started : 17.08.2024 12:59:13

Name : domino-wvh
Current Image : localhost/hclcom/domino:latest
New Image : hclcom/domino:latest
Version : 14.0FP1
Domino Ver :
BuildTime : 16.08.2024 14:37:36

Hostname : tmx-hs-wvh-a.tormatrix.com
Volumes : /local/tmx-hs-wvh-a/local
Mounts : /local

Container ID : "437eb63eca
Image-ID : f759351414d6

Image Size : 1596 MB
NetworkMode : host
Driver : overlay
Restart Policy : on-failure:3

Daniel-Nashed commented 2 months ago

We are missing something here in the information about your environment.

I see this line in your log:

Starting systemd domino_container.service

This tells me that the container script is using the Podman service instead of Docker. Could it be that you have Docker and Podman installed?

In this case Podman is preferred unless you configure dominoctl to use Docker. If both are installed it always uses Podman by default.
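The "prefer Podman unless configured otherwise" rule can be pictured with a small sketch. This is not the code from dominoctl: detect_runtime is a hypothetical helper, and CONTAINER_CMD is used here only as an illustrative name for an explicit override.

```shell
#!/bin/bash
# Sketch of "prefer Podman unless configured otherwise".
# CONTAINER_CMD is a hypothetical override variable used for illustration only.
detect_runtime()
{
  if [ -n "$CONTAINER_CMD" ]; then
    # An explicit setting wins over auto-detection.
    echo "$CONTAINER_CMD"
  elif command -v podman > /dev/null 2>&1; then
    # Podman is checked first, so it wins when both runtimes are installed.
    echo "podman"
  elif command -v docker > /dev/null 2>&1; then
    echo "docker"
  else
    echo "none"
  fi
}

echo "Container runtime: $(detect_runtime)"
```

On a host with both runtimes installed and no override, this kind of check would pick Podman, which matches the behaviour described above.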

Are you using Podman on purpose or do you have also Docker installed?

Can you run the about command? It would detect both environments:

dominoctl about

-----------------------------------------
/etc/sysconfig/domino_container
/etc/sysconfig/env_container

------------------------------------------------------------------------------------------
Hostname      :      volt.domino-lab.net
Linux OS      :      CentOS Stream 9
Linux Version :      9
Kernel        :      5.14.0-457.el9.x86_64
Virt          :      kvm
Podman        :      5.1.1
Docker        :      27.0.3
DomDownload   :      1.0.4

I would assume you use Docker, because running Podman with multiple containers is more complicated. You would need a systemd service per container because Podman isn't running thru a daemon.

In contrast Docker is a systemd started application which can start multiple containers when it starts up.

Please clarify which type of configuration you are using. If you are running multiple containers Podman is much more complicated to setup. I never used Podman to run multiple containers. It could be done, but needs some manual configuration crafting separate systemd files.

I would recommend using Docker for a multi container environment. Maybe you have not done that on purpose?

I re-read everything you wrote and I don't see that you mentioned Docker or Podman. But your trace shows Podman.

Please provide more information if you are using a complex configuration with multiple containers! This would make it a lot easier for me to help you ;-)

rowlk commented 2 months ago

I am running Podman and I take on board your recommendation to use Docker. That said we have had multi container podman servers running for quite some time without issue. We do not have Docker installed on this host.

However the multi container vs single container discussion does not impact the observed behaviour of this script version which can be reproduced when starting a single container. For more evidence I deliberately set an invalid image in the home config. With no containers running we run start from a newly initialised working directory with resulting failure which allows us to view the logs and prove beyond doubt that the script whilst claiming to read the local config isn't:

dominoctl start

Info: Cannot read configured environment file [env_container]!


/local/test/.dominoctl/domino_container

Starting systemd domino_container.service
Job for domino_container.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status domino_container.service" and "journalctl -xeu domino_container.service" for details.

Aug 17 15:19:37 TMX-ROCKY-9.4 systemd[1]: Starting HCL Domino Server on Podman (notes)...
░░ Subject: A start job for unit domino_container.service has begun execution
░░ Defined-By: systemd
░░ Support: https://wiki.rockylinux.org/rocky/support
░░
░░ A start job for unit domino_container.service has begun execution.
░░
░░ The job identifier is 53140.
Aug 17 15:19:37 TMX-ROCKY-9.4 domino_container[59139]: /root/.dominoctl/domino_container: line 1: nosuch: command not found
Aug 17 15:19:37 TMX-ROCKY-9.4 domino_container[59140]: /root/.dominoctl/domino_container: line 2: i: command not found
Aug 17 15:19:37 TMX-ROCKY-9.4 domino_container[59133]: -----------------------------------------
Aug 17 15:19:37 TMX-ROCKY-9.4 domino_container[59133]: /root/.dominoctl/domino_container
Aug 17 15:19:37 TMX-ROCKY-9.4 domino_container[59133]: /root/.dominoctl/env_container
Aug 17 15:19:38 TMX-ROCKY-9.4 domino_container[59133]: Image [hclcom/domino:nosuch] not found. Trying to pull image ..
Aug 17 15:19:38 TMX-ROCKY-9.4 domino_container[59225]: Error: short-name resolution enforced but cannot prompt without a TTY
Aug 17 15:19:39 TMX-ROCKY-9.4 domino_container[59133]: Image [hclcom/domino:nosuch] does not exist - please build image first
Aug 17 15:19:39 TMX-ROCKY-9.4 systemd[1]: domino_container.service: Can't open PID file /run/domino_container.service.pid (yet?) after start: Operation not permitted
Aug 17 15:19:39 TMX-ROCKY-9.4 systemd[1]: domino_container.service: Failed with result 'protocol'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://wiki.rockylinux.org/rocky/support
░░
░░ The unit domino_container.service has entered the 'failed' state with result 'protocol'

The old script starts the container from the local config as expected:

dominoctl start

Info: Cannot read configured environment file [env_container]!


.dominoctl/domino_container

b054663315ddacbc5b6759466104b91db83d87e383edc7c2eb1e58c165d61eb9

TMX-ROCKY-9:/local/test

podman ps

CONTAINER ID  IMAGE                           COMMAND  CREATED        STATUS                  PORTS  NAMES
b054663315dd  localhost/hclcom/domino:latest           5 seconds ago  Up 6 seconds (healthy)         domino

TMX-ROCKY-9:/local/test

Daniel-Nashed commented 2 months ago

It works for me as you can see in the logs I provided. Podman involves a systemd script per container. You are providing information step by step because you think you already know what the problem is. This makes troubleshooting more difficult. I am not saying you cannot use podman. But it is way more complicated with multiple instances.

I cannot reproduce your problem but I also don't have your full config. If only Podman is installed, what I thought was going wrong isn't the root cause.

The script has only one place where the configuration is determined. If you are getting something different started and don't provide more details, just look into the script and add debug output to it if the standard debug does not provide sufficient information. Since you seem to know better than the developer what the problem is, I would suggest you debug on your own. Greetings from the gym

Daniel-Nashed commented 2 months ago

Did you check the different systemd scripts how they are configured? You should have one per container

rowlk commented 2 months ago

You must have missed some of the info in my original post. I started with a brand new server, installed the necessary components, transferred my data and started my first and only container. It started up based on the default config in /etc/sysconfig, ignoring the correct local config. That's where the story starts, and it has no correlation with multiple instances.

I do not think my statement that the old script works and the new one does not (on Podman) is the same as me saying I know what the exact problem is. It is just a simple, provable fact, which is surely important in trying to get to the bottom of the issue.

I actually spent several hours yesterday trying to debug the script and resolve and only after failing to make progress did I raise the issue here.

I have also tested the new script on the old server, which is running CentOS 8 and an older version of Podman, and it behaves in exactly the same way, so we can eliminate OS and Podman versions to an extent.

I will look into the systemd scripts and see if that yields any more clues and can also spin up a new server with docker which as you indicate by your tests will very likely work just fine.

Daniel-Nashed commented 2 months ago

With Podman the server is started with a systemd script which needs a reference to the container. It cannot use the same logic as invoking the script from a directory.

I cannot say why it even worked before with the old script. That podman is not a daemon is complicating the configuration. The systemd service needs to know where the config is.

Daniel-Nashed commented 2 months ago

The systemd service domino_container.service starts the server.
The service invokes the container script, which has logic to find the configuration directory.

Search for DOMINO_CFG_DIR to see the logic.
This configuration by default can't find a local directory. It could only find the standard config or a home configuration.
But this will not work for multiple partitions.

With multiple partitions and Podman you could set the DOMINO_CFG_DIR explicitly in domino_container.service.

Using Docker is a lot easier, because it does not need a systemd script.
I don't know why the old version works for you in this configuration.
It was never intended to work with a local directory without specific configuration.

The Domino container project and the Domino control script have a lot of options and are quite complex.
So not all combinations of settings will work without special configuration.

I don't know how you are going to use multiple instances. But Docker would be much easier to configure.
You are not giving me all details about your configuration, so it is difficult to help better.
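To picture why a service started by systemd cannot see a local directory, here is a minimal sketch of such a lookup. It is not the code from dominoctl: find_cfg_dir is a hypothetical function and the search order is an assumption. Only the name DOMINO_CFG_DIR, the file name domino_container and the three locations come from this thread.

```shell
#!/bin/bash
# Hypothetical sketch of a configuration lookup; the order is an assumption,
# not taken from the dominoctl source.
find_cfg_dir()
{
  local work_dir="$1" home_dir="$2" sys_dir="$3"

  if [ -n "$DOMINO_CFG_DIR" ]; then
    # An explicitly set DOMINO_CFG_DIR skips the search entirely.
    echo "$DOMINO_CFG_DIR"
  elif [ -r "$work_dir/.dominoctl/domino_container" ]; then
    # Local configuration relative to the caller's working directory.
    echo "$work_dir/.dominoctl"
  elif [ -r "$home_dir/.dominoctl/domino_container" ]; then
    # Home configuration of the user running the script.
    echo "$home_dir/.dominoctl"
  else
    # Standard system-wide configuration.
    echo "$sys_dir"
  fi
}

# Interactive call: the working directory is the instance directory.
find_cfg_dir "$PWD" "$HOME" /etc/sysconfig

# A system service normally runs with / as its working directory, so the
# "local" branch can never match and the home or standard config is used.
find_cfg_dir / "$HOME" /etc/sysconfig
```

With DOMINO_CFG_DIR set in the environment of the service, the working directory no longer matters, which is why setting it explicitly per instance is the suggested route for Podman.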

rowlk commented 2 months ago

Many thanks and yes that makes perfect sense and has allowed me to spot the bug in the old script that meant it "works" with local directories.

Within check_container_environment()

Old Script line 299
  if [ "$DOMINO_NO_SYSTEMD" != "yes" ]; then
    DOMINO_SYSTEMD_NAME=

New Script line 438
  if [ "$DOMINO_NO_SYSTEMD" = "yes" ]; then
    DOMINO_SYSTEMD_NAME=
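The effect of the inverted test can be shown in isolation. In the sketch below only the two conditions and the variable names DOMINO_NO_SYSTEMD and DOMINO_SYSTEMD_NAME come from the scripts quoted above. The wrapper functions old_check, new_check and start_mode, and the default name domino_container, are assumptions made for illustration.

```shell
#!/bin/bash
# Illustration only: the code around the two quoted conditions is an assumption.
old_check()
{
  DOMINO_SYSTEMD_NAME=domino_container
  # Old condition: clears the name whenever systemd is NOT disabled.
  if [ "$DOMINO_NO_SYSTEMD" != "yes" ]; then
    DOMINO_SYSTEMD_NAME=
  fi
}

new_check()
{
  DOMINO_SYSTEMD_NAME=domino_container
  # New condition: clears the name only when systemd is explicitly disabled.
  if [ "$DOMINO_NO_SYSTEMD" = "yes" ]; then
    DOMINO_SYSTEMD_NAME=
  fi
}

start_mode()
{
  # An empty service name means the container is started directly by the caller.
  if [ -z "$DOMINO_SYSTEMD_NAME" ]; then
    echo "direct"
  else
    echo "systemd:$DOMINO_SYSTEMD_NAME.service"
  fi
}

unset DOMINO_NO_SYSTEMD
old_check; echo "old script: $(start_mode)"   # prints "old script: direct"
new_check; echo "new script: $(start_mode)"   # prints "new script: systemd:domino_container.service"
```

With the old condition the service name is always emptied in the default case, so the container is started directly from the caller's working directory, where the local .dominoctl is visible. With the corrected condition the start is handed to systemd, which does not run in that directory.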

A better, more modern and easier to maintain approach would be to use Quadlet files to define the systemd services for each Domino container we want to run. It should be possible to build that into your script quite easily. I'll have a look if I get a chance later in the week.

Daniel-Nashed commented 2 months ago

Glad you found out what was different. I could have helped more if you had provided more information. So I had to read between the lines of what you are doing and come up with suggestions for what to try next.

Podman with just a single Domino container should work well in the way it is today. I took a quick look at quadlet-podman and I don't see the big benefit.

dominoctl ships with a ready to go service, which is intended for a standard configuration. You are running a more advanced configuration with multiple containers. The home directory and local directory options are mainly intended for developers who want to run multiple servers. Those work also well with Docker, because Docker does not need separate systemd services.

With Podman multiple containers are much more complicated and I would really recommend using Docker instead. But if you are running Podman you now know how to set it up and how the different parts glue together.

I would be very interested to understand in which type of scenario you are using multiple containers on one server.

We are using a Docker Compose stack with 5 servers for automation testing. All the tests are running on one Red Hat Enterprise Linux 9 VM with multiple containers, almost like lightweight VMs.

But in production I would not see many use cases where you would run multiple Domino servers on the same host.

rowlk commented 2 months ago

Hi Daniel, yes, agreed: Quadlets are not applicable here as the service is simply running the common script.

Everything is now migrated to the new host, I have just used separate service files for each domino instance with DOMINO_CFG_DIR defined so it is now running as intended rather than by accident.
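A per-instance service file of this kind could be derived from an existing unit roughly as follows. This is a sketch under assumptions: make_instance_unit is a hypothetical helper, the stand-in unit in the demo is not the unit shipped with dominoctl, and passing DOMINO_CFG_DIR through a systemd Environment= line is assumed to be one way of defining it.

```shell
#!/bin/bash
# Hypothetical helper: copy a unit file and add DOMINO_CFG_DIR to its [Service] section.
make_instance_unit()
{
  local template="$1" cfg_dir="$2" out_file="$3"

  # Print every line unchanged and append the Environment= line right after [Service].
  awk -v line="Environment=DOMINO_CFG_DIR=$cfg_dir" '
    { print }
    /^\[Service\]/ { print line }
  ' "$template" > "$out_file"
}

# Demo against a stand-in unit in a temp directory; the content of the real
# domino_container.service is not shown in this thread.
demo_dir=$(mktemp -d)
printf '[Unit]\nDescription=stand-in\n\n[Service]\nExecStart=/bin/true\n' > "$demo_dir/domino_container.service"

make_instance_unit "$demo_dir/domino_container.service" /local/test/.dominoctl "$demo_dir/domino_test.service"
cat "$demo_dir/domino_test.service"
```

Each generated file would then need its own unit name and a systemctl daemon-reload after being installed. Other per-instance settings in the real unit, such as a PID file path, may also need adjusting.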

In this case we have several small customers for whom we host their domino services. These are very low resource workloads so having them on one host just makes sense for us. They were originally set up using the old domino hosted orgs which was dropped (v10 or v11 can't quite recall) but the containerised deployment is far better anyway.

We also have instances where customers used to have partitioned servers which we would now just set up with separate containers on one host. Again these are pretty low resource workloads. We have one example internally where we have three containerised servers we use for managing our template development - testing - deployment cycle. They could be on separate hosts but really don't need to be.

Larger, heavily loaded servers are all on dedicated hosts. All, I believe, are still running natively, although I am very tempted to containerise. The upgrade path alone sells it, being so quick and easy.

I suppose in summary multi domino per host is useful where you perhaps want or need to separate the services being hosted but overall are not resource hungry.

Daniel-Nashed commented 2 months ago

Thanks, this clarifies your needs. Makes sense this way and is just a bit more complicated with Podman. Thanks also for confirming the DOMINO_CFG_DIR approach works for you.

My production servers are a mix of Podman and Docker but just one instance. But I run a lot of really wild configurations for testing. Some of them I don't even dare to mention anywhere...

I also test with many different container base images.

What type of VMs are you using? I hope not VMware..

Here is something I looked into for optimized infrastructure. But depending where you host, you just get a KVM based VM which you can use with multiple containers. Containers are a perfect fit for smaller Domino servers!

Great that this works for you and that you find it useful, and I am glad that we solved your issue.