openemr / openemr-devops

OpenEMR administration and deployment tooling
GNU General Public License v3.0

Cold bringup in Kubernetes of 7.0.0 fails #336

Closed aebrahim closed 1 year ago

aebrahim commented 1 year ago

Describe the bug

A cold bringup of the openemr/openemr:7.0.0 image in Kubernetes fails during container startup with a "No such file or directory" error.

To Reproduce

Steps to reproduce the behavior:

  1. Run minikube start
  2. Run minikube dashboard
  3. In kubernetes/kub-up and kubernetes/kub-down, replace kubectl with minikube kubectl --
  4. Change the container image in kubernetes/openemr/deployment.yaml to openemr/openemr:7.0.0
  5. Run kub-up
  6. View the logs for openemr in the Pods section of the minikube dashboard to see this error.
    touch: /var/www/localhost/htdocs/openemr/sites/default/docker-initiated: No such file or directory

However, if we first run kub-up with version 6.0.0, then change the image to 7.0.0 and run kub-up again, everything works fine.

Expected behavior

A cold bringup of openemr 7.0.0 should succeed.


jesdynf commented 1 year ago

Hn. Okay, I'm gonna be straight with you -- I have very little practical experience with Kubernetes and I wrote that more hoping that somebody would come and restructure it than in any particular confidence I was doing the right things. So it's likely, in other words, that you're more familiar with Kubernetes than I.

I'll try to check this on the weekend, but here are a couple of lines of inquiry before then:

I hope this helps! If you could send me the full log from your dashboard, that might be a help -- you might also consider making sure you've absolutely and fully purged any volumes you have between test runs.

aebrahim commented 1 year ago

Thanks for the response @jesdynf - I'm also learning kubernetes.

The error also appears with kubectl applied to a cluster running in GKE, but minikube makes for super easy-to-reproduce instructions.

This bug also affects 6.1.0

jesdynf commented 1 year ago

Hi! I've made some progress on this. It's the volume-mounting issue described in https://stackoverflow.com/questions/59901574/kubernetes-share-non-empty-volume . What's happening is that under Kubernetes, the volumes are created empty and then never get their contents loaded in. (We never noticed because this is not Docker's behavior.) This is why starting with a 6.0.0 instance and then upgrading the image works fine -- the volume is (somehow???) getting initialized in 6.0.0, but future images aren't doing that.

I currently have no idea how 6.0.0 works at all, but I now have replication and an explanation for all observed behavior, so that's something. Still working.
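For reference, the kind of mount involved looks like this hypothetical Deployment fragment (the volume and claim names here are made up): under Kubernetes the volume starts empty and simply shadows whatever the image shipped at that path, whereas a Docker named volume is populated from the image's contents on first use.

```yaml
# Hypothetical fragment of a Deployment spec; names are illustrative.
containers:
  - name: openemr
    image: openemr/openemr:7.0.0
    volumeMounts:
      - name: sitevolume
        # Starts empty under Kubernetes, hiding the image's sites/ contents
        mountPath: /var/www/localhost/htdocs/openemr/sites
volumes:
  - name: sitevolume
    persistentVolumeClaim:
      claimName: openemr-sites
```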

bradymiller commented 1 year ago

I remember this being an issue (shared volumes starting empty) when first doing this, and I think that's why I ended up putting in this block of code: https://github.com/openemr/openemr-devops/blob/master/docker/openemr/7.0.1/openemr.sh#L72-L76

bradymiller commented 1 year ago

I bet it was the change in the ordering of the code. Running the following line before the block above means that /var/www/localhost/htdocs/openemr/sites/default may not exist yet when it is touched: https://github.com/openemr/openemr-devops/blob/master/docker/openemr/7.0.1/openemr.sh#L60

Note that in 6.0.0, this line of code is after the block that makes sure /var/www/localhost/htdocs/openemr/sites/ is populated: https://github.com/openemr/openemr-devops/blob/master/docker/openemr/6.0.0/autoconfig.sh#L56
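The ordering bug can be sketched in plain shell (the backup location and restore logic here are illustrative stand-ins, not the actual openemr.sh contents):

```shell
# Simulate an empty shared volume mounted over the webroot (mktemp stands
# in for /var/www/localhost/htdocs/openemr in the real container).
WEBROOT=$(mktemp -d)
IMAGE_COPY=$(mktemp -d)      # illustrative stand-in for the image's saved sites/ contents
mkdir -p "$IMAGE_COPY/default"

# Pre-patch ordering: touching the marker first fails, because the empty
# volume shadowed the image's sites/default directory.
touch "$WEBROOT/sites/default/docker-initiated" 2>/dev/null \
  && echo "touch succeeded" || echo "touch failed: sites/default missing"

# Post-patch ordering: restore sites/ from the image copy *before* touching.
if [ ! -d "$WEBROOT/sites/default" ]; then
  mkdir -p "$WEBROOT/sites"
  cp -R "$IMAGE_COPY/default" "$WEBROOT/sites/default"
fi
touch "$WEBROOT/sites/default/docker-initiated"
```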

jesdynf commented 1 year ago

That's exactly it, then. That also explains why both nodes were trying to become docker-leaders: the promotion mechanic wasn't using the shared volume. Okay! Now I know what needs fixing. Can you remind me how you push images to the hub? I may need to push some images to my personal hub for testing before I'm confident about patching main.

bradymiller commented 1 year ago

Guessing this affects all the modern dockers (the ones that are actively used and supported/updated), which are 7.0.0, 7.0.1, flex-3.15-8, flex-3.17 (note I edited this to 3.17; we do not need to support 3.16), and flex-edge. It will basically be the same code changes in all of them, so just testing 7.0.0 should mean all are good.

I can update all the official dockers (along with the arm builds) via clicks on GitHub Actions (using code in the master branch of this repo), so that will be easy. The problem will be 7.0.0, since that uses rel-700 and I'm pretty sure we have stuff in there that is not in the most recent patch (i.e. I will temporarily change that docker build script to use the 7.0.0.2 tag). Just let me know when it's ready to go and I can do that.

For testing for your personal use, you can push to Docker Hub via docker push jesdynf/openemr (this defaults to the latest tag) or docker push jesdynf/openemr:tagname to give it a specific tag.

And then you can see it here (check out the Tags tab): https://hub.docker.com/r/jesdynf/openemr

bradymiller commented 1 year ago

I should have mentioned that 7.0.1 and flex-edge auto-build nightly via GitHub Actions (consider these dev-type dockers, so it's nice to see when they break); just a cool thing that happens that has no effect on what we are doing here :)

jesdynf commented 1 year ago

Okay! Image jesdynf/openemr:7.0.0 has been booted in Kubernetes as both a single instance and as a two-node cluster, and I have to tell you this delights me to see.

Restoring empty /etc/ssl directory.
Restoring empty /var/www/localhost/htdocs/openemr/sites directory.
Generating a RSA private key
..................................................................................................................................................++++
....................++++
writing new private key to '/etc/ssl/private/selfsigned.key.pem'
-----
Running quick setup!
OpenEMR configured.
Setup Complete!
Setting user 'www' as owner of openemr/ and setting file/dir permissions to 400/500
Default file permissions and ownership set, allowing writing to specific directories
Removing remaining setup scripts
Setup scripts removed, we should be ready to go now!
Love OpenEMR? You can now support the project via the open collective:
 > https://opencollective.com/openemr/donate
Starting apache!

... versus ...

Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Waiting for the docker-leader to finish configuration before proceeding.
Love OpenEMR? You can now support the project via the open collective:
 > https://opencollective.com/openemr/donate
Starting apache!

One node is correctly deferring to the other.

Patches to 7.0.0 (and others) are now in progress, but you're welcome to use my image jesdynf/openemr:7.0.0 to test or otherwise get moving until we're caught up. I will close this once the patches are in.

jesdynf commented 1 year ago

Okay, patch is in. I'm not 100% sure I got the flex stuff right, though; they diverged farther from mainline than I figured. I think it's okay. As soon as @bradymiller says the images are up, we should be done here.

bradymiller commented 1 year ago

@jesdynf, I've been testing this on 7.0.1 and flex-edge after building these with the recent changes (easy to test these since both are dev environments, so I can build the public dockers without worrying about breaking anything).

7.0.1 is working really nicely, but it's still not working on flex-edge (just gotta debug it a bit to see why it's breaking; guessing it will be something simple). Are you using minikube or kind or something else? I've started using kind, which is pretty neat and I like it better than minikube. Also updating the README for both minikube and kind.
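For reference, kind describes multi-node clusters in a small config file passed to kind create cluster --config; a minimal sketch (the node count here is illustrative, not the repo's actual config):

```yaml
# kind-config.yaml -- one control-plane node plus three workers (illustrative)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```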

As an aside, here's the output from kubectl get all after setting replicas to 14 (via kubectl scale deployment.apps/openemr --replicas=14) :) :

NAME                             READY   STATUS    RESTARTS   AGE
pod/mysql-5c59bcb9c-msqxr        1/1     Running   0          74m
pod/openemr-66fbff9995-2g5xd     1/1     Running   0          2m56s
pod/openemr-66fbff9995-42v6s     1/1     Running   0          2m56s
pod/openemr-66fbff9995-5485t     1/1     Running   0          47m
pod/openemr-66fbff9995-5b2vf     1/1     Running   0          2m56s
pod/openemr-66fbff9995-9rn4v     1/1     Running   0          2m20s
pod/openemr-66fbff9995-c66w5     1/1     Running   0          2m56s
pod/openemr-66fbff9995-cgp9z     1/1     Running   0          2m56s
pod/openemr-66fbff9995-lhxbg     1/1     Running   0          2m56s
pod/openemr-66fbff9995-n8jn8     1/1     Running   0          74m
pod/openemr-66fbff9995-r7r2l     1/1     Running   0          2m20s
pod/openemr-66fbff9995-s8bw8     1/1     Running   0          2m20s
pod/openemr-66fbff9995-tv5r6     1/1     Running   0          2m56s
pod/openemr-66fbff9995-wg5rd     1/1     Running   0          2m20s
pod/openemr-66fbff9995-xw6p7     1/1     Running   0          2m56s
pod/phpmyadmin-f4d9bfc69-vhtbm   1/1     Running   0          74m
pod/redis-7f945f9f4c-jpqtf       1/1     Running   0          74m

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
service/kubernetes   ClusterIP      10.96.0.1       <none>        443/TCP                         76m
service/mysql        ClusterIP      10.96.78.14     <none>        3306/TCP                        74m
service/openemr      LoadBalancer   10.96.125.245   <pending>     8080:30059/TCP,8090:32005/TCP   74m
service/phpmyadmin   NodePort       10.96.8.253     <none>        8081:32562/TCP                  74m
service/redis        ClusterIP      10.96.175.187   <none>        6379/TCP                        74m

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql        1/1     1            1           74m
deployment.apps/openemr      14/14   14           14          74m
deployment.apps/phpmyadmin   1/1     1            1           74m
deployment.apps/redis        1/1     1            1           74m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-5c59bcb9c        1         1         1       74m
replicaset.apps/openemr-66fbff9995     14        14        14      74m
replicaset.apps/phpmyadmin-f4d9bfc69   1         1         1       74m
replicaset.apps/redis-7f945f9f4c       1         1         1       74m
jesdynf commented 1 year ago

I used minikube to test the Kubernetes stuff -- although I had to change the volume type in the volumes, so I didn't keep the minikube changes when I was done -- but I didn't test flex-edge at all, sorry.

bradymiller commented 1 year ago

Practically, having flex is a bit crazy anyway, since each replica needs to do its own build (especially when grabbing code from a dynamic branch such as master...) :) But I'm happy to get flex to work, and watching 7.0.1 go so smoothly in testing gives me some motivation to spend time on flex :) (There is just something really fun about making tons of OpenEMR replicas and then removing them; hopefully I may even have time to figure out how to make more than one node/cluster on kind, which would be really neat to play around with.)

bradymiller commented 1 year ago

dockers are all working and updated including 7.0.0 :) thanks @jesdynf !

Also added some fun Kubernetes orchestration stuff. The README now shows how to get a cluster going with kind (I liked this since it's easy to get going and does not require any manipulation of the current scripts), either as a single node or a four-node cluster, and for the four-node setup there's a shared volume mechanism. Forgot how fun this orchestration stuff is. Ended up setting up a four-node cluster for 7.0.0 with 45 replicas with just a couple of commands :) Check out the output of the kubectl get all and kubectl get pod -o wide commands below (the second command shows the replicas spread across different nodes).

[17:48][~/git/openemr-devops/kubernetes(master)]$ kubectl get all
NAME                             READY   STATUS    RESTARTS   AGE
pod/mysql-567786b866-r6fdd       1/1     Running   0          16m
pod/openemr-c4bfdb644-29gnv      1/1     Running   0          16m
pod/openemr-c4bfdb644-2gfqk      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-2sgg9      1/1     Running   0          16m
pod/openemr-c4bfdb644-4qr79      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-4vrtm      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-5gjkx      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-64c5t      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-6g9tv      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-6rdx2      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-7b4hb      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-8l5q9      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-9wxxp      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-btz6v      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-c4klr      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-cqc2c      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-cqv98      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-d6c2l      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-f7nnh      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-fxsg8      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-g5dc5      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-gvdjh      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-h222t      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-jkx24      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-jth9c      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-k4s64      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-kzxd7      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-mcq6r      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-pjqs8      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-qm6sj      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-qmpst      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-qnzj6      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-tg8tv      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-v9nwl      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-wg6dz      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-wwrps      1/1     Running   0          9m17s
pod/openemr-c4bfdb644-x8fn6      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-xdmdv      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-xf9hx      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-xjzls      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-xnpg6      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-z2s2r      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-z2tgl      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-zckx8      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-zcrgn      1/1     Running   0          8m31s
pod/openemr-c4bfdb644-zhp2v      1/1     Running   0          8m31s
pod/phpmyadmin-f4d9bfc69-x6kjq   1/1     Running   0          16m
pod/redis-7f945f9f4c-wj229       1/1     Running   0          16m

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
service/kubernetes   ClusterIP      10.96.0.1       <none>        443/TCP                         18m
service/mysql        ClusterIP      10.96.133.223   <none>        3306/TCP                        16m
service/openemr      LoadBalancer   10.96.73.183    <pending>     8080:30948/TCP,8090:30187/TCP   16m
service/phpmyadmin   NodePort       10.96.86.96     <none>        8081:30393/TCP                  16m
service/redis        ClusterIP      10.96.106.99    <none>        6379/TCP                        16m

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql        1/1     1            1           16m
deployment.apps/openemr      45/45   45           45          16m
deployment.apps/phpmyadmin   1/1     1            1           16m
deployment.apps/redis        1/1     1            1           16m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-567786b866       1         1         1       16m
replicaset.apps/openemr-c4bfdb644      45        45        45      16m
replicaset.apps/phpmyadmin-f4d9bfc69   1         1         1       16m
replicaset.apps/redis-7f945f9f4c       1         1         1       16m

[17:54][~/git/openemr-devops/kubernetes(master)]$ kubectl get pod -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE   READINESS GATES
mysql-567786b866-r6fdd       1/1     Running   0          17m     10.244.2.2    kind-worker3   <none>           <none>
openemr-c4bfdb644-29gnv      1/1     Running   0          17m     10.244.1.3    kind-worker2   <none>           <none>
openemr-c4bfdb644-2gfqk      1/1     Running   0          9m39s   10.244.2.10   kind-worker3   <none>           <none>
openemr-c4bfdb644-2sgg9      1/1     Running   0          17m     10.244.3.5    kind-worker    <none>           <none>
openemr-c4bfdb644-4qr79      1/1     Running   0          9m39s   10.244.3.9    kind-worker    <none>           <none>
openemr-c4bfdb644-4vrtm      1/1     Running   0          10m     10.244.1.5    kind-worker2   <none>           <none>
openemr-c4bfdb644-5gjkx      1/1     Running   0          9m39s   10.244.3.17   kind-worker    <none>           <none>
openemr-c4bfdb644-64c5t      1/1     Running   0          10m     10.244.1.4    kind-worker2   <none>           <none>
openemr-c4bfdb644-6g9tv      1/1     Running   0          9m39s   10.244.2.8    kind-worker3   <none>           <none>
openemr-c4bfdb644-6rdx2      1/1     Running   0          9m39s   10.244.1.10   kind-worker2   <none>           <none>
openemr-c4bfdb644-7b4hb      1/1     Running   0          9m39s   10.244.3.16   kind-worker    <none>           <none>
openemr-c4bfdb644-8l5q9      1/1     Running   0          10m     10.244.3.7    kind-worker    <none>           <none>
openemr-c4bfdb644-9wxxp      1/1     Running   0          9m39s   10.244.2.16   kind-worker3   <none>           <none>
openemr-c4bfdb644-btz6v      1/1     Running   0          9m39s   10.244.3.10   kind-worker    <none>           <none>
openemr-c4bfdb644-c4klr      1/1     Running   0          10m     10.244.3.6    kind-worker    <none>           <none>
openemr-c4bfdb644-cqc2c      1/1     Running   0          9m39s   10.244.1.11   kind-worker2   <none>           <none>
openemr-c4bfdb644-cqv98      1/1     Running   0          9m39s   10.244.3.12   kind-worker    <none>           <none>
openemr-c4bfdb644-d6c2l      1/1     Running   0          9m39s   10.244.1.16   kind-worker2   <none>           <none>
openemr-c4bfdb644-f7nnh      1/1     Running   0          9m39s   10.244.2.18   kind-worker3   <none>           <none>
openemr-c4bfdb644-fxsg8      1/1     Running   0          9m39s   10.244.3.19   kind-worker    <none>           <none>
openemr-c4bfdb644-g5dc5      1/1     Running   0          9m39s   10.244.3.8    kind-worker    <none>           <none>
openemr-c4bfdb644-gvdjh      1/1     Running   0          9m39s   10.244.2.14   kind-worker3   <none>           <none>
openemr-c4bfdb644-h222t      1/1     Running   0          9m39s   10.244.1.15   kind-worker2   <none>           <none>
openemr-c4bfdb644-jkx24      1/1     Running   0          9m39s   10.244.3.13   kind-worker    <none>           <none>
openemr-c4bfdb644-jth9c      1/1     Running   0          10m     10.244.2.4    kind-worker3   <none>           <none>
openemr-c4bfdb644-k4s64      1/1     Running   0          9m39s   10.244.3.15   kind-worker    <none>           <none>
openemr-c4bfdb644-kzxd7      1/1     Running   0          9m39s   10.244.2.15   kind-worker3   <none>           <none>
openemr-c4bfdb644-mcq6r      1/1     Running   0          9m39s   10.244.3.14   kind-worker    <none>           <none>
openemr-c4bfdb644-pjqs8      1/1     Running   0          10m     10.244.2.6    kind-worker3   <none>           <none>
openemr-c4bfdb644-qm6sj      1/1     Running   0          9m39s   10.244.1.7    kind-worker2   <none>           <none>
openemr-c4bfdb644-qmpst      1/1     Running   0          9m39s   10.244.2.11   kind-worker3   <none>           <none>
openemr-c4bfdb644-qnzj6      1/1     Running   0          9m39s   10.244.1.9    kind-worker2   <none>           <none>
openemr-c4bfdb644-tg8tv      1/1     Running   0          9m39s   10.244.3.11   kind-worker    <none>           <none>
openemr-c4bfdb644-v9nwl      1/1     Running   0          9m39s   10.244.2.13   kind-worker3   <none>           <none>
openemr-c4bfdb644-wg6dz      1/1     Running   0          10m     10.244.1.6    kind-worker2   <none>           <none>
openemr-c4bfdb644-wwrps      1/1     Running   0          10m     10.244.2.5    kind-worker3   <none>           <none>
openemr-c4bfdb644-x8fn6      1/1     Running   0          9m39s   10.244.2.7    kind-worker3   <none>           <none>
openemr-c4bfdb644-xdmdv      1/1     Running   0          9m39s   10.244.1.17   kind-worker2   <none>           <none>
openemr-c4bfdb644-xf9hx      1/1     Running   0          9m39s   10.244.2.17   kind-worker3   <none>           <none>
openemr-c4bfdb644-xjzls      1/1     Running   0          9m39s   10.244.1.8    kind-worker2   <none>           <none>
openemr-c4bfdb644-xnpg6      1/1     Running   0          9m39s   10.244.3.18   kind-worker    <none>           <none>
openemr-c4bfdb644-z2s2r      1/1     Running   0          9m39s   10.244.2.12   kind-worker3   <none>           <none>
openemr-c4bfdb644-z2tgl      1/1     Running   0          9m39s   10.244.1.13   kind-worker2   <none>           <none>
openemr-c4bfdb644-zckx8      1/1     Running   0          9m39s   10.244.1.12   kind-worker2   <none>           <none>
openemr-c4bfdb644-zcrgn      1/1     Running   0          9m39s   10.244.1.14   kind-worker2   <none>           <none>
openemr-c4bfdb644-zhp2v      1/1     Running   0          9m39s   10.244.2.9    kind-worker3   <none>           <none>
phpmyadmin-f4d9bfc69-x6kjq   1/1     Running   0          17m     10.244.1.2    kind-worker2   <none>           <none>
redis-7f945f9f4c-wj229       1/1     Running   0          17m     10.244.2.3    kind-worker3   <none>           <none>
aebrahim commented 1 year ago

Just verified this - 7.0.1 comes up like a charm!!!!!! @bradymiller and @jesdynf awesomeness confirmed.

bradymiller commented 1 year ago

Hi @aebrahim, great to hear! Just to clarify: 7.0.1 is the development version of OpenEMR at this time (its docker is built nightly from the most recent development codebase), and 7.0.0 is the production version. That being said, the 7.0.0 docker should also work like a charm :)

dewet22 commented 8 months ago

This bit me unexpectedly; I was trying to stand up a dev instance in our dev Kubernetes cluster and mounted the shared volume as per the instructions. However, I did not set SWARM_MODE, since it was only ever intended to be a single pod -- and without it, startup crashes because the container doesn't attempt to populate the volume first. I had to dig into openemr.sh to understand the workings here, but hopefully this helps somebody else who might be scratching their head.

It might also make sense to bail out in openemr.sh when the sites/default directory is unexpectedly empty and SWARM_MODE isn't set.
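A minimal sketch of that suggested guard (the function and variable names here are hypothetical, not the actual openemr.sh code):

```shell
# Bail out early when a bare volume hides sites/default and SWARM_MODE is
# unset, instead of crashing later with a confusing touch error.
check_sites_dir() {
  # $1: path to the sites directory
  if [ -z "$SWARM_MODE" ] && [ ! -d "$1/default" ]; then
    echo "ERROR: $1/default is missing and SWARM_MODE is not set." >&2
    echo "A shared volume may be mounted over sites/ without being populated." >&2
    return 1
  fi
  return 0
}

# Example: an empty directory simulating a freshly mounted shared volume.
EMPTY_SITES=$(mktemp -d)
check_sites_dir "$EMPTY_SITES" || echo "would exit 1 here"
```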

jesdynf commented 6 months ago

Sorry for the delay in responding.

I'm not unwilling, but I'm not sure where that test would best go -- toss me a PR (or just give me an example test and where you'd like to see it)?