passbolt / charts-passbolt

Helm charts to run Passbolt on Kubernetes. No strings attached charts to run the open source password manager for teams!
https://passbolt.com
GNU Affero General Public License v3.0

PHP-FPM Crashes when Hugepages are Enabled #47

Closed cdmastercom closed 3 months ago

cdmastercom commented 11 months ago

When running in a microk8s cluster with hugepages enabled on the host VM, the PHP-FPM process inside the 'passbolt-depl-srv' pod crashes continually. This manifests as an unhealthy pod that is Running but never becomes Ready (0/1):

> kubectl get pods -n passbolt
passbolt-depl-srv-59cb6bcdfb-6ptx8        0/1     Running     0 134m

Logs:

> kubectl logs -n passbolt passbolt-depl-srv-59cb6bcdfb-6ptx8
... (More above)
2023-08-25 02:24:25,508 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-25 02:24:25,508 WARN exited: php-fpm (terminated by SIGBUS (core dumped); not expected)
2023-08-25 02:24:26,512 INFO spawned: 'php-fpm' with pid 560
2023-08-25 02:24:27,643 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-25 02:24:27,644 WARN exited: php-fpm (terminated by SIGBUS (core dumped); not expected)
2023-08-25 02:24:28,648 INFO spawned: 'php-fpm' with pid 561
2023-08-25 02:24:28,795 WARN exited: php-fpm (terminated by SIGBUS (core dumped); not expected)
2023-08-25 02:24:29,801 INFO spawned: 'php-fpm' with pid 562
2023-08-25 02:24:30,943 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-25 02:24:30,946 WARN exited: php-fpm (terminated by SIGBUS (core dumped); not expected)
2023-08-25 02:24:31,949 INFO spawned: 'php-fpm' with pid 563
2023-08-25 02:24:33,090 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-25 02:24:33,090 WARN exited: php-fpm (terminated by SIGBUS (core dumped); not expected)
2023-08-25 02:24:34,093 INFO spawned: 'php-fpm' with pid 564

This behaviour can be worked around by disabling the PHP opcache in the pod.

> cat /etc/php/8.2/fpm/php.ini  | grep opcache
;zend_extension=opcache
[opcache]
opcache.enable=0
;opcache.enable_cli=0
;opcache.memory_consumption=128
;opcache.interned_strings_buffer=8
... (more below)

With opcache disabled, PHP-FPM works immediately. For reference, see: https://github.com/kubernetes/kubernetes/issues/71233

In my case, disabling hugepages is troublesome since we require them for Mayastor. There may be a better workaround than changing the PHP config, but this one worked for me.
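
For anyone wanting to script the same workaround, here is a minimal sketch. It assumes a php.ini-style file such as the `/etc/php/8.2/fpm/php.ini` shown above; the function name `disable_opcache` is made up for illustration and is not part of the chart or the image.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Passbolt chart or image):
# force opcache.enable=0 in a php.ini-style file.
disable_opcache() {
    ini_file="$1"
    if grep -Eq '^[;[:space:]]*opcache\.enable[[:space:]]*=' "$ini_file"; then
        # An opcache.enable line exists (possibly commented out): rewrite it.
        # The pattern does not match opcache.enable_cli.
        tmp="$(mktemp)"
        sed -E 's/^[;[:space:]]*opcache\.enable[[:space:]]*=.*/opcache.enable=0/' \
            "$ini_file" > "$tmp" && cat "$tmp" > "$ini_file"
        rm -f "$tmp"
    else
        # No such line yet: append one under an [opcache] section header.
        printf '\n[opcache]\nopcache.enable=0\n' >> "$ini_file"
    fi
}

# Example (path taken from the output above):
# disable_opcache /etc/php/8.2/fpm/php.ini
```

PHP-FPM reads php.ini at startup, so the process has to be restarted for the change to apply. An edit made inside a running container is also lost when the pod is recreated, so this is only a stopgap.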

dlen commented 11 months ago

Hi @cdmastercom !

Have you tried using this feature before? https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/

cdmastercom commented 11 months ago

Hi @dlen

Trying that now. The 'workaround' I mentioned isn't nice. I'll let you know how we go.

cdmastercom commented 11 months ago

@dlen

More Info:

> cat /proc/meminfo | grep Huge
AnonHugePages:      8192 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:      513
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2097152 kB

We enabled huge pages in the container limits and tried various mount points:

    limits:
      hugepages-1Gi: 0Mi
      hugepages-2Mi: 200Mi
      memory: 100Mi
    requests:
      hugepages-1Gi: 0Mi
      hugepages-2Mi: 200Mi
      memory: 100Mi
    volumeMounts:
      - mountPath: /hugepages-2Mi
        name: hugepage-2mi
      - mountPath: /hugepages-1Gi
        name: hugepage-1gi

But no luck: PHP is still crashing. There may be some issues with our setup, such as the fact that we're using transparent hugepages:

> cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

Also, we don't have any 1Gi hugepages.
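
As a sanity check on the numbers, the `/proc/meminfo` output above can be condensed with a small helper. This is only a sketch: the name `hugepage_summary` is hypothetical, and it reports pre-allocated hugepages only, not transparent hugepages.

```shell
#!/bin/sh
# Hypothetical helper: summarise pre-allocated hugepages from a
# meminfo-style file (defaults to /proc/meminfo).
hugepage_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }    # number of pages in the pool
        /^HugePages_Free:/  { free = $2 }     # pages not yet allocated
        /^Hugepagesize:/    { size_kb = $2 }  # default hugepage size in kB
        END {
            printf "total=%d free=%d page_kb=%d free_mib=%d\n",
                   total, free, size_kb, free * size_kb / 1024
        }
    ' "$meminfo"
}

# Example:
# hugepage_summary
```

With the values above this gives 513 free pages of 2048 kB, i.e. 1026 MiB free. The 200Mi `hugepages-2Mi` request is 100 pages, so a shortage of free pages does not look like the cause here.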

dlen commented 10 months ago

Hello again!

Have you tried to enable opcache hugepages support? https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.huge_code_pages

cdmastercom commented 10 months ago

Hi,

We've tried with that setting both enabled and disabled. The same crash occurs either way. As far as I can tell, that setting has no effect here.

My speculation: I think the problem is that PHP uses hugepages whenever it detects them, even if that is explicitly disabled. If PHP thinks hugepages are available but cannot actually use them (in our environment hugepages exist, but permission to access them is denied), it just crashes. If we disable opcache entirely, the problem goes away.

Maybe this is a problem with PHP and not this chart?

dlen commented 3 months ago

I'm closing this as I'm unable to reproduce it and you seem to have a workaround (disabling opcache).