piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

piraeus-operator v1.10.6 piraeus-cs-controller crashLoop error #513

Open Icedroid opened 1 year ago

Icedroid commented 1 year ago

I installed piraeus-operator v1.10.6 with Helm, and piraeus-cs-controller goes into a crash loop with the following error:

Operating system:   Linux, Version 4.19.90-24.4.v2101.ky10.x86_64
Environment:        amd64, 125 processors, 30688 MiB memory reserved for allocations

System components initialization in progress

Loading configuration file "/etc/linstor/linstor.toml"2023-08-02T14:13:02.081967097+08:00 
06:13:02.869 [main] INFO  LINSTOR/Controller - SYSTEM - ErrorReporter DB first time init.
06:13:02.872 [main] INFO  LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'
06:13:02.919 [main] INFO  LINSTOR/Controller - SYSTEM - Database type is Kubernetes-CRD
06:13:02.919 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading API classes started.
06:13:03.446 [Main] INFO  LINSTOR/Controller - SYSTEM - API classes loading finished: 526ms
06:13:03.446 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection started.
06:13:03.464 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
06:13:03.464 [Main] INFO  LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
06:13:03.465 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
06:13:03.476 [Main] INFO  LINSTOR/Controller - SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
06:13:03.476 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
06:13:03.477 [Main] INFO  LINSTOR/Controller - SYSTEM - Dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" was successful
06:13:04.769 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection finished: 1323ms
06:13:04.770 [Main] INFO  LINSTOR/Controller - SYSTEM - Cryptography provider: Using default cryptography module
06:13:05.101 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
06:13:05.589 [Main] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking using K8sCrd driver
06:13:05.593 [Main] INFO  LINSTOR/Controller - SYSTEM - SpaceTrackingService: Instance added as a system service
06:13:05.594 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
06:13:05.595 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing the k8s crd database connector
06:13:05.596 [Main] INFO  LINSTOR/Controller - SYSTEM - Kubernetes-CRD connection URL is "k8s"
06:13:07.283 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'K8sCrdDatabaseService' of type K8sCrdDatabaseService
06:13:07.293 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading security objects
06:13:07.490 [Main] INFO  LINSTOR/Controller - SYSTEM - Current security level is NO_SECURITY
06:13:07.621 [Main] INFO  LINSTOR/Controller - SYSTEM - Core objects load from database is in progress
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.sun.xml.bind.v2.runtime.reflect.opt.Injector (file:/usr/share/linstor-server/lib/jaxb-impl-2.2.11.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
WARNING: Please consider reporting this to the maintainers of com.sun.xml.bind.v2.runtime.reflect.opt.Injector
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Aug 02, 2023 6:13:09 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [[::]:3370]
Aug 02, 2023 6:13:09 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
[8.513s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[8.514s][warning][os,thread] Failed to start the native thread for java.lang.Thread "grizzly-http-server-239"
06:13:10.138 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
06:13:10.156 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
06:13:10.191 [Main] ERROR LINSTOR/Controller - SYSTEM - unable to create native thread: possibly out of memory or process/resource limits reached [Report number 64C9F3EE-00000-000000]

[8.570s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[8.570s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Logging-Cleaner"
time="2023-08-02T06:13:10Z" level=fatal msg="failed to run" err="exit status 199"

WanzenBug commented 1 year ago

Your system seems to be overloaded, or you have set a restrictive resource limit:

[8.513s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

Icedroid commented 1 year ago

If the system is overloaded, how much memory does the controller need? The log reports:

Operating system: Linux, Version 4.19.90-24.4.v2101.ky10.x86_64
Environment: amd64, 125 processors, 30688 MiB memory reserved for allocations

How can I tell whether the k8s node has a restrictive resource limit set?

WanzenBug commented 1 year ago

The machine/hardware itself seems fine. Did you set any

resources:
  limits:
    ...

on the Pod, or were they set automatically? Could you check the Pod YAML?

What seems more likely: /proc/sys/kernel/threads-max or /proc/sys/kernel/pid_max is set too low on your system.
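The Pod check asked about above can be sketched in a few lines of Python. This is only an illustration, assuming the Pod manifest has been exported with `kubectl get pod <name> -o json`; the helper names `container_limits` and `limits_from_json` are made up for this example and are not part of the operator or of kubectl.

```python
import json


def container_limits(pod: dict) -> dict:
    """Map each container name to its resources.limits (empty dict if unset)."""
    spec = pod.get("spec", {})
    # Look at regular and init containers alike.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return {
        c["name"]: (c.get("resources") or {}).get("limits") or {}
        for c in containers
    }


def limits_from_json(text: str) -> dict:
    """Parse a Pod manifest in JSON form and extract the per-container limits."""
    return container_limits(json.loads(text))
```

An empty dict for every container means no limits appear in the Pod spec as stored by the API server, whether written by hand or added by the cluster.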

Icedroid commented 1 year ago

I am sure the piraeus-cs-controller pod has no resources.limits configured. On the k8s node running the pod:

$ cat /proc/sys/kernel/threads-max
4113123
$ cat /proc/sys/kernel/pid_max
4194304

WanzenBug commented 1 year ago

You can check /proc/stat to get the number of processes running, but I doubt that is the issue here. Perhaps the ulimit is set too low. Can you run ulimit -S -a and ulimit -H -a when the container starts up?
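For reference, the checks mentioned in this thread can be collected in one small script. This is a rough sketch, assuming a Linux system with Python 3 available wherever it is run; the function names are invented for this example. It reads only the process-count limit (`RLIMIT_NPROC`, the value `ulimit -u` shows), not the full `ulimit -a` listing.

```python
import resource


def parse_proc_stat(text: str) -> dict:
    """Extract the process counters from the contents of /proc/stat."""
    # processes: forks since boot; procs_running: currently runnable;
    # procs_blocked: currently blocked waiting for I/O.
    wanted = {"processes", "procs_running", "procs_blocked"}
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in wanted:
            counters[fields[0]] = int(fields[1])
    return counters


def fmt_limit(value: int) -> str:
    """Render a resource limit the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits() -> tuple:
    """Soft and hard RLIMIT_NPROC for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return fmt_limit(soft), fmt_limit(hard)


def report() -> None:
    """Print the counters and limits discussed in this thread."""
    with open("/proc/stat") as f:
        print(parse_proc_stat(f.read()))
    for name in ("threads-max", "pid_max"):
        with open(f"/proc/sys/kernel/{name}") as f:
            print(name, f.read().strip())
    print("RLIMIT_NPROC soft/hard:", *nproc_limits())
```

Calling `report()` inside the container, rather than on the node, matters here because resource limits are per process and the container's values may differ from those of a shell on the host.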