pulp / pulp-operator

Kubernetes Operator for Pulp 3. Under active development.
https://docs.pulpproject.org/pulp_operator/
GNU General Public License v2.0

Redis pod failing to write on mounted volume resulting in pulp-content returning 500 #1354

Open vkukk opened 3 days ago

vkukk commented 3 days ago

Version: image quay.io/pulp/pulp-operator:v1.0.0-beta.5, default Pulp images.

Describe the bug: After enabling the cache, pulp-content fails with HTTP 500.

[2024-09-17 08:22:23 +0000] [52] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_protocol.py", line 456, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_app.py", line 537, in _handle
    resp = await handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/authentication.py", line 48, in authenticate
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/instrumentation.py", line 230, in middleware
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 346, in cached_function
    await self.auth(request, self, bk)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 239, in auth_cached
    await cached.set(guard_key, str(guard), base_key=base_key)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 57, in wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 265, in set
    ret = await self.redis.hset(base_key, key, value)
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/client.py", line 615, in execute_command
    return await conn.retry.call_with_retry(
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/client.py", line 589, in _send_command_parse_response
    return await self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/client.py", line 636, in parse_response
    response = await connection.read_response()
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/connection.py", line 570, in read_response
    raise response from None
redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
::ffff:10.2.3.17 [17/Sep/2024:08:22:23 +0000] "GET /pulp/content/mongo-6/tst/ HTTP/1.1" 500 335 "https://pulp3.hostname.tldpulp/content/mongo-6/" "Mozilla/5.0 (X11; Linux x86_64; rv:130.0) Gecko/20100101 Firefox/130.0"

The cache pod is failing because it has insufficient privileges to write to the volume.

$ kubectl exec pod/pulp-redis-6c86f8467-nwrbz -- /bin/ls -l /|grep data
drwxr-xr-x   3 root root 4096 Sep 16 16:49 data

Redis pod log:
1:M 17 Sep 2024 10:30:00.009 * Background saving started by pid 189536
189536:C 17 Sep 2024 10:30:00.009 # Failed opening the temp RDB file temp-189536.rdb (in server root dir /data) for saving: Permission denied
1:M 17 Sep 2024 10:30:00.110 # Background saving error
1:M 17 Sep 2024 10:30:06.096 * 1 changes in 3600 seconds. Saving...
1:M 17 Sep 2024 10:30:06.097 * Background saving started by pid 189551
189551:C 17 Sep 2024 10:30:06.098 # Failed opening the temp RDB file temp-189551.rdb (in server root dir /data) for saving: Permission denied
1:M 17 Sep 2024 10:30:06.199 # Background saving error
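
The listing and log above fit together: /data is owned by root:root with mode drwxr-xr-x, so a process running as UID 999 / GID 999 only gets the "other" permission bits, which do not include write. A minimal sketch of that check in Python follows. The `can_write` helper is hypothetical and simplified (it ignores root, ACLs, and capabilities), and the second call describes an assumed directory state, not something observed in this cluster.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set) -> bool:
    """Simplified POSIX write check: owner bits, then group bits, then other bits."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# /data as listed above: drwxr-xr-x root root, i.e. mode 0o755 owned by 0:0.
print(can_write(0o755, 0, 0, uid=999, gids={999}))  # False

# Assumed state for comparison: directory group-owned by 999 and group-writable.
print(can_write(0o2775, 0, 999, uid=999, gids={999}))  # True
```

The second case is roughly what a pod-level fsGroup of 999 is expected to arrange on volume types that support ownership management; whether the storage class used here does so is not confirmed in this thread.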

For Redis user 999 (group 999) to be able to save to the mounted storage, the pod must have securityContext.fsGroup set to 999. I tried to enable this by editing the Pulp CR.

To Reproduce: set the following in the Pulp CR:

  cache:
    enabled: true
    redis_storage_class: csi-cinder-high-speed
    securityContext:
      fsGroup: 999

$ kubectl apply -f pulp.yaml
strict decoding error: unknown field "spec.cache.securityContext"

Expected behavior: The proper securityContext is applied and Redis is able to save the RDB file.

Additional context: OVH Managed Kubernetes 1.30.2

vkukk commented 3 days ago

Apparently, fsGroup should already be set according to the Redis controller code here: https://github.com/pulp/pulp-operator/blob/26ac1d96aa977a426e27b05cb2a8251106561b60/controllers/repo_manager/redis.go#L367

When checking actual Pod config:

$ kubectl -n pulp get pod/pulp-redis-6c86f8467-nwrbz -o json| jq -r '.spec.securityContext'
{
  "runAsGroup": 999,
  "runAsUser": 999
}
$ kubectl -n pulp get pod/pulp-redis-6c86f8467-nwrbz -o json| jq -r '.spec.containers[0].securityContext'
{
  "allowPrivilegeEscalation": false,
  "capabilities": {
    "drop": [
      "ALL"
    ]
  },
  "runAsNonRoot": true,
  "seccompProfile": {
    "type": "RuntimeDefault"
  }
}

So the fsGroup defined here https://github.com/pulp/pulp-operator/blob/26ac1d96aa977a426e27b05cb2a8251106561b60/controllers/repo_manager/redis.go#L337 does not make it into the actual Kubernetes Deployment.
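
The comparison above can also be scripted. The following is a minimal sketch, assuming the pod JSON has the standard Kubernetes shape with fsGroup under spec.securityContext. The `pod_fs_group` helper and the `expected` dict are hypothetical illustrations, not part of pulp-operator.

```python
def pod_fs_group(pod: dict):
    """Return the pod-level fsGroup, or None if it is not set."""
    security_context = pod.get("spec", {}).get("securityContext") or {}
    return security_context.get("fsGroup")


# Pod-level securityContext as reported by the first jq command above.
observed = {"spec": {"securityContext": {"runAsGroup": 999, "runAsUser": 999}}}

# Hypothetical pod spec with fsGroup present, for comparison only.
expected = {
    "spec": {"securityContext": {"runAsGroup": 999, "runAsUser": 999, "fsGroup": 999}}
}

print(pod_fs_group(observed))  # None
print(pod_fs_group(expected))  # 999
```

In practice the dict could be loaded with `json.load` from the output of `kubectl get pod ... -o json`, which makes it easy to re-run the check after changing the operator or the CR.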

gerrod3 commented 2 days ago

Need to look into why user 999 is not allowed to write to the volume for the Redis image.