opensearch-project / helm-charts

A community repository for Helm Charts of OpenSearch Project.
https://opensearch.org/docs/latest/opensearch/install/helm/
Apache License 2.0

[BUG][OpenSearch] OpenSearch fails to start when readOnlyRootFilesystem set to 'true' #369

Open gsmith-sas opened 1 year ago

gsmith-sas commented 1 year ago

Describe the bug
When I set the readOnlyRootFilesystem key to 'true', the OpenSearch pods cannot be started. The following messages appear in the pod log:

Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Enabling execution of OPENSEARCH_HOME/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
./opensearch-docker-entrypoint.sh: line 61: /usr/share/opensearch/logs/performance-analyzer.log: Read-only file system
Exception in thread "main" java.nio.file.FileSystemException: /tmp/opensearch-227142181805039740: Read-only file system
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397)
    at java.base/java.nio.file.Files.createDirectory(Files.java:700)
    at java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:134)
    at java.base/java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:171)
    at java.base/java.nio.file.Files.createTempDirectory(Files.java:1017)
    at org.opensearch.tools.launchers.Launchers.createTempDirectory(Launchers.java:79)
    at org.opensearch.tools.launchers.TempDirectory.main(TempDirectory.java:67)

To Reproduce
Steps to reproduce the behavior:

  1. I added the following stanza to my user-values.yaml file:

    securityContext:
      readOnlyRootFilesystem: true
  2. Deploy OpenSearch using Helm (and pointing to the user-values.yaml file)

  3. I can see the OpenSearch pod start up, but it fails fairly early with the error messages shown above.

  4. If I remove that stanza, OpenSearch comes up without problems.

Expected behavior
I expected OpenSearch to come up without problems.

Chart Name
I've seen this error with both OpenSearch version 1.3.5 (Helm chart version 1.14.1) and 2.4.1 (Helm chart version 2.9.0).

Host/Environment (please complete the following information):

Additional context
I notice that the readOnlyRootFilesystem key is included (on line 294) in the Helm chart's values.yaml file, but it is commented out.

prudhvigodithi commented 1 year ago

[Triage] Hey @gsmith-sas, there is an option to disable the Performance Analyzer (see https://github.com/opensearch-project/opensearch-build/tree/main/docker/release#disable-performance-analyzer-agent-cli-and-related-configurations) by passing the env value DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI in the extraEnvs block of the chart. Can you try disabling the Performance Analyzer this way? Thank you.
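
A minimal sketch of what that suggestion might look like in a values file, assuming the chart's extraEnvs list takes standard Kubernetes name/value pairs and that the image entrypoint accepts the string "true" for this variable:

```yaml
# user-values.yaml (sketch; the value "true" is an assumption about what the entrypoint expects)
extraEnvs:
  - name: DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI
    value: "true"
```

This only stops the entrypoint from launching the Performance Analyzer agent; it does not by itself make any path writable.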

gsmith-sas commented 1 year ago

@prudhvigodithi Thank you for the quick response. I was not aware of that environment variable, so thank you for telling me about it.

Unfortunately, while that seems to have disabled the Performance Analyzer (notice the message indicating it has been disabled), the OpenSearch pods are still failing to come up.

Here are the messages I am seeing in my OpenSearch pod logs:

Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
Exception in thread "main" java.nio.file.FileSystemException: /tmp/opensearch-10645933807136353700: Read-only file system
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397)
    at java.base/java.nio.file.Files.createDirectory(Files.java:700)
    at java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:134)
    at java.base/java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:171)
    at java.base/java.nio.file.Files.createTempDirectory(Files.java:1017)
    at org.opensearch.tools.launchers.Launchers.createTempDirectory(Launchers.java:79)
    at org.opensearch.tools.launchers.TempDirectory.main(TempDirectory.java:67)

gsmith-sas commented 1 year ago

@prudhvigodithi Do you have any other ideas what might be going on here? As the log messages show in my last update, I've disabled the Performance Analyzer and yet something is still throwing the exception due to the Read-only file system setting.

gsmith-sas commented 1 year ago

@prudhvigodithi I've just tested with OpenSearch 2.6.0 and the problem persists. The error messages are the same. Can you provide some guidance on how to overcome this and/or when it might be fixed? Thanks!

gsmith-sas commented 1 year ago

I attempted to get around this problem by mounting a couple of emptyDirs, since that had been successful with the similar problem in OpenSearch Dashboards (#368). Unfortunately, while that helped eliminate some of the error messages, OpenSearch still wouldn't start.

Just in case it will help the team find the problem, here's what I added:

extraEnvs:
 - name: OPENSEARCH_TMPDIR
   value: "/tmp/g_opensearch_tmpdir"

extraVolumes:
- name: gtempdir
  emptyDir: { }
- name: glogdir
  emptyDir: { }

extraVolumeMounts: 
- name: gtempdir
  mountPath: "/tmp/g_opensearch_tmpdir"
- name: glogdir
  mountPath: "/usr/share/opensearch/logs"

I started with just setting the OPENSEARCH_TMPDIR environment variable and defining the gtempdir volume since the initial error message seemed to indicate the problem was related to creating a temp directory. After doing that, I started seeing new messages indicating that the files couldn't be written to the logs directory and the JVM couldn't be started. Here are examples of those error messages:

Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
bin/opensearch-cli: line 7: cannot create temp file for here-document: Read-only file system
Exception in thread "main" java.lang.RuntimeException: starting java failed with [1]
output:
[0.000s][error][logging] Error opening log file 'logs/gc.log': Read-only file system
[0.000s][error][logging] Initialization of output 'file=logs/gc.log' using options 'filecount=32,filesize=64m' failed.
error:
Invalid -Xlog option '-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m', see error log for details.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
    at org.opensearch.tools.launchers.JvmErgonomics.flagsFinal(JvmErgonomics.java:125)
    at org.opensearch.tools.launchers.JvmErgonomics.finalJvmOptions(JvmErgonomics.java:87)
    at org.opensearch.tools.launchers.JvmErgonomics.choose(JvmErgonomics.java:70)
    at org.opensearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:150)
    at org.opensearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:108)

After that, I added the 2nd extra volume (glogdir) and corresponding mount. That eliminated the Java error messages and, I believe, allowed the JVM to start up. But, alas, there still was no joy. The final set of messages looked like this:

Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
bin/opensearch-cli: line 7: cannot create temp file for here-document: Read-only file system
/usr/share/opensearch/bin/opensearch: line 62: cannot create temp file for here-document: Read-only file system

At this point, I had to get back to my other responsibilities. I'm hoping some/all of this debugging will help the team develop a solution that allows OpenSearch to be run with readOnlyRootFilesystem set to 'true'.

rdvansloten commented 1 year ago

Any update on this? Seeing the same problem.

prudhvigodithi commented 1 year ago

Hey @gsmith-sas, readOnlyRootFilesystem comes from the Kubernetes securityContext, but the OpenSearch application needs to write to the file system, both logs and the actual data. I'm not sure you can use a read-only file system for applications like OpenSearch that constantly write data to the file system.

However, you can use the path.logs and path.data settings in opensearch.yml to override the paths and mount disks accordingly. AFAIK, readOnlyRootFilesystem does not apply to mounted volumes.
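
A rough sketch of that idea as chart values, assuming the chart's config block accepts an opensearch.yml entry and that the data directory is already backed by the chart's persistent volume. The volume name and the /mnt/opensearch-logs path are hypothetical:

```yaml
# Sketch only: merge these keys into whatever opensearch.yml content you already set,
# since overriding the block may replace the chart's default file.
config:
  opensearch.yml: |
    path.data: /usr/share/opensearch/data   # assumed to be the chart's persistent volume mount
    path.logs: /mnt/opensearch-logs         # hypothetical writable mount

extraVolumes:
  - name: logdir                            # hypothetical volume name
    emptyDir: {}

extraVolumeMounts:
  - name: logdir
    mountPath: /mnt/opensearch-logs
```

Note that the GC log path seen in the earlier errors (logs/gc.log) is set through a JVM option rather than path.logs, so this alone may not cover every write.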

@gsmith-sas @rdvansloten can you please share your use case to run OpenSearch on a Read-only file system?

Thank you @bbarani @dblock

gsmith-sas commented 1 year ago

@prudhvigodithi Thank you for responding. I'm happy to provide information about my use-case, but it isn't really a use-case issue; this is a Kubernetes security issue. Kubernetes security best practices recommend that pods/containers be configured without access to the underlying Kubernetes node's filesystem. This is generally implemented by setting the readOnlyRootFilesystem property within the pod's securityContext to 'true'. For example, this is recommendation 3.9 (on page 8 of the PDF) in the OWASP Container Security Verification Standard. In fact, security scanning tools, including Microsoft Defender for Cloud, will "flag" instances of containers that are NOT configured with this property set. While some organizations may allow security exceptions to be granted that would permit running in spite of the "flagged" violation, some organizations simply won't permit the use of such software. I suspect the current OpenSearch behavior reflects its origins as a non-containerized application.

jbnjohnathan commented 1 year ago

I have the same issues and have done the same steps as @gsmith-sas to try to solve it. I did get a bit further but hit other error messages.

@gsmith-sas When I got the error "cannot create temp file for here-document: Read-only file system", I did the following. I set these two environment variables:

DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI = "true"
DISABLE_INSTALL_DEMO_CONFIG = "true"

Then I created this volume with emptyDir:

        volumeMounts:
        - name: tmpfs
          subPath: tmp
          mountPath: /tmp

But that only gets you to the next error message:

Exception in thread "main" org.opensearch.bootstrap.BootstrapException: java.nio.file.FileSystemException: /usr/share/opensearch/config/opensearch.keystore.tmp: Read-only file system
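
The snippet above shows only the mount. A sketch of the matching volume definition in chart-values form, assuming the chart's extraVolumes and extraVolumeMounts keys are used (the volume name tmpfs is taken from the mount above):

```yaml
extraVolumes:
  - name: tmpfs
    emptyDir: {}            # scratch space that lives outside the read-only root filesystem

extraVolumeMounts:
  - name: tmpfs
    subPath: tmp            # use a "tmp" subdirectory of the emptyDir
    mountPath: /tmp
```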

I haven't managed to get around this issue, as it seems OpenSearch wants to create a new keystore file when starting, but the config folder is of course not writable. I checked the config options to see if I could point this file to another folder that I would mount with emptyDir, but the option only changes the filename; the file is always placed in the config folder.

I created this issue for this: https://github.com/opensearch-project/opensearch-build/issues/3991

sandy2008 commented 1 year ago

Here is our workaround in the values.yaml to make the Helm chart work.

extraInitContainers:
  - name: copy-conf-data
    image: busybox
    command:
    - sh
    - -c
    # sh -c runs only its first argument as the script, so the steps are chained with &&
    - cp -r /usr/share/opensearch/config/* /config/ && chmod -R 777 /config/ && ls /config/
    volumeMounts:
      - name: configdir
        mountPath: /config/
    securityContext:
      readOnlyRootFilesystem: true

gsmith-sas commented 1 year ago

@sandy2008 Can you provide more information about your workaround? I've tried adding that block to my values.yaml file, but OpenSearch wouldn't come up.

I added the following block to my values.yaml file as well (so the configdir would be available to the main container):

extraVolumeMounts:
- name: configdir
  mountPath: /config/

And now the pods try to start up...but fail with the message "cp: can't stat '/usr/share/opensearch/config/*': No such file or directory" in the container log for the new initContainer.

Thanks!
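
One possible explanation for that error is that the busybox image has no /usr/share/opensearch/config directory, so there is nothing for cp to copy. The following untested sketch assumes the init container can instead run from the OpenSearch image itself, copy its config into an emptyDir, and have the main container mount that emptyDir over the config path. The image tag and the configdir volume definition are assumptions:

```yaml
extraVolumes:
  - name: configdir
    emptyDir: {}

extraInitContainers:
  - name: copy-conf-data
    # Assumption: same image as the main container, so the source path exists
    image: opensearchproject/opensearch:2.4.1
    command:
      - sh
      - -c
      # Copy the image's config files into the writable volume
      - cp -r /usr/share/opensearch/config/. /config/
    volumeMounts:
      - name: configdir
        mountPath: /config
    securityContext:
      readOnlyRootFilesystem: true

extraVolumeMounts:
  - name: configdir
    # Mount the populated volume over the config directory of the main container
    mountPath: /usr/share/opensearch/config
```

Whether this interacts cleanly with the chart's own config mounts has not been verified here.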

nehapareshdalal commented 11 months ago

Any luck finding a workaround? I am stuck with the same issue.

rdvansloten commented 11 months ago

Sorry for the late reply, but a read-only filesystem is a security requirement in the org I was working for. This makes sense, because it is best practice that the "OS" filesystem inside a container is not modified. This prevents attackers from installing or downloading tools to pivot from a compromised container into the rest of the network, or from running malicious workloads inside existing containers (miners, listeners, etc.).

See: https://labs.withsecure.com/publications/executing-arbitrary-code-executables-in-read-only-filesystems

nehapareshdalal commented 11 months ago

I know you were asking someone else about the use case, but I am stuck on the same issue. I need readOnlyRootFilesystem because it is a policy set on my AKS Kubernetes cluster. I ran into a couple of issues, such as the logs and tmp paths not being writable, but mounting emptyDir: {} volumes there solved those. The next set of issues has to do with the config folder, /usr/share/opensearch/config. If I mount an emptyDir there as well, it hides all the existing files such as jvm.options, although the pem and keystore files that get generated at runtime on each docker run can then be written. I was wondering whether a solution would be to disable cert generation (I assume that comes from the security standard; we handle that with a service mesh at the cluster level).

rdvansloten commented 11 months ago

There's much more going on in that folder; this was also a show stopper for us. You can't remap that entire folder, sadly. And the certificates are used for node-to-node authentication, so I don't think it cares about what you're doing with mTLS in your mesh.

nehapareshdalal commented 11 months ago

I was thinking of doing something in the Dockerfile so that all generated files and folders from /usr/share/opensearch/config/ are moved to the /tmp directory, and then using a post hook in the k8s YAML to move them back into the /usr/share/opensearch/config/ folder. Is there anything you can suggest around this?

scheffold commented 11 months ago

I struggled with the same issue, but as a workaround I copied all files from the config folder and built them into a separate Docker image:

FROM  busybox
ADD config /config/
RUN  mkdir -p /mnt

Then you can add an init container like:

initContainers:
  - name: copy-conf-data
    image: my-config-image:v1
    imagePullPolicy: Never
    # Copy the baked-in config files into the shared emptyDir volume
    command: [ "sh", "-c", "cp -rvT /config/ /mnt/" ]
    volumeMounts:
      - name: config
        mountPath: /mnt/

You then also need some extra volumes:

extraVolumes:
  - name: temp
    emptyDir: {}
  - name: log
    emptyDir: {}
  - name: config
    emptyDir: {}

and connect them to the OpenSearch container:

# Key name assumed from the chart values used earlier in this thread
extraVolumeMounts:
  - name: temp
    mountPath: /tmp
  - name: log
    mountPath: /usr/share/opensearch/logs
  - name: config
    mountPath: /usr/share/opensearch/config/

Not ideal, but it works with the persistence.enabled flag.

nehapareshdalal commented 11 months ago

Thank you. Can someone please point me to the latest GitHub repo that has the OpenSearch Dockerfile? I could see two repos: https://github.com/opensearch-project/OpenSearch/tree/main/distribution/docker/src/docker (I'm not sure how to build it, as it has a lot of args/vars to be filled in), and https://github.com/opensearch-project/docker-images (this one fails with java.lang.IllegalArgumentException: Could not load codec 'Lucene95'. Did you forget to add lucene-backward-codecs.jar?). Please help.

sandy2008 commented 11 months ago

Hmmm, we actually have another workaround, which is to volume-mount the config files directly. That also worked for us.

bbarani commented 11 months ago

Here's the link to the Dockerfile used for generating Docker images for the OpenSearch distribution.

prudhvigodithi commented 11 months ago

One more way to address this is using emptyDir with medium: Memory

    emptyDir:
     medium: Memory

When an emptyDir is memory-backed, the volume is a tmpfs filesystem, which means its contents are stored in memory and not on the node's backing storage. See https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
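
As a sketch, a complete memory-backed volume in chart-values form might look like the following. The volume name, the /tmp mount target, and the size limit are illustrative assumptions, not values from the chart:

```yaml
extraVolumes:
  - name: tmpdir              # hypothetical volume name
    emptyDir:
      medium: Memory          # back the volume with tmpfs
      sizeLimit: 64Mi         # illustrative cap on the volume size

extraVolumeMounts:
  - name: tmpdir
    mountPath: /tmp
```

Files written to a memory-backed emptyDir count against the container's memory limit, so a sizeLimit is usually worth setting.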

sshambar commented 10 months ago

I managed to get the container to start with --read-only by adding the following volume mounts (probably only the keystore files are required, but the others make config updates easier):

-v ./config/opensearch.yml:/usr/share/opensearch/config/opensearch.yml:Z \
-v ./config/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:Z \
-v ./config/opensearch.keystore.tmp:/usr/share/opensearch/config/opensearch.keystore.tmp:Z \
-v ./config/opensearch-security:/usr/share/opensearch/config/opensearch-security:Z \
-v ./data:/usr/share/opensearch/data:Z \
-v ./logs:/usr/share/opensearch/logs:Z

This way, updated containers will retain the original config files except for ones you're likely to edit. (edited to add data/logs which might not be clear as required for read-only)
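
For anyone using Docker Compose rather than docker run, the same mounts could be sketched roughly as follows. The service name and image tag are assumptions, and this has not been verified against a specific OpenSearch version:

```yaml
services:
  opensearch:                                   # hypothetical service name
    image: opensearchproject/opensearch:2.6.0   # assumed tag; use the version you run
    read_only: true                             # Compose counterpart of `docker run --read-only`
    volumes:
      - ./config/opensearch.yml:/usr/share/opensearch/config/opensearch.yml:Z
      - ./config/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:Z
      - ./config/opensearch.keystore.tmp:/usr/share/opensearch/config/opensearch.keystore.tmp:Z
      - ./config/opensearch-security:/usr/share/opensearch/config/opensearch-security:Z
      - ./data:/usr/share/opensearch/data:Z
      - ./logs:/usr/share/opensearch/logs:Z
```

The :Z suffix relabels the bind mounts for SELinux, as in the flags above.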