Open gsmith-sas opened 1 year ago
[Triage] Hey @gsmith-sas, there is an option to disable the PA (Performance Analyzer), see https://github.com/opensearch-project/opensearch-build/tree/main/docker/release#disable-performance-analyzer-agent-cli-and-related-configurations, by passing an env value DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI
in the extraEnvs block of the chart. Can you try this way of disabling Performance Analyzer?
Thank you
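For reference, a minimal sketch of how that might look in the chart's values file. The value "true" is taken from a later comment in this thread; the linked README is the authority on accepted values.

```yaml
# values.yaml (fragment): pass the env value through the chart's extraEnvs block.
extraEnvs:
  - name: DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI
    value: "true"   # assumption: "true" is the accepted value, see the linked README
```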
@prudhvigodithi Thank you for the quick response. I was not aware of that environment variable, so thank you for telling me about it.
Unfortunately, while that seems to have disabled the Performance Analyzer (notice the message indicating it has been disabled), the OpenSearch pods are still failing to come up.
Here are the messages I am seeing in my OpenSearch pod logs:
Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
Exception in thread "main" java.nio.file.FileSystemException: /tmp/opensearch-10645933807136353700: Read-only file system
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397)
at java.base/java.nio.file.Files.createDirectory(Files.java:700)
at java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:134)
at java.base/java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:171)
at java.base/java.nio.file.Files.createTempDirectory(Files.java:1017)
at org.opensearch.tools.launchers.Launchers.createTempDirectory(Launchers.java:79)
at org.opensearch.tools.launchers.TempDirectory.main(TempDirectory.java:67)
@prudhvigodithi Do you have any other ideas what might be going on here? As the log messages show in my last update, I've disabled the Performance Analyzer and yet something is still throwing the exception due to the Read-only file system setting.
@prudhvigodithi I've just tested with OpenSearch 2.6.0 and the problem persists. The error messages are the same. Can you provide some guidance on how to overcome this and/or when it might be fixed? Thanks!
I attempted to get around this problem by mounting a couple of emptyDirs, since that had been successful with the similar problem in OpenSearch Dashboards (#368). Unfortunately, while that helped eliminate some of the error messages, OpenSearch still wouldn't start.
Just in case it will help the team find the problem, here's what I added:
extraEnvs:
  - name: OPENSEARCH_TMPDIR
    value: "/tmp/g_opensearch_tmpdir"
extraVolumes:
  - name: gtempdir
    emptyDir: { }
  - name: glogdir
    emptyDir: { }
extraVolumeMounts:
  - name: gtempdir
    mountPath: "/tmp/g_opensearch_tmpdir"
  - name: glogdir
    mountPath: "/usr/share/opensearch/logs"
I started with just setting the OPENSEARCH_TMPDIR environment variable and defining the gtempdir volume since the initial error message seemed to indicate the problem was related to creating a temp directory. After doing that, I started seeing new messages indicating that the files couldn't be written to the logs directory and the JVM couldn't be started. Here are examples of those error messages:
Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
bin/opensearch-cli: line 7: cannot create temp file for here-document: Read-only file system
Exception in thread "main" java.lang.RuntimeException: starting java failed with [1]
output:
[0.000s][error][logging] Error opening log file 'logs/gc.log': Read-only file system
[0.000s][error][logging] Initialization of output 'file=logs/gc.log' using options 'filecount=32,filesize=64m' failed.
error:
Invalid -Xlog option '-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m', see error log for details.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
at org.opensearch.tools.launchers.JvmErgonomics.flagsFinal(JvmErgonomics.java:125)
at org.opensearch.tools.launchers.JvmErgonomics.finalJvmOptions(JvmErgonomics.java:87)
at org.opensearch.tools.launchers.JvmErgonomics.choose(JvmErgonomics.java:70)
at org.opensearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:150)
at org.opensearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:108)
After that, I added the 2nd extra volume (glogdir) and corresponding mount. That eliminated the Java error messages and, I believe, allowed the JVM to start up. But, alas, there still was no joy. The final set of messages looked like this:
Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
bin/opensearch-cli: line 7: cannot create temp file for here-document: Read-only file system
/usr/share/opensearch/bin/opensearch: line 62: cannot create temp file for here-document: Read-only file system
At this point, I had to get back to my other responsibilities. I'm hoping some/all of this debugging will help the team develop a solution that allows OpenSearch to be run with readOnlyRootFilesystem set to 'true'.
Any update on this? Seeing the same problem.
Hey @gsmith-sas, readOnlyRootFilesystem is coming from the k8s side securityContext, but the OpenSearch application needs to write to the file system, both logs and the actual data. I'm not sure you can use a read-only file system for applications like OpenSearch that constantly write data to the file system.
However, you can use the path.logs and path.data values in the config.yml to override the paths and mount the disks accordingly? The readOnlyRootFilesystem setting AFAIK is not applicable to mounted volumes.
@gsmith-sas @rdvansloten can you please share your use case to run OpenSearch on a Read-only file system?
Thank you @bbarani @dblock
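A rough sketch of the path override suggested above. Both paths are hypothetical examples, and writable volumes would still need to be mounted at them (for instance through the chart's extraVolumes and extraVolumeMounts keys used earlier in this thread). It has not been verified here that this alone is enough for a read-only root filesystem.

```yaml
# OpenSearch config (fragment): point data and logs at mounted, writable paths.
# Both paths below are hypothetical, not defaults.
path.data: /mnt/opensearch/data
path.logs: /mnt/opensearch/logs
```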
@prudhvigodithi Thank you for responding. I'm happy to provide information about my use-case, but it isn't really a use-case issue; this is a Kubernetes security issue. Kubernetes security best practices recommend that pods/containers be configured without access to the underlying Kubernetes node's filesystem. This is generally implemented by setting the readOnlyRootFilesystem property within the pod's securityContext to 'true'. For example, this is recommendation 3.9 (on page 8 of the PDF) in the OWASP Container Security Verification Standard. In fact, security scanning tools, including Microsoft Defender for Cloud, will "flag" instances of containers that are NOT configured with this property set. While some organizations may allow security exceptions to be granted that would permit running in spite of the "flagged" violation, some organizations simply won't permit the use of such software. I suspect the current OpenSearch behavior reflects its origins as a non-containerized application.
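For readers unfamiliar with the setting, this is roughly the stanza in question, as it would appear in a user values file for the chart. It is a sketch of the configuration that triggers the failure, not a fix.

```yaml
# user-values.yaml (fragment): the securityContext key exists in the chart's
# values file; readOnlyRootFilesystem is commented out there by default.
securityContext:
  readOnlyRootFilesystem: true
```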
I have the same issues and have done the same steps as @gsmith-sas to try to solve it. I did get a bit further but hit other error messages.
@gsmith-sas When I got the error "cannot create temp file for here-document: Read-only file system", I did the following:
Set these 2 ENVs:
DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI = "true"
DISABLE_INSTALL_DEMO_CONFIG = "true"
Then created this volume with emptyDir
volumeMounts:
  - name: tmpfs
    subPath: tmp
    mountPath: /tmp
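The mount above needs a matching volume named tmpfs. A minimal sketch of that definition, assuming the chart's extraVolumes key that appears earlier in this thread:

```yaml
# values.yaml (fragment): the emptyDir volume that the tmpfs mount refers to.
extraVolumes:
  - name: tmpfs
    emptyDir: {}
```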
But that only gets you to the next error message
Exception in thread "main" org.opensearch.bootstrap.BootstrapException: java.nio.file.FileSystemException: /usr/share/opensearch/config/opensearch.keystore.tmp: Read-only file system
I haven't managed to get around this issue, as it seems OpenSearch wants to create a new keystore file when starting, but the config folder is of course not writable. I checked the config options to see if I could point this file to another folder that I would mount with emptyDir, but the config only allows changing the filename; it will always point to the config folder anyway.
I created this issue for this: https://github.com/opensearch-project/opensearch-build/issues/3991
Here is our workaround in the values.yaml to make the helm chart work:
extraInitContainers:
  - name: copy-conf-data
    image: busybox
    command:
      - sh
      - -c
      - cp -r /usr/share/opensearch/config/* /config/
      - chmod -R 777 /config/
      - ls /config/
    volumeMounts:
      - name: configdir
        mountPath: /config/
securityContext:
  readOnlyRootFilesystem: true
@sandy2008 Can you provide more information about your work-around? I've tried adding that block to my values.yaml file but OpenSearch wouldn't come up.
I added the following block to my values.yaml file as well (so the configdir would be available to the main container):
extraVolumeMounts:
  - name: configdir
    mountPath: /config/
And now the pods try to start up...but fail with the message "cp: can't stat '/usr/share/opensearch/config/': No such file or directory" in the container log for the new initContainer.
Thanks!
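One likely cause of that message is that the stock busybox image does not contain /usr/share/opensearch/config, so there is nothing to copy. An untested sketch of a variant that runs the init container from the OpenSearch image itself and mounts the copied files over the config directory. The image tag and the configdir volume definition are assumptions, not something confirmed in this thread.

```yaml
extraInitContainers:
  - name: copy-conf-data
    # Assumption: same image and tag as the main OpenSearch container,
    # so that /usr/share/opensearch/config exists in the init container.
    image: opensearchproject/opensearch:2.6.0
    command:
      - sh
      - -c
      # One script string, so both commands run under the same "sh -c".
      - cp -r /usr/share/opensearch/config/. /config/ && ls /config/
    volumeMounts:
      - name: configdir
        mountPath: /config/
extraVolumes:
  - name: configdir
    emptyDir: {}
extraVolumeMounts:
  # Mount the populated emptyDir where OpenSearch expects its config.
  - name: configdir
    mountPath: /usr/share/opensearch/config
```

The copy is written as a single string because arguments after the script passed to sh -c become positional parameters and are not executed as further commands.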
any luck finding the workaround? I am stuck with the same issue.
Sorry for the late reply, but ReadOnly FS is a security requirement in the org I was working for. This makes sense, because it's best-practice that the "OS" filesystem inside a container is not modified. This is to prevent attackers from installing or downloading tools to piggyback from a compromised container into the rest of the network, or running malicious workloads inside existing containers (miners, listeners, etc)
I know the use-case question was addressed to someone else, but I am stuck on the same issue. I need readOnlyRootFilesystem because it is a policy set on my AKS k8s cluster. I ran into a couple of issues where paths such as logs and tmp could not be written to, but mounting volumes with emptyDir: {} solved those. The next set of issues has to do with the config folder, /usr/share/opensearch/config. If I mount an emptyDir there as well, it hides all the existing files such as jvm.options, although the pem and keystore files that get generated at runtime with each docker run can then be written. I was wondering if there is a solution for this, such as disabling cert generation (I assume that comes from a security standard; we handle that with a service mesh at the cluster level).
There's much more going on in that folder, this was also a show stopper for us. You can't remap that entire folder, sadly. And the certificates are used for node to node authentication, I don't think it cares about what you're doing with mTLS in your mesh.
I was thinking of doing something in the Dockerfile where all generated files and folders from /usr/share/opensearch/config/ are moved into the /tmp directory, and then, in a posthook in the k8s yaml, copying them back into the /usr/share/opensearch/config/ folder. Anything you can suggest around this?
I struggled with the same, but as a workaround I copied all files from the config folder and built them into a separate docker image:
FROM busybox
ADD config /config/
RUN mkdir -p /mnt
Then you can add an init container like:
initContainers:
  - name: copy-conf-data
    image: my-config-image:v1
    imagePullPolicy: Never
    command: [ "sh", "-c", "cp -rvT /config/ /mnt/" ]
    volumeMounts:
      - name: config
        mountPath: /mnt/
You then also need some extra volumes:
extraVolumes:
  - name: temp
    emptyDir: {}
  - name: log
    emptyDir: {}
  - name: config
    emptyDir: {}
and then connect them to the opensearch image:
  - name: temp
    mountPath: /tmp
  - name: log
    mountPath: /usr/share/opensearch/logs
  - name: config
    mountPath: /usr/share/opensearch/config/
Not ideal - but it works with persistence.enabled flag
Thank you. Can someone please point me to the latest GitHub repo that has the OpenSearch Dockerfile? I could see 2 repos: https://github.com/opensearch-project/OpenSearch/tree/main/distribution/docker/src/docker (not sure how to build it, as it has a lot of args/vars to be filled in) and https://github.com/opensearch-project/docker-images (this one shows the error java.lang.IllegalArgumentException: Could not load codec 'Lucene95'. Did you forget to add lucene-backward-codecs.jar?). Please help.
Hmmm, we actually got another workaround, which is to do volume mounts directly for the config files; that was working for us as well.
Here's the link to Docker file used for generating docker images for OpenSearch distribution.
One more way to address this is using emptyDir with medium: Memory:
emptyDir:
  medium: Memory
When emptyDir is memory-backed, the volume is backed by a tmpfs filesystem, which means its contents are stored in memory and not on the backing storage of the node.
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
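A sketch of how a memory-backed volume might be wired into the chart values, assuming the extraVolumes and extraVolumeMounts keys used elsewhere in this thread. The volume name and the size limit are hypothetical; files written here count against the container's memory, so a limit is worth considering.

```yaml
extraVolumes:
  - name: temp                # hypothetical volume name
    emptyDir:
      medium: Memory          # tmpfs-backed, contents live in RAM
      sizeLimit: 64Mi         # hypothetical cap on the volume size
extraVolumeMounts:
  - name: temp
    mountPath: /tmp
```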
I managed to get the container to start with --read-only by adding the following volume mounts (probably just the keystore files are required, but the others make config updates easier):
-v ./config/opensearch.yml:/usr/share/opensearch/config/opensearch.yml:Z \
-v ./config/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:Z \
-v ./config/opensearch.keystore.tmp:/usr/share/opensearch/config/opensearch.keystore.tmp:Z \
-v ./config/opensearch-security:/usr/share/opensearch/config/opensearch-security:Z \
-v ./data:/usr/share/opensearch/data:Z \
-v ./logs:/usr/share/opensearch/logs:Z
This way, updated containers will retain the original config files except for ones you're likely to edit. (edited to add data/logs which might not be clear as required for read-only)
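For anyone using Compose rather than plain docker run, the same mounts could be expressed roughly as below. This is an untested sketch: the service name and image tag are assumptions, and the volume list simply mirrors the flags above.

```yaml
services:
  opensearch:                                   # hypothetical service name
    image: opensearchproject/opensearch:latest  # assumption: use your own tag
    read_only: true                             # Compose equivalent of --read-only
    volumes:
      - ./config/opensearch.yml:/usr/share/opensearch/config/opensearch.yml:Z
      - ./config/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:Z
      - ./config/opensearch.keystore.tmp:/usr/share/opensearch/config/opensearch.keystore.tmp:Z
      - ./config/opensearch-security:/usr/share/opensearch/config/opensearch-security:Z
      - ./data:/usr/share/opensearch/data:Z
      - ./logs:/usr/share/opensearch/logs:Z
```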
Describe the bug When I set the readOnlyRootFilesystem key to 'true', the OpenSearch pods cannot be started. The following messages appear in the pod log:
To Reproduce Steps to reproduce the behavior:
I added the following stanza to my user-values.yaml file:
Deploy OpenSearch using Helm (and pointing to the user-values.yaml file)
I can see the OpenSearch pod starting up, but it fails fairly early with the error messages shown above.
If I remove that stanza, OpenSearch Dashboards comes up without problems.
Expected behavior I expected OpenSearch to come up without problems.
Chart Name I've seen this error with using both OpenSearch version 1.3.5 (Helm chart version 1.14.1) and 2.4.1 (Helm chart version 2.9.0).
Host/Environment (please complete the following information):
Additional context I notice that readOnlyRootFilesystem key is included (on line 294) in the Helm charts values.yml file but commented out.