[RFC] Replace Java Security Manager (JSM)

reta commented 2 years ago

Is your feature request related to a problem? Please describe. It was announced a while ago that SecurityManager is going to be phased out of the JDK. The first step, the deprecation of the SecurityManager (JEP-411), landed in JDK 17 and issues the following warning on OpenSearch builds or server startup:

WARNING: System::setSecurityManager will be removed in a future release

JDK 18 pushes it even further and now fails on startup (please see https://bugs.openjdk.java.net/browse/JDK-8270380): running OpenSearch builds or the server on JDK 18 EA fails with:

Caused by: java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
    at java.base/java.lang.System.setSecurityManager(System.java:416)

Enabling it now requires an explicit JVM command line option (please see [1]):

-Djava.security.manager=allow
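As a rough illustration of how the java.security.manager property is interpreted, here is a minimal sketch based on the property values described in the SecurityManager Javadoc. The class SecurityManagerMode and its methods are hypothetical helpers, not part of OpenSearch or the JDK, and the sketch only models JDK releases where the Security Manager is still functional (it does not model releases in which it can no longer be enabled at all).

```java
/**
 * Hypothetical helper (not an OpenSearch or JDK API) that classifies the
 * value of the "java.security.manager" system property.
 */
final class SecurityManagerMode {

    enum Mode { ALLOW, DISALLOW, INSTALL_DEFAULT, INSTALL_CUSTOM }

    /**
     * @param propertyValue  value of -Djava.security.manager, or null if unset
     * @param featureVersion JDK feature version, e.g. 17 or 18
     */
    static Mode classify(String propertyValue, int featureVersion) {
        if (propertyValue == null) {
            // Unset: before JDK 18 this behaves like "allow", from JDK 18 on like "disallow".
            return featureVersion >= 18 ? Mode.DISALLOW : Mode.ALLOW;
        }
        switch (propertyValue) {
            case "allow":
                return Mode.ALLOW;           // System.setSecurityManager may be called at run time
            case "disallow":
                return Mode.DISALLOW;        // System.setSecurityManager throws UnsupportedOperationException
            case "":
            case "default":
                return Mode.INSTALL_DEFAULT; // default SecurityManager installed at startup
            default:
                return Mode.INSTALL_CUSTOM;  // value is treated as a SecurityManager class name
        }
    }

    /** Classifies the JVM this code is running on. */
    static Mode current() {
        String spec = System.getProperty("java.specification.version");
        int feature = spec.startsWith("1.")
                ? Integer.parseInt(spec.substring(2))
                : Integer.parseInt(spec);
        return classify(System.getProperty("java.security.manager"), feature);
    }

    public static void main(String[] args) {
        System.out.println("Security Manager mode: " + current());
    }
}
```

A startup check along these lines could be used to log a clear message before any call to System.setSecurityManager, instead of failing with the stack trace shown above.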

[x] Support JDK 18 EA builds (https://github.com/opensearch-project/OpenSearch/pull/1710)

Describe the solution you'd like There is no alternative or replacement for the SecurityManager (to understand why, Project Loom is to "blame"), please see [2]. One option is to just drop it; that sounds risky, but combined with the Plugin Sandbox (please see [3], [4]) it may be a viable option. Other options include (but are not limited to): bytecode instrumentation, a Java agent, a custom classloader.

Describe alternatives you've considered We could keep it as long as we can, but once removed from the JDK, it will be a problem.

Additional context Please see the links.

[1] https://inside.java/2021/12/06/quality-heads-up/
[2] https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/
[3] https://github.com/opensearch-project/OpenSearch/issues/1572
[4] https://github.com/opensearch-project/OpenSearch/issues/1422
[5] A possible JEP to replace SecurityManager after JEP 411

reta commented 1 year ago

Sure, for core it may be a niche (not advocating to jump on virtual threads), but we have plugins and extensions (upcoming), and folks may start using them.

pfirmstone commented 9 months ago

I think the only workable solution is to maintain a downstream fork of OpenJDK that retains and improves existing APIs. The question is, are there enough interested parties?

Some thoughts drawn from experience of the practical application of least privilege principles:

The book titled, "Inside Java 2 Platform Security, Second Edition, Architecture, API Design and Implementation" by Li Gong, Gary Ellison and Mary Dageforde, ISBN: 0201787911, provides insight into the concepts and thoughts of the original designers.

JEP 411 deprecates for removal Java platform support for Authorization and delegates responsibility to the underlying OS, without proposing any replacement APIs to assist developers in doing so. The wording of JEP 411 confounds and confuses security principles. Its criticism of the Java Platform's Security Architecture and API design can be explained entirely by the platform's complete lack of tooling around policy files and by poorly maintained, under-developed implementations, which are easily replaced using service providers.

Fundamental principles of security:

Authentication
Authorization
Privacy and Confidentiality
Integrity
Validation

Each complements the others; remove any one and system security is deficient.

Existing tooling for policy file creation, deployment or maintenance, together with replacement Policy and SecurityManager implementations, addresses the majority of problems people experience with Java Platform Authorization.
Java Security Architecture designers provided an extensible security framework. The majority of criticisms relate to the default implementations, which have not been maintained and use 1990s-era coding practices.
Fine-grained access controls and authentication are fundamental to the principle of least privilege, discussed in the above referenced book (pages 97 and 138).
Large numbers of ProtectionDomains are not a detriment to security.
Security policy access checks can be performant and high scaling.
Java's trusted code base is too large; it should be minimalistic. Trusted classes need to be loaded during bootstrapping, prior to enforcing security policy.
It is not possible to implement replacement functionality for Java Authorization in application code; low-level hooks are required, and OpenJDK is planning to remove all support.

After exploring all other options it's my conclusion that the only workable way possible to "Replace Java Security Manager", is to maintain a downstream fork of OpenJDK that retains support for Authorization.

Instead of reimplementing, it is far less work to fix and improve it.

How can Java Platform Security be improved, for the benefit of those who are not security fluent but are required to action it?

Java's trusted codebase must be minimised in order for permissions to be granted to authenticated principals based on trust. Over the years since Java 2, the size of Java's trusted platform has ballooned. The trusted platform always has AllPermission and cannot be restricted, meaning there are many platform features enabled by default which aren't needed; whether or not the user is authenticated is irrelevant if all code on the stack is privileged. The consequences of many historical Java vulnerabilities could have been significantly reduced simply by restricting the available attack surface.
All code potentially parsing external data (e.g. Java deserialization, XML parsing, HTTP, scripts, SQL, text) needs separate ProtectionDomains that have no permissions by default; currently, these reside in Java's privileged platform code.
Repeating the above, because I can't stress it enough, any code that validates external data must not have any permissions granted by default. External data must be associated with an authenticated data source (Principal), then authorization decisions can be made on whether it is safe to parse that data, based on trust, even when the code parsing is perfectly secure, we mustn't grant it all permission, simply because one day someone clever will find a way to fool that code.
Our policy tooling grants permission to the combination of code and principals, so a user with permission is only authorized when using the code we exercise before deployment, if an attacker substitutes the code, the user account cannot be used to attain any permission. Can you see the problem with granting permission only to users while ignoring code?
If we grant permission to code alone, then the authenticated user is eliminated from the authorization decision, this is the problem with granting permission to code alone.
Best practice is to use the policy generation tool with authenticated users, to ensure permission is only granted to authenticated users using the code we want them to use.
Users generate data, we use our code to validate their input, we authenticate and use privacy, secrecy and integrity to ensure that data hasn't been tampered with.
Developers don't think like attackers and don't see other potential uses, so user data and developers' code must be limited by the principle of least privilege.
Gadget attacks are chains of vulnerabilities used to gain access to data or systems.
A fictitious example (with a real gadget, but not the full chain, just one link): let's say your application uses a library which uses Java Serialization, but you don't utilise the code that calls it. You set your Java serialization filters from the command line, but the filters are never initialized because you don't use Java serialization; yes, an unintended consequence of lazy initialization. Let's say that you're using a security policy, and a library has been granted permission to change properties because of some sloppy code in your own application that didn't use a doPrivileged call. Perhaps the library has code that parses network data and has the capability to set properties, but you're not using this feature, or maybe you're not aware of it; you're only using its basic functionality in your application. If that library has a vulnerability that an attacker can use to inject and update the Java serialization filter properties in a gadget attack, then the attacker can enable Java serialization: because the filters were never initialized, they're not actually in force. The next step is that the attacker will try to load some remote classes and attempt to steal certificate identities or other sensitive data. This might sound far-fetched, but attacks like this happen, and they happen because code is granted more privileges than it needs. The world hasn't become more friendly in recent years; in fact it's quite the opposite. Java Serialization needs a Permission check, and the code implementing serialization needs to be unprivileged, so that when you run the policy tool that creates your policy files, permission to deserialize is never granted, because it was never used by your runtime when you were running it in a safe environment to generate your policy files, which you then inspected (to identify undesirable access, e.g. unintended library calls) and edited (to widen scope where appropriate) prior to deployment.
Permissions need the ability to match patterns or widen scope. For example, an admin wants a policy to restrict IP address ranges using SocketPermission; however, it's not possible to specify subnet masks, only domains can be used, and DNS cache poisoning attacks would allow an attacker to redirect a domain for an MITM. One might restrict IP address ranges to prevent an attacker from sending information from your local JVM out to some remote location on the internet as part of some side channel attack. IP addresses can be dynamic of course, so you could limit it to local subnets, for example, and still require TLS authentication. The more effort that's required of an attacker, the more likely script kiddies and would-be attackers will look elsewhere for easier pickings.
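
To make the subnet idea concrete, here is a minimal sketch of a subnet-mask check that works on IPv4 literals only and never touches DNS. SubnetMatcher is a hypothetical helper, not an existing JDK or JGDMS class; a real permission would also need IPv6 and port handling.

```java
/**
 * Hypothetical helper: matches IPv4 literals against a CIDR range
 * without any DNS lookup. Host names are rejected on purpose.
 */
final class SubnetMatcher {

    /** Parses a dotted-quad IPv4 literal such as "10.1.2.3" into a 32-bit value. */
    static int parseIpv4(String literal) {
        String[] parts = literal.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + literal);
        }
        int value = 0;
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                throw new IllegalArgumentException("bad octet in: " + literal);
            }
            for (char c : part.toCharArray()) {
                if (c < '0' || c > '9') {
                    throw new IllegalArgumentException("bad octet in: " + literal);
                }
            }
            int octet = Integer.parseInt(part);
            if (octet > 255) {
                throw new IllegalArgumentException("octet out of range in: " + literal);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns true if the address literal lies inside the CIDR range, e.g. "10.1.0.0/16". */
    static boolean inSubnet(String address, String cidr) {
        int slash = cidr.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing prefix length: " + cidr);
        }
        int network = parseIpv4(cidr.substring(0, slash));
        int prefix = Integer.parseInt(cidr.substring(slash + 1));
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("bad prefix length: " + cidr);
        }
        // A /0 prefix matches everything; Java shifts wrap at 32, so handle it explicitly.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (parseIpv4(address) & mask) == (network & mask);
    }

    public static void main(String[] args) {
        System.out.println(inSubnet("10.1.2.3", "10.1.0.0/16"));
        System.out.println(inSubnet("10.2.0.1", "10.1.0.0/16"));
    }
}
```

Because only literals are accepted, a poisoned DNS cache cannot change the outcome of the check.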

The principle of least privilege simply doesn't grant any more permission than that which is required to perform specific tasks, policies are updated when new tasks are added.
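
The point made earlier about granting permission to the combination of code and principal, with nothing granted by default, can be sketched in a few lines. This is an illustrative model only: CodeAndPrincipalPolicy is a hypothetical class, and plain strings stand in for CodeSource, Principal and Permission objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical deny-by-default policy: a permission is implied only when it
 * was granted to the exact pair (code source, principal).
 */
final class CodeAndPrincipalPolicy {

    // Key is the pair [codeSource, principal]; List gives value-based equals/hashCode.
    private final Map<List<String>, Set<String>> grants = new HashMap<>();

    void grant(String codeSource, String principal, String permission) {
        grants.computeIfAbsent(Arrays.asList(codeSource, principal), k -> new HashSet<>())
              .add(permission);
    }

    boolean implies(String codeSource, String principal, String permission) {
        Set<String> granted = grants.get(Arrays.asList(codeSource, principal));
        // Anything not explicitly granted to this code AND this principal is denied.
        return granted != null && granted.contains(permission);
    }

    public static void main(String[] args) {
        CodeAndPrincipalPolicy policy = new CodeAndPrincipalPolicy();
        policy.grant("file:/app/lib/reports.jar", "alice", "read:/data/reports");
        System.out.println(policy.implies("file:/app/lib/reports.jar", "alice", "read:/data/reports"));
        System.out.println(policy.implies("file:/tmp/substituted.jar", "alice", "read:/data/reports"));
    }
}
```

In this model, substituting the code or using a different account both fail the check, and adding a new task means adding an explicit grant.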

Something I'd like to mention too, SecureClassLoader (a parent class of URLClassLoader) uses URL as a key in a map, which calls DNS, might this be used in a gadget attack with DNS cache poisoning to load foreign code? In JGDMS there's an RFC3986 URI class loader, that uses normalized URI identity and doesn't call DNS. We have a URL scheme with a SHA256 or SHA512 message digest that checks that the bytes haven't been manipulated prior to class loading, we also have permission checks to prevent unauthorized class loading.

https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/net/RFC3986URLClassLoader.java

https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-platform/src/main/java/net/jini/loader/DownloadPermission.java

https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-url-integrity/src/main/java/net/jini/url/httpmd/Handler.java
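
The digest check mentioned above can be illustrated with standard JDK classes. This is a minimal sketch of the general idea (verify the bytes against an expected SHA-256 digest before they are handed to a class loader), not the JGDMS httpmd handler itself; DigestCheck and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: checks downloaded bytes against an expected SHA-256 digest. */
final class DigestCheck {

    /** Decodes a hex string such as "ba78..." into bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Returns true only if the content hashes to the expected SHA-256 value. */
    static boolean matchesSha256(byte[] content, String expectedHex) {
        try {
            byte[] actual = MessageDigest.getInstance("SHA-256").digest(content);
            // MessageDigest.isEqual compares the two digests in a time-constant manner.
            return MessageDigest.isEqual(actual, fromHex(expectedHex));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "abc".getBytes(StandardCharsets.UTF_8);
        String expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        System.out.println(matchesSha256(bytes, expected));
    }
}
```

A class loader built on this idea would refuse to define any class whose jar bytes fail the check.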

Default CodeSource implies and equals calls also call DNS, we don't use these in our permission checks, we use RFC3986 normalized URI instead.
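
To show the difference in practice: java.net.URL documents that equals and hashCode may resolve host names, while java.net.URI compares purely on the text of the identifier. The sketch below uses java.net.URI as a stand-in for a normalized-URI key; note that java.net.URI follows RFC 2396 rather than RFC 3986, so it is only an approximation of the JGDMS approach, and NormalizedUriKey is a hypothetical name.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: builds DNS-free map keys from code locations. */
final class NormalizedUriKey {

    /** Parses the location and removes "." and ".." path segments; no network access. */
    static URI keyFor(String location) {
        return URI.create(location).normalize();
    }

    public static void main(String[] args) {
        Map<URI, String> domains = new HashMap<>();
        domains.put(keyFor("http://Example.com/lib/../lib/a.jar"), "domain-A");

        // URI.equals ignores case in scheme and host, so this finds the same entry
        // without ever resolving "example.com".
        System.out.println(domains.get(keyFor("HTTP://example.com/lib/a.jar")));
    }
}
```

Using such a key in place of URL means that two entries are considered the same code location only when their identifiers match, regardless of what DNS currently returns.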

There are many ways that OpenJDK security could be improved.

https://openjdk.org/jeps/451

We already prevent agents from loading using our policy tool. If agents are needed, why not have authentication and authorization checks in place? The above JEP seems like a brittle solution to me.

https://www.youtube.com/watch?v=uVob-4aXbxY

dblock commented 9 months ago

I really don't see any existing mechanism to move forward away from JSM other than to stop relying on JSM or a similar mechanism altogether. If one can install custom code in the cluster, assume that code must be fully trusted. I would rip out JSM without a replacement in a major-version breaking change and add the ability to run code remotely or in separate JVMs as sandboxed or untrusted, using the mechanisms introduced in https://github.com/opensearch-project/opensearch-sdk-java.

bbarani commented 7 months ago

@dblock @reta @pfirmstone @uschindler can you please confirm if this change can be included in 2.x without breaking existing API? Basically can this change be added in a backward compatible manner in 2.x line?

We are evaluating whether this change warrants a 3.0 release or can be included in the 2.x line, so we need your input.

dblock commented 7 months ago

I don't think we have a path forward for this change, but no version of it can be included in 2.x without breaking backwards compatibility.

kumargu commented 1 month ago

The warnings in JDK 21 are really annoying, and it seems the end of the Security Manager is coming sooner rather than later.

One of the ideas that I wanted to explore is running untrusted code from plugins within a WebAssembly (WASM/WASI) sandbox. Surely, the hardest [2] and most unknown part is "to compile Java to WASM".

Firstly, I wanted to hear how this sounds.
There has now been good progress on WASM support for native images, e.g. images generated via GraalVM (with GC specifications). Here's the ongoing work: https://github.com/oracle/graal/issues/3391 https://github.com/oracle/graal/issues/3391#issuecomment-2095322317. If we think this could be one of the possible routes "for replacement of JSM", I can start exploring more.
A shift towards WebAssembly would also allow plugins written in memory-safe languages like Rust to be adopted easily within the OpenSearch ecosystem.

reta commented 1 month ago

Thanks @kumargu , we are closely watching [1] as well. WASM aside, GraalVM's sandboxing [2] looks really appealing and it could be worth exploring. On a general note, compiling plugins / extensions to WASM and using them from OpenSearch would largely address the concerns with SecurityManager removal. Thank you.

[1] https://github.com/oracle/graal/issues/3391 [2] https://github.com/opensearch-project/OpenSearch/issues/1687#issuecomment-1545696833

kumargu commented 1 month ago

btw @reta, do you know if everything in GraalVM is free and open-licensed? I have read conflicting answers on the internet.

reta commented 1 month ago

> btw @reta, do you know if everything in GraalVM is free and open-licensed? I have read conflicting answers on the internet.

Yeah, it is super confusing, but the gist of [1] is:

Oracle GraalVM is free to use in production and free to redistribute, at no cost, under the GraalVM Free Terms and Conditions.

[1] https://medium.com/graalvm/a-new-graalvm-release-and-new-free-license-4aab483692f5

uschindler commented 1 month ago

You have to be careful: Lucene is not Graal bullet proof, we don't test it. In addition, all our vector SIMD code is disabled once Lucene detects Graal. The reason is that Graal has no vector incubator support.

reta commented 1 month ago

> You have to be careful: Lucene is not Graal bullet proof, we don't test it. In addition, all our vector SIMD code is disabled once Lucene detects Graal. The reason is that Graal has no vector incubator support.

Absolutely, AFAIK we have never tested OS with GraalVM so this area is full of unknowns

opensearch-project / OpenSearch

[RFC] Replace Java Security Manager (JSM) #1687