mbentley / docker-omada-controller

Docker image to run TP-Link Omada Controller

[Feature]: Lighter weight Java runtime like OpenJ9 #373

Open paulschmeida opened 10 months ago

paulschmeida commented 10 months ago

What problem are you looking to solve?

Since a lot of people run this software for their home networks with just a couple of APs, having a Docker container with a full-blown version of Java seems like overkill.

Describe the solution that you have in mind

I propose switching to a lighter-weight version of the JRE, one that's built specifically for containers. Bear in mind I'm not a Java developer, but I've been told that OpenJ9 from IBM Semeru, for example, is more lightweight and can run the same Java apps at roughly 1/3 the memory footprint and with lower CPU usage.

Additional Context

No response

mbentley commented 10 months ago

I can't say I know anything about OpenJ9/IBM Semeru so I can't really be sure about comparisons to OpenJDK JRE 17, which we are using today. My main concerns would be things like:

  1. Supportability - is TP-Link going to give me any grief if I have problems with the controller and I am not running an Oracle JRE or OpenJDK JRE?
  2. Extra maintenance - is it worth the extra effort of adding scripting to install a different JRE? Installing the OpenJDK JRE is simple since it's a pre-packaged .deb in the Ubuntu repos, whereas I would need to fetch the latest version of the OpenJ9 package, install it, and probably do some environment configuration to get it working - and that assumes they don't decide to change anything about the packaging randomly in the future. It's not a huge deal, but it is something I wouldn't have to worry about otherwise.
  3. Benefit - is there enough of a benefit to running an Omada Controller with a different JRE to warrant the extra effort? I'd need to see some sort of proof of concept to show that it is worth it. Even just a hacky branch that gets it installed and can show a startup time benefit and a longer-running resource benefit would be good. It's also helpful to keep in mind that MongoDB is in the standard image, so when doing resource comparisons, it would probably be a good idea to keep MongoDB separate to more easily compare apples to apples between the two JREs.

I can try to put together a proof of concept, but honestly it'd be something on the back burner for me. I know what I am getting with the OpenJDK JRE, and I know the support lifecycle because it's packaged with Ubuntu.

ktims commented 8 months ago

I had a go at this on https://github.com/ktims/docker-omada-controller/tree/openj9-testing, and the initial results are promising: I see a more than 45% reduction in container memory utilization. It seems to work fine, though I haven't tested it extensively or with a 'real' workload.

OpenJDK:

$ podman run --name openjdk --network=host --rm -it docker.io/mbentley/omada-controller
# wait for application to be ready for login, go through setup wizard, log in
$ podman stats
ID            NAME        CPU %       MEM USAGE / LIMIT  MEM %       NET IO      BLOCK IO    PIDS        CPU TIME      AVG CPU %
393872374d1f  openjdk     0.77%      1.722GB / 50.43GB  3.42%       0B / 0B     0B / 0B     231         1m35.085021s  131.58%

OpenJ9:

$ podman run --name openj9 --network=host --rm -it 1eddddeef383ebc8cac7c546e9c8653d96da03ace7a6709530fccd85d738f99a
# ...
$ podman stats
ID            NAME        CPU %       MEM USAGE / LIMIT  MEM %       NET IO      BLOCK IO    PIDS        CPU TIME     AVG CPU %
17ae8110009e  openj9      0.85%       864.9MB / 50.43GB  1.72%       0B / 0B     0B / 0B     235         1m1.456071s  59.71%

What's more, OpenJ9 is aware of container memory restrictions, so if I'm really rude to the container and give it only 512m of RAM, it can be even more aggressive:

$ podman run -m 512m --name openj9 --network=host --rm -it 1eddddeef383ebc8cac7c546e9c8653d96da03ace7a6709530fccd85d738f99a
# ...
# podman stats
ID            NAME        CPU %       MEM USAGE / LIMIT  MEM %       NET IO      BLOCK IO    PIDS        CPU TIME      AVG CPU %
e8ed45b24ee7  openj9      2.65%       337.6MB / 536.9MB  62.87%      0B / 0B     0B / 0B     258         1m42.314089s  89.80%
mbentley commented 8 months ago

Interesting. Do you have example install code so I can take a look as well? A significant memory reduction could be really interesting, especially for the lower-powered systems a lot of people tend to run this on.

Also, I'd like to do some tests with MongoDB and the controller running as separate processes to get metrics from only the controller, removing that variable.

ktims commented 8 months ago

Not sure what you mean about install code; everything you need to try it should be in my openj9-testing branch. I made the following modifications:

That's all that was required to get it up and running.

OpenJ9 definitely feels subjectively slower when it's cold, but once warmed up it actually seems to outperform OpenJDK based on pageload timing, which is pretty surprising to me.

The test is based on the docker-compose.yaml in my branch (but I built 5.13 with NO_MONGODB), running with the 512MB memory constraint for the OpenJ9 controller (which effectively gives it more memory than before, since MongoDB now runs separately) and a separate MongoDB container. After a few minutes of poking around the interface of both instances:

ID            NAME            CPU %       MEM USAGE / LIMIT  MEM %       NET IO             BLOCK IO    PIDS        CPU TIME      AVG CPU %
2d9a29df8835  mongodb2        0.49%       195.7MB / 50.43GB  0.39%       751.6kB / 1.86MB   0B / 0B     38          4.329065s     1.58%
b4b69f290cf3  omada_original  0.59%       1.585GB / 50.43GB  3.14%       1.911MB / 760.3kB  0B / 0B     185         1m50.125153s  40.33%
ba19b583b94e  omada_openj9    0.62%       348.5MB / 536.9MB  64.91%      1.873MB / 753.7kB  0B / 0B     219         1m55.460964s  42.44%
eb521ccda5ce  mongodb         0.50%       193.2MB / 50.43GB  0.38%       759kB / 1.898MB    0B / 0B     38          4.284706s     1.56%
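
For reference, the separated setup boils down to two containers - this is a sketch, not my actual compose file (the local image name for the NO_MONGODB build is a placeholder, and the wiring that points the controller at the external MongoDB lives in the compose file in my branch):

$ podman run -d --name mongodb docker.io/library/mongo:4.4
$ podman run -d -m 512m --network=host --name omada_openj9 \
    localhost/omada-controller:5.13-openj9-nomongo
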
bartong13 commented 7 months ago

This looks very promising for us 'home' users. I have an RPi 4 2GB which I use to host a few other containers, and it sits at about 1GB usage, but the current full-OpenJDK image just pushes it too far and hits OOM issues. Limiting the container memory to 800MB stops the OOMs, but the controller software becomes unusable. So I'm running on a desktop host instead, which is not ideal because I would prefer to use a low-power device so the controller can be left running 24/7.

If @ktims' testing is anything to go by, I would be able to run the OpenJ9 image and still have a bit of memory to spare.

Apologies that I cannot assist with development, but I am happy to assist with testing in a 'production' environment if it gets to that stage (1 switch and 3 EAPs with approx. 30 clients max).

mbentley commented 7 months ago

Sorry for the lack of progress so far. I've added this to my backlog of things to look at further.

jinkazph commented 7 months ago

Sorry for the lack of progress so far. I've added this to my backlog of things to look at further.

Nice, good to hear.

mbentley commented 7 months ago

I started some work on a custom base image for OpenJ9 because I prefer to have consistency & control over the ability to patch the base image. It's nothing crazy (Dockerfile for this) - it just takes the OpenJ9 images, grabs the JRE plus the shared classes cache, and puts them in an image (on Docker Hub). OpenJ9 only has amd64 and arm64 builds available, but I don't really see that as a problem since the armv7l images are already doing something different as it is today. My builds really aren't doing anything different than the ibm-semeru-runtimes images, but this way I can quickly patch the underlying ubuntu:20.04 image with minimal effort and without having to really build anything.
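
Conceptually, the Dockerfile amounts to a multi-stage copy along these lines (a sketch, not the exact file - treat the Semeru tag and paths as illustrative):

FROM ibm-semeru-runtimes:open-17-jre-focal AS semeru

FROM ubuntu:20.04
# pull the OpenJ9 JRE (including its default shared classes cache) out of the Semeru image
COPY --from=semeru /opt/java/openjdk /opt/java/openjdk
ENV JAVA_HOME=/opt/java/openjdk \
    PATH="/opt/java/openjdk/bin:${PATH}"
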

I hope to get a chance to test this out later today with an actual build so I can do some comparisons myself. If I do, I'll update here and probably put up a branch of what I have.

mbentley commented 7 months ago

OK, so I have a branch https://github.com/mbentley/docker-omada-controller/tree/openj9 that seems to work (comparison from master).

I just built a couple of test images for amd64 and arm64:

mbentley/omada-controller:5.13-openj9test-amd64
mbentley/omada-controller:5.13-openj9test-arm64

bartong13 commented 7 months ago

Thanks @mbentley - if we run a container from this image, can it retain the same volumes as a container that was running the OpenJDK image? Or would it be better to set this up as a "fresh" container and use the controller migration within the Omada controller software to move devices across instead?

mbentley commented 7 months ago

It should be fine, but I would make sure to take backups beforehand (you should be taking regular backups anyway - auto-backups are built into the controller software unless you haven't enabled them). Keep in mind, this isn't merged into master yet as I need to do some more testing, so there may be some changes, but I intend for them not to be breaking changes.

bartong13 commented 7 months ago

Yeah for sure, always backup haha.

Do you envisage having two images once this work is complete? A "full" image plus a "lite" image, so to speak? Or are you actually thinking you'll permanently switch to OpenJ9 going forward?

mbentley commented 7 months ago

I did some comparisons between the normal OpenJDK, OpenJ9, and OpenJ9 with -Xtune:virtualized in terms of resource consumption:

# without -Xtune:virtualized on OpenJ9
$ docker stats --no-stream omada-controller omada-controller-oj9
CONTAINER ID   NAME                   CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
8f7348b05774   omada-controller       1.04%     1.519GiB / 125.5GiB   1.21%     9.42kB / 4.85kB   860kB / 33.6MB   179
13f45ffff47d   omada-controller-oj9   1.35%     783.1MiB / 125.5GiB   0.61%     0B / 0B           0B / 24.5MB      160

# with -Xtune:virtualized on OpenJ9
$ docker stats --no-stream omada-controller omada-controller-oj9
CONTAINER ID   NAME                   CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
8f7348b05774   omada-controller       1.53%     1.512GiB / 125.5GiB   1.21%     9.42kB / 4.85kB   860kB / 31.2MB    181
74da808fd8d3   omada-controller-oj9   1.40%     750.1MiB / 125.5GiB   0.58%     9.45kB / 4.92kB   20.1MB / 26.2MB   170

For JVM startup times on a clean install:

OpenJDK - 1:14
OpenJ9 - 1:15
OpenJ9 w/-Xtune:virtualized - 1:16

So startup times were pretty close to identical on my server.
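
For anyone wanting to try the flag themselves, one generic way to pass -Xtune:virtualized to the JVM in the container is the standard JAVA_TOOL_OPTIONS environment variable, which both HotSpot and OpenJ9 honor (assuming nothing in the image's startup scripts overrides the JVM arguments):

$ docker run -d --name omada-controller-oj9 \
    -e JAVA_TOOL_OPTIONS="-Xtune:virtualized" \
    mbentley/omada-controller:5.13-openj9test-amd64
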

mbentley commented 7 months ago

Do you envisage having two images once this is work is complete? A "full" image plus a "lite" image, so to speak? Or are you actually thinking you'll permanently switch to using OpenJ9 going forward?

Ideally, I would like to avoid adding additional images to build, as I am currently building 26 images just for the Omada Controller versions that are "supported". Adding in those that are "archived" (which I don't build daily), it's 65. This includes amd64, arm64, armv7l, and the versions with Chrome for report generation.

My thinking is that OpenJ9 is basically an extended OpenJDK, so I would hope that there are no regressions or complications from switching.

bartong13 commented 7 months ago

I did some comparisons between the normal OpenJDK, OpenJ9, and OpenJ9 with -Xtune:virtualized in terms of resource consumption:

Did you have any 'load' on the controllers during these tests? I.e., were there actually any devices adopted into the controllers, any traffic flowing on the devices under their control, etc.?

mbentley commented 7 months ago

Did you have any 'load' on the controllers during these tests? I.e., were there actually any devices adopted into the controllers, any traffic flowing on the devices under their control, etc.?

No, this was just the overall startup of a brand-new controller with no devices under management. I wasn't ready to try anything with my own instance yet.

mstoodle commented 7 months ago

For JVM startup times on a clean install:

OpenJDK - 1:14
OpenJ9 - 1:15
OpenJ9 w/-Xtune:virtualized - 1:16

You may want to try populating your own shared classes cache in your image build step rather than copying the one from the original containers (assuming I understood what you wrote earlier about creating your own containers). If you do an application startup inside the build step, it should create a custom shared classes cache for your app inside your container that can then start your container more quickly. Even better if there is a way to run some load in that build step, because then you'll be able to get JIT compiled code cached right into your container image (if you keep that -Xtune:virtualized option). Hopefully then you'll see some improvement in the startup times with OpenJ9, and if you don't there are some diagnostics we could look into to try to understand why.
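
As a rough sketch of that build-time warmup in a Dockerfile, assuming a hypothetical warmup.sh that starts the app, waits for it to become ready, and shuts it down cleanly (the real start scripts and paths in this image will differ):

# point OpenJ9 at a cache location baked into the image; -Xscmx caps the cache size
ENV JAVA_TOOL_OPTIONS="-Xshareclasses:name=omada,cacheDir=/opt/scc -Xscmx128m -Xtune:virtualized"
# run the app once during the build so the populated cache ends up in this image layer
RUN /opt/warmup.sh
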

Great to see people getting value from Eclipse OpenJ9 !! Best of luck!

mbentley commented 6 months ago

Thanks @mstoodle for the tip! That's correct - I am just pulling the shared classes cache from the Docker Hub image, so it sounds like I have some playing around to do to see what might be possible to optimize the cache. I have limited information about the app itself.

If you don't mind me asking, one thing I would be curious about is whether one approach could be to use a shared classes cache that persists in a read/write directory outside of the container. The first startup of the app wouldn't be optimized, but I would imagine that subsequent startups and running of the app would be? Is that an approach worth investigating, or would it be an anti-pattern? Just curious, as getting this app started as part of a build step might introduce some complexity on my end that could be a bit funky and resource-intensive, considering this app needs and auto-starts MongoDB as well. I'll also see if I can locate the documentation on how shared classes caching works, as I will admit I haven't even looked yet.

mstoodle commented 6 months ago

Hi @mbentley. You can configure the cache to reside on a Docker volume if you like, but it gets troublesome to manage (at least in general deployments; I'm not sure whether that complexity would apply in your case). But there are advantages to having the cache inside the container: you can make the prepopulated cache layer read-only, which speeds up access to it, and if people build image layers on top of yours, they can add their own layer to the shared cache too (it's designed to work alongside how Docker layers work).
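
A rough sketch of the volume approach, with an illustrative cache name and path:

$ docker volume create omada-scc
$ docker run -d --name omada-controller \
    -v omada-scc:/opt/scc \
    -e JAVA_TOOL_OPTIONS="-Xshareclasses:name=omada,cacheDir=/opt/scc" \
    mbentley/omada-controller:5.13-openj9test

The first startup populates the cache on the volume; subsequent startups (and any new container mounting the same volume) reuse it.
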

Quick references to the shared cache docs: https://eclipse.dev/openj9/docs/shrc/ - or, if you prefer blogs, you can find some at https://blog.openj9.org/tag/sharedclasses/. If you have questions, you can @-mention me here and I'll try to respond.

IAmKonni commented 5 months ago

Do you have a recent version of this test image? mbentley/omada-controller:5.13-openj9test-amd64 gives me 5.13.23 and not the latest 5.13.30 stable release. I would like to give it a try. Maybe I can help you with it - some years ago I was a Java developer. :)

mbentley commented 5 months ago

Sorry, I haven't had a chance to follow up on anything further, but I was able to build some new images using the latest version just now:

mbentley/omada-controller:5.13-openj9test - multi-arch (auto-selects amd64 or arm64)
mbentley/omada-controller:5.13-openj9test-amd64 - amd64-specific tag
mbentley/omada-controller:5.13-openj9test-arm64 - arm64-specific tag

I've done no further testing on them yet but I assume they start up :)

IAmKonni commented 5 months ago

Switched to this test image today and no problems so far.

jinkazph commented 5 months ago

Stable for me as well. Already using it for a month.

mbentley commented 5 months ago

At this point, I would like to work on better understanding a shared classes cache pattern that makes the most sense for how the app runs in a container. I see a lot of experimentation in the future to make that happen.

eblieb commented 4 months ago

I am pretty new to docker - to run the openj9 container, would I just replace the image name from the default in the run command with the new one and keep all the port allocations and everything else the same?

mbentley commented 4 months ago

I am pretty new to docker - to run the openj9 container, would I just replace the image name from the default in the run command with the new one and keep all the port allocations and everything else the same?

Correct. And to be clear, I am manually building this image right now so it's not getting automatic updates at the moment. I don't expect there to be issues but just FYI. Make sure you're taking regular backups of your persistent data.
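
For example, whatever docker run command you use today stays the same except for the image tag at the end (ports, volumes, and env vars elided here - keep the ones you already use):

$ docker run -d \
    --name omada-controller \
    <your existing ports, volumes, and env vars> \
    mbentley/omada-controller:5.13-openj9test
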

eblieb commented 4 months ago

I am actually running it on a Raspberry Pi 4, with the Omada SDN running on bare Raspbian, so I'm going to save that microSD card as the backup. I just wanted to give Docker another go (I was having performance issues with your Docker image, which is why I went to a bare install). Looks like the memory issue was what was causing it.
