mbentley / docker-omada-controller

Docker image to run TP-Link Omada Controller

[Feature]: Update Ubuntu base image and MongoDB version #443

Open lucianor opened 1 month ago

lucianor commented 1 month ago

What problem are you looking to solve?

The TP-Link FAQ for installing Omada SDN Controller on Linux suggests Ubuntu 22.04 as the base system, with MongoDB 7 suggested for the database.

The image is using Ubuntu 20.04 with the built-in MongoDB version 3.6.8. I found this while investigating a fix for

It would be interesting to provide a more current tech stack with either jammy or noble as the default, and with a more modern and currently supported MongoDB version. Reasons for this change:

1) focal support ends in less than a year (April 2025), after which it is no longer maintained
2) jammy gets access to a newer OpenJDK stack and to ongoing Java security fixes
3) MongoDB 3.6.8 is entirely deprecated
4) TP-Link may at any time raise the requirements for running Omada SDN Controller, increasing the risk of having to make a bigger technology jump later, which would make it harder to update the controller
5) Having the latest thing

Describe the solution that you have in mind

Additional Context

It would be important to start with the MongoDB upgrade, given that you have to walk version by version to get to a more modern release. I'm facing challenges running 4.0 on an Ubuntu noble install due to outdated libraries.

mbentley commented 1 month ago

So there are certainly a number of reasons why I have held off on upgrading to at least 22.04 - the first and biggest elephant in the room being that MongoDB 3.x is no longer supported by MongoDB. There is at least still some level of security updates from Canonical since it's packaged in the official repos for 20.04 (focal), but otherwise it's basically at a dead end support-wise.

As for making the jump to noble/24.04 - that's not supported according to the FAQ so 22.04 would be where the base OS upgrade would need to end up.

The good news about the JDK is that focal, jammy, and noble all have OpenJDK 8, 11, 17, and 21 available. Comparing the versions from focal to noble, they're both running 17.0.11+9-1, so they're the same. I'm already using OpenJDK 17 for amd64 and arm64, which TP-Link said will work fine for the controller - I have not yet seen confirmation about OpenJDK 21 though. I am guessing one of the main reasons they have yet to move to Spring Boot 3.x is that they would probably need to move to OpenJDK 21, which they may have issues with (not sure, haven't tested that), but it certainly eliminates older operating systems that don't provide a version of OpenJDK 21, so that could be why.

As for MongoDB 7, I don't see any reference to MongoDB 7 being the suggested version, at least not on the US version of the FAQ (https://www.tp-link.com/us/support/faq/3272/). It mentions: "Omada SDN Controller supports MongoDB v3 and v4. Here we will show how to install v4.4." The bad news is that even from MongoDB directly, MongoDB 4.4, which is the oldest version that TP-Link supports, went EOL in February 2024. I will have to see if I can get someone on the TP-Link forums to provide some guidance around future MongoDB versions and whether they have any info on supporting newer releases.

lucianor commented 1 month ago

100% with you on all of the above. jammy looks like the go-to base image for now. Similarly, OpenJDK 17 seems fine as a choice, and the upgrade gap is not big. But MongoDB is definitely the elephant in the room. I'm fine-tuning / re-using your code from https://github.com/mbentley/docker-omada-controller/tree/mongodb-upgrade-to-4.4 to take it all the way to 7.0, putting the migrate script on top of the current stack, with the base image upgrade and a check for the MongoDB version as a stretch goal. Why 7.0? It looks like the FAQ for my region and Canada is different from the US and global versions - https://www.tp-link.com/ca/support/faq/3272/ says that it is supported. If we are making users run an upgrade script, it may be wise to target the latest supported version to reduce the likelihood of having to do it again in the future.

mbentley commented 1 month ago

Ah I see - so it's starting with 5.14.20 and above that they mention that MongoDB 7 is supported. That must just not have made its way to the other FAQs yet. 100% with you there on only wanting to do one migration of the database if I can help it.

The thing that really sucks is that I do not want to continually ship all of the MongoDB binaries forever until people upgrade, so I would be leaning toward a dedicated migration image that the user would have to run manually. The MongoDB binaries and dependencies are about 1 GB, which is just a deal breaker for me.

I am working on updating the migrate.sh script to add the additional migrations. It doesn't look too bad at face value so we'll see.
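For reference, MongoDB only supports upgrading the data files one release at a time, which is why the script has to walk from 3.6 through 4.0, 4.2, 4.4, 5.0, and 6.0 to reach 7.0. A rough sketch of what a single hop looks like (the binary paths and log location here are assumptions for illustration, not the actual script):

# hypothetical single upgrade hop (3.6 to 4.0); binary paths are assumptions
DB_PATH="/opt/tplink/EAPController/data/db"

# start the next mongod version against the existing data files
/opt/mongodb-4.0/bin/mongod --dbpath "${DB_PATH}" --fork --logpath /tmp/upgrade_log.txt

# bump the feature compatibility version so the next version will accept the data
mongo --quiet --eval 'db.adminCommand({ setFeatureCompatibilityVersion: "4.0" })'

# cleanly shut down before moving on to the next hop
/opt/mongodb-4.0/bin/mongod --dbpath "${DB_PATH}" --shutdown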

mbentley commented 1 month ago

OK, so arm64 was a pain in the ass because ideal MongoDB versions weren't available for it, but this seems to work:

https://github.com/mbentley/docker-omada-controller/tree/mongodb-upgrade-to-7.0/mongodb_upgrade

At least the migrations run for both amd64 and arm64. I haven't yet tested to make sure that I can install MongoDB 7.x and have the controller start up but this is all I can probably work on for today.

The basic run commands are:

docker run -it --rm -v omada-data:/opt/tplink/EAPController/data mbentley/omada-controller:migrate-amd64

docker run -it --rm -v omada-data:/opt/tplink/EAPController/data mbentley/omada-controller:migrate-arm64

I didn't publish any of the images yet so they have to be built.
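In case anyone wants to try it before images land on Docker Hub, the build should look roughly like this (a sketch - the branch and directory are from the link above, but the exact Dockerfile layout and any build args are assumptions, so check the branch for the real invocation):

# hypothetical local build of the migration image from the branch checkout
git clone -b mongodb-upgrade-to-7.0 https://github.com/mbentley/docker-omada-controller.git
cd docker-omada-controller/mongodb_upgrade
docker build -t mbentley/omada-controller:migrate-amd64 .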

mbentley commented 1 month ago

Another thing would be that moving to a newer version of MongoDB would completely drop support for armv7l (which I would NOT be mad about at all).

mbentley commented 1 month ago

Here is a WIP branch: https://github.com/mbentley/docker-omada-controller/tree/update-base-and-mongo

amd64 and arm64 build commands:

docker build \
  --pull \
  --build-arg INSTALL_VER="5.14" \
  --build-arg ARCH="amd64" \
  --build-arg BASE=mbentley/ubuntu:22.04 \
  --progress plain \
  -f Dockerfile.v5.x \
  -t mbentley/omada-controller:5.14-22.04-mongodb7-amd64 \
  .

docker build \
  --pull \
  --build-arg INSTALL_VER="5.14" \
  --build-arg ARCH="arm64" \
  --build-arg BASE=mbentley/ubuntu:22.04 \
  --progress plain \
  -f Dockerfile.v5.x \
  -t mbentley/omada-controller:5.14-22.04-mongodb7-arm64 \
  .

And two images published to Docker Hub:

At least I will be able to use these to test if the migration scripts worked.

lucianor commented 1 month ago

https://github.com/mbentley/docker-omada-controller/tree/mongodb-migrate-to-7.0/mongodb_upgrade

Tested this with my current Omada 5.13 / MongoDB 3.6.8 on both amd64 and arm64. Both upgrades went fine:

INFO: Migration to 7.0.12 complete!

I did try mbentley/omada-controller:5.14-22.04-mongodb7-arm64 but got the dreaded #418 bug. I tried beta-5.14.20.9 as well - same issue. I also tried rebuilding the image with 5.13.30.8, but that does not work with MongoDB 7.0.

mbentley commented 1 month ago

🤦 bummer. I will see if I can test it myself soon.

edit: oh, who am I kidding - I just tested it and it looks like I can start the controller for both the arm64 and amd64 images after running a migration.

One thing I am not sure about that I will need to validate is if there is any good reason to actually run a repair before migrating. I'm guessing no and that could speed up the migration by a good bit.

Another thing to add from a feature standpoint would be somehow making the entrypoint intelligent enough to detect when you are running the 22.04-based image with MongoDB 7 but have not run a migration. I know that MongoDB will just fail to start, but I would like to spit out an error message that tells people to follow instructions somewhere to perform the migration.
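A sketch of what that entrypoint guard could look like, assuming the migration script drops a hypothetical marker file into the persistent data directory (the marker name, paths, and messages are made up for illustration):

# hypothetical pre-start guard in the entrypoint; the marker file name is made up
DATA_DIR="/opt/tplink/EAPController/data"

if [ -n "$(ls -A "${DATA_DIR}/db" 2>/dev/null)" ] && [ ! -f "${DATA_DIR}/.mongodb7_migrated" ]; then
  echo "ERROR: existing MongoDB data was found but it has not been migrated to 7.0"
  echo "ERROR: run the migration image against this data volume first (see the mongodb_upgrade README)"
  exit 1
fi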

mbentley commented 1 month ago

OK, I just made a pretty big update to the migration script in the branch to have cleaner output - all of the MongoDB logs are being put into a file. Repair should only be needed if the db was shut down uncleanly. Maybe that should be a first step before doing the first upgrade, just to make sure, but this output makes it much easier to understand whether the upgrade was successful:

INFO: executing upgrade process from MongoDB 3.6 to 7.0...

INFO: starting upgrade to 4.0...
INFO: starting mongod 4.0.28....done
INFO: setting feature compatibility version to 4.0...done
INFO: verifying feature compatibility version has been updated to 4.0...done
INFO: stopping mongod...done
INFO: upgrade to 4.0 complete!

INFO: upgrading from libcurl3 to libcurl4
Selecting previously unselected package libcurl4:arm64.
dpkg: considering removing libcurl3:arm64 in favour of libcurl4:arm64 ...
dpkg: yes, will remove libcurl3:arm64 in favour of libcurl4:arm64
(Reading database ... 4570 files and directories currently installed.)
Preparing to unpack .../libcurl4_7.58.0-2ubuntu3.24_arm64.deb ...
Unpacking libcurl4:arm64 (7.58.0-2ubuntu3.24) ...
Setting up libcurl4:arm64 (7.58.0-2ubuntu3.24) ...
Processing triggers for libc-bin (2.27-3ubuntu1.6) ...

INFO: starting upgrade to 4.2...
INFO: starting mongod 4.2.23....done
INFO: setting feature compatibility version to 4.2...done
INFO: verifying feature compatibility version has been updated to 4.2...done
INFO: stopping mongod...done
INFO: upgrade to 4.2 complete!

INFO: starting upgrade to 4.4...
INFO: starting mongod 4.4.18.....done
INFO: setting feature compatibility version to 4.4...done
INFO: verifying feature compatibility version has been updated to 4.4...done
INFO: stopping mongod...done
INFO: upgrade to 4.4 complete!

INFO: starting upgrade to 5.0...
INFO: starting mongod 5.0.27.....done
INFO: setting feature compatibility version to 5.0...done
INFO: verifying feature compatibility version has been updated to 5.0...done
INFO: stopping mongod...done
INFO: upgrade to 5.0 complete!

INFO: starting upgrade to 6.0...
INFO: starting mongod 6.0.16.....done
INFO: setting feature compatibility version to 6.0...done
INFO: verifying feature compatibility version has been updated to 6.0...done
INFO: stopping mongod...done
INFO: upgrade to 6.0 complete!

INFO: starting upgrade to 7.0...
INFO: starting mongod 7.0.12.....done
INFO: setting feature compatibility version to 7.0...done
INFO: verifying feature compatibility version has been updated to 7.0...done
INFO: stopping mongod...done
INFO: upgrade to 7.0 complete!

INFO: Fixing ownership of database files...done

INFO: upgrade process from MongoDB 3.6 to 7.0 was successful!
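For anyone following along, the "setting/verifying feature compatibility version" lines correspond to standard MongoDB admin commands; a sketch of what the verification amounts to for the 4.0 hop (the exact commands in the script may differ):

# read back the featureCompatibilityVersion and confirm it matches the target;
# note that setting it to "7.0" additionally requires confirm: true on MongoDB 7.x
FCV="$(mongo --quiet --eval 'db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 }).featureCompatibilityVersion.version')"
if [ "${FCV}" != "4.0" ]; then
  echo "ERROR: expected featureCompatibilityVersion 4.0 but found ${FCV}"
  exit 1
fi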

mbentley commented 1 month ago

So just thinking about the migration/upgrade script further, there are some additional things I would like to do:

And then in the actual controller image once it's updated:

I am sure I am forgetting all sorts of scenarios that need to be addressed, but it is good to think through them sooner rather than later.

mikepell007 commented 1 month ago

Just as an FYI in case someone else runs into this: I tried out the new image with MongoDB 7 with no luck - the mongod process would crash. It turns out my CPU does not support the AVX instruction set, which seems to be a requirement for MongoDB 7.

https://www.mongodb.com/community/forums/t/fresh-install-result-in-crash-at-startup/264498

mbentley commented 1 month ago

Oh yikes. Good catch on that. I will have to see if I can add a pre-upgrade check to make sure that AVX is present, although I imagine the rollback process I'm working on should work if it fails to start.

mbentley commented 1 month ago

OK, I just pushed some changes to the upgrade branch:

lucianor commented 1 month ago

Should the AVX requirement be treated as a requirement for running the controller, i.e. described in the README vs. a scripted check? I'm not sure if Docker containers have access to /proc, but if so you could grep -o 'avx[^ ]*' /proc/cpuinfo to tell if AVX is available... or stick with MongoDB 6...

mbentley commented 1 month ago

According to https://www.mongodb.com/docs/manual/administration/production-notes/#x86_64, the AVX requirement was added in MongoDB 5.x, so anyone who can't run AVX basically can't run a supported MongoDB. AVX became available in 2011, and I am not sure how long it took before it became broadly adopted, but it's almost 13 years at this point since both Intel and AMD first had it available in a released CPU. Regardless, MongoDB 6.x goes EOL in July 2025, so that's not really a long runway.

I would say that AVX will then become a requirement for running the controller, so it should be both in the README and scripted to prevent a ton of issues being filed because the controller doesn't start.

/proc/cpuinfo is available inside the container:

# docker exec -it omada-controller grep ^flags /proc/cpuinfo | grep -m 1 ' avx '
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities
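A scripted pre-flight check could then be as small as this sketch (the message wording is just an example):

# abort the upgrade early if the CPU lacks AVX, which MongoDB 5.0+ requires on x86_64
if [ "$(uname -m)" = "x86_64" ] && ! grep -qw avx /proc/cpuinfo; then
  echo "ERROR: this CPU does not support AVX, so MongoDB 5.0 and newer will not start"
  echo "ERROR: aborting the upgrade before any changes are made"
  exit 1
fi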

mikepell007 commented 1 month ago

> According to https://www.mongodb.com/docs/manual/administration/production-notes/#x86_64, the AVX requirement was added in MongoDB 5.x so anyone who can't run AVX basically can't run a supported MongoDB. AVX became available in 2011 and I am not sure how long it really took before it really became broadly adopted but that's almost 13 years ago at this point where both Intel and AMD had it first available in a released CPU. Regardless, MongoDB 6.x goes EOL July 2025 so that's not really a long runway.

In my case, it happens to be an Atom C3758 (Denverton family), so not too old. I'm guessing for the Atom line they removed some lesser-used features to make it smaller/cooler/more power-efficient/etc.

lucianor commented 1 month ago

> OK, I just made a pretty big update to the migration script in the branch to have cleaner output - all of the MongoDB logs are being put to a file. Repair should only be needed if the db was shut down uncleanly. Maybe that should be a first step before doing the first upgrade, just to make sure but this output is much more easy to understand if the upgrade was successful:

I tried building a Docker image locally from this folder and running the migration script. It didn't work; output below. I was unable to locate the logs you mentioned. The previous version of the script worked.

INFO: running pre-flight checks on MongoDB...done
INFO: creating a backup (mongodb-preupgrade.tar) of MongoDB pre-upgrade...done

INFO: executing upgrade process from MongoDB 3.6 to 7.0...

INFO: starting upgrade to 4.0...
INFO: starting mongod 4.0.28...about to fork child process, waiting until server is ready for connections.
forked process: 14
ERROR: child process failed, exited with error number 14
To see additional information in this output, start without the "--fork" option.

ERROR: unexpected failure; aborting MongoDB upgrade and rolling back!
INFO: cleaning up mongo.pid...done
INFO: rolling back to the backup of MongoDB prior to the upgrade...done
INFO: the MongoDB backup file (mongodb-preupgrade.tar) is still in your persistent data directory in case you need it
INFO: successfully rolled back MongoDB using the pre-backup archive

mbentley commented 1 month ago

Ah, so this is the tricky part at the moment. The MongoDB logs are now being sent to /tmp/upgrade_log.txt inside the container, and I need to figure out how to handle those. The easiest might be to change the location to somewhere under the persistent data directory just so that the migration logs are available, because most people shouldn't be running an interactive shell to do the migration.

I will update it to go to /opt/tplink/EAPController/data/mongodb_upgrade.log. I'm curious whether I should output, say, the last 20 lines of the logs when an error like this occurs. Thoughts?

edit: I pushed a commit that moved the logs to the persistent data location as mentioned above.
edit2: also added the last 30 lines of logs, if present.
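The failure path then boils down to something along these lines (a sketch; the variable names are assumptions):

# hypothetical error handler: surface the tail of the MongoDB log in the docker logs
UPGRADE_LOG="/opt/tplink/EAPController/data/mongodb_upgrade.log"
if [ -f "${UPGRADE_LOG}" ]; then
  echo "INFO: last 30 lines of ${UPGRADE_LOG}:"
  tail -n 30 "${UPGRADE_LOG}"
fi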

mbentley commented 1 month ago

> According to https://www.mongodb.com/docs/manual/administration/production-notes/#x86_64, the AVX requirement was added in MongoDB 5.x so anyone who can't run AVX basically can't run a supported MongoDB. AVX became available in 2011 and I am not sure how long it really took before it really became broadly adopted but that's almost 13 years ago at this point where both Intel and AMD had it first available in a released CPU. Regardless, MongoDB 6.x goes EOL July 2025 so that's not really a long runway.
>
> In my case, it happens to be an Atom C3758 (Denverton family) so not too old. I'm guessing for the Atom format they removed some lesser used features to make it smaller/cooler/more power efficient/etc.

Ah yeah, that's unfortunate. I have an Intel Atom C3558, which is my pfSense firewall, and it also does not have AVX support - but since it's my pfSense box, I obviously don't use it to run an Omada Controller.

This makes me really wish that I had analytics about the controllers that run this image so I could understand them better, but that gets into an area I haven't been in before - dealing with the privacy implications of collecting data.

lucianor commented 1 month ago

> I will update it to go to /opt/tplink/EAPController/data/mongodb_upgrade.log. Curious if I should output like the last 20 lines of the logs or something when an error occurs like this. Thoughts?

If this is a container with the sole purpose of upgrading the database, I would bring your formatted output to the docker logs and put the full logs in the file. Also, I figured out the mapping of the data folder by looking at the script - I assume you are using the same data bindings as for a controller container.

mbentley commented 1 month ago

Sorry, I will add a README with better, consistent instructions - and yes, the data is mapped the same as in the controller container.

One thing I could do to remove some of the noise from the logs upon error is to have a MongoDB log for each version and just dump it in its entirety. A lot of the logs were previously spam from the repair when I had that enabled, so not doing a repair helps with that. Or I could just remove --logappend so it always starts with a fresh log each time MongoDB starts. I really don't care about the logs after a successful upgrade from one version to the next.

I also just realized that another thing I didn't ever do was verify that there was actually data in the ./data/db directory before starting the upgrade.
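That check could be a couple of lines at the top of the script, something like this sketch (path assumed from the controller layout):

# refuse to start an upgrade when there is no existing database to upgrade
DB_DIR="/opt/tplink/EAPController/data/db"
if [ ! -d "${DB_DIR}" ] || [ -z "$(ls -A "${DB_DIR}" 2>/dev/null)" ]; then
  echo "ERROR: no MongoDB data found in ${DB_DIR}; nothing to upgrade"
  exit 1
fi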

mbentley commented 1 month ago

On another note, I was looking at the requirements for arm64 as well. It says "MongoDB on arm64 requires the ARMv8.2-A or later microarchitecture." The Cortex-A72 is what is used in the Raspberry Pi 4 so even if it is running arm64, it might not work. Also see this rpi forum thread about the topic.

MongoDB incompatibilities are going to be a huge pain. I feel like I will end up maintaining the 20.04-based images for a lot longer than I was hoping, alongside the updated 22.04 + MongoDB 7 images 😞

Pretty sure I have a Raspberry Pi 4 around here somewhere that I can do some testing on. My guess is that if it isn't supported, it will fail to start 5.x, so hopefully the rollback logic still handles that case.

mbentley commented 1 month ago

There is now a README with instructions. The TODO list is at the top of the upgrade.sh script.

lucianor commented 1 month ago

Awesome, I will give it another try on my Orange Pi 5 (arm64) and on an Intel Pentium Gold G7400 (amd64). On the ARM requirements, my Orange Pi 5 (RK3588S) is running MongoDB 7.0 with no issues on Ubuntu noble. I can test your script on a spare Raspberry Pi 4 2GB I have lying around - just let me know.

mbentley commented 1 month ago

> Awesome, will give it another try on my Orange Pi 5 arm64 and on Intel Pentium Gold G7400 amd64 architecture. On ARM requirements, I Orange Pi 5 (RK3588s) is running MongoDB 7.0 with no issues on Ubuntu noble. I can test your script into a spare Raspberry Pi 4 2gb I have laying around - just let me know.

Ah yeah, it looks like the RK3588S is a Cortex-A76, which has the ARMv8.1 and 8.2 extensions, so that could explain why it works fine. I also have a few Raspberry Pi 3 B+ boards that I can test on as well, since the Cortex-A53 (rpi 3b+) and Cortex-A72 (rpi 4) have the same instruction set (ARMv8-A). The Wikipedia spec table for the Raspberry Pi devices has some good details there.

lucianor commented 1 month ago

I followed the README. I got the repair error, maybe because my omada-controller was not gracefully shut down:

2024-07-29T13:51:51.891+0000 F STORAGE [initandlisten] An incomplete repair has been detected! This is likely because a repair operation unexpectedly failed before completing. MongoDB will not start up again without --repair.

Maybe you could add back the repair, at least for the first step of the migration, to prevent this issue? That would keep it from stopping the upgrade outright.

mbentley commented 1 month ago

Thanks for the feedback. I think it's definitely worth the delay of having to run the repair to ensure this will work for most people out of the box. I just pushed a change to the branch so that it will perform a repair before the first upgrade to make sure it won't run into this issue.
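For reference, the pre-upgrade repair is just a foreground run of the original mongod with --repair against the data directory before the first hop; mongod exits on its own once the repair finishes. A sketch (the binary and log paths are assumptions):

# one-time repair with the existing 3.6 mongod before the first version hop;
# --repair runs in the foreground and exits when the repair completes
mongod --dbpath /opt/tplink/EAPController/data/db --repair \
  --logpath /opt/tplink/EAPController/data/mongodb_upgrade.log --logappend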