simjanos-dev / LinguaCafe

LinguaCafe is a self-hosted software that helps language learners read foreign languages.
https://simjanos-dev.github.io/LinguaCafeHome/
GNU General Public License v3.0
847 stars, 26 forks

Docker deployment issues #59

Closed: sergiolaverde0 closed this issue 7 months ago

sergiolaverde0 commented 7 months ago

This issue is here to keep track of all problems regarding building the images and running the containers. Issues regarding features, dependencies, networking and other topics should not be reported here.

Issues that happen only in the dev branch should mention it explicitly.

simjanos-dev commented 7 months ago

Yes, it feels great. It felt even better when I pushed a hotfix and there were no changes in the Docker process. :) But... this might change it a bit:

Issue #86 suggested adding the possibility to change the MySQL username and password, which I agree with. I could mount the Laravel .env file as well, and add a git pull command to the update process, which would work great.

But now, if someone modifies their config and then runs git pull after a new update, it would fail with an uncommitted changes error message. I could add a command that reverts their changes before the pull, but then they would need to change their config again after each update.

Any suggestions how to handle this?

simjanos-dev commented 7 months ago

I found out that there are 11,723 SVG files missing from the software that are needed for drawing Chinese characters on the Kanji page. Their total size is 50MB.

Should I just upload them to the main GitHub repository, or should we put a command in the Dockerfile to download them while building the image?

Answer:

Assuming the source where they are hosted is reliable, downloading them is the way to go, but naturally keep a copy handy as a backup.

I wanted to upload it to GitHub, but then realized that it would have to be packaged somehow into one file. Since people will now use the deploy branch for cloning, and other branches are only used for development, should I just add it to the main branch and package it with the Docker image?

sergiolaverde0 commented 7 months ago

I will have to do some research regarding the MySQL password. My first thought is to ship a .env file with the defaults and reference the variables in the compose file so that they are always in sync. I barely know Laravel, so I might need your help figuring out how to make that container read the environment variables.

As for the SVG files, compressing them into a .rar or .7z archive and releasing that file should be fine, since people are used to downloading and extracting files. To bake them into the image, use curl or wget to download the archive and place the files in the right location. There is also the option of adding a button in the admin panel to do the download at runtime, so the image doesn't get bigger for people who study other languages.
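A minimal Python sketch of the unpacking step, assuming the SVGs are shipped as a .zip instead (Python's standard library reads zip, but not .rar or .7z); the function name, archive name and target directory are hypothetical. The download itself would still be the curl or wget call in the Dockerfile.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_svgs(archive: Path, dest: Path) -> int:
    """Copy every .svg member of a zip archive into dest; return how many were copied."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".svg"):
                continue  # skip folders and non-SVG files (readmes, licenses, ...)
            # Zip member names always use "/"; keeping only the base name means a
            # crafted "../" path cannot escape dest (files are flattened into dest).
            target = dest / PurePosixPath(info.filename).name
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            copied += 1
    return copied
```

Flattening assumes the SVG file names are unique across any subfolders in the archive; otherwise later files would overwrite earlier ones.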

simjanos-dev commented 7 months ago

Oh, I didn't know Docker has a .env file as well; I meant the Laravel .env file. Laravel has a .env file in its root directory that contains the default MySQL username and password. I can just move it to the deploy branch and mount it that way, so people can edit both the .env and docker-compose.yml files to change the password. But then there will be the update problems I mentioned in my previous comment.

sergiolaverde0 commented 7 months ago

We can probably save their changes with git stash before the pull and then run git stash pop after the pull to apply them again, as long as they don't conflict with our own.
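A Python sketch of that update flow, assuming the update is driven by a script; the helper names are hypothetical. It only stashes when tracked files were actually edited, because running `git stash pop` when nothing was freshly stashed would either fail or re-apply an older stash entry.

```python
import subprocess
from typing import Callable, List


def plan_update(has_local_changes: bool) -> List[List[str]]:
    """Git commands for the stash / pull / pop update flow."""
    if not has_local_changes:
        # Nothing to protect, and a bare "git stash pop" could re-apply an older stash.
        return [["git", "pull"]]
    return [["git", "stash"], ["git", "pull"], ["git", "stash", "pop"]]


def has_local_changes(repo: str, run: Callable = subprocess.run) -> bool:
    """True if tracked files (e.g. .env, docker-compose.yml) were edited locally."""
    result = run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())


def update(repo: str, run: Callable = subprocess.run) -> None:
    """Run the update; check=True stops at the first failing command."""
    for cmd in plan_update(has_local_changes(repo, run)):
        run(cmd, cwd=repo, check=True)
```

If `git stash pop` hits a conflict, git keeps the entry in the stash list, so the user's edits are not lost and can be recovered by hand.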

Not like there's much for the users to change now that we ship a default folder hierarchy.

simjanos-dev commented 7 months ago

I don't know, that seems like it could break depending on what the users modify, but somehow I can't think of an example of what would cause a conflict. There are 2 files that users should be able to modify: the .env file and docker-compose.yml.

And as long as they only modify these 2 files, the update should work 100%. It's more difficult than I thought.

sergiolaverde0 commented 7 months ago

Oh, my plan is not to mount the .env file into any container, but to use it as a single source of truth to keep all containers in sync. The environment variables will be passed to Laravel via Docker exactly like they are currently passed to MySQL.
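Docker Compose already reads a `.env` file next to the compose file for `${VAR}` interpolation, so in practice the mapping would live in docker-compose.yml. The Python sketch below only models the idea: one file is parsed once and every container's variables are derived from it. `MYSQL_*` are the official MySQL image's variables and `DB_*` are Laravel's usual keys; the shared key names and values are assumptions, and the simplified parser ignores features such as `export` prefixes and escape sequences.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip matching surrounding quotes
        env[key.strip()] = value
    return env


def container_envs(shared: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Derive each container's variables from one shared set so they cannot drift."""
    return {
        "mysql": {
            "MYSQL_DATABASE": shared["DB_DATABASE"],
            "MYSQL_USER": shared["DB_USERNAME"],
            "MYSQL_PASSWORD": shared["DB_PASSWORD"],
        },
        "laravel": {
            "DB_DATABASE": shared["DB_DATABASE"],
            "DB_USERNAME": shared["DB_USERNAME"],
            "DB_PASSWORD": shared["DB_PASSWORD"],
        },
    }
```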

simjanos-dev commented 7 months ago

I think I'm missing something. Why does the main image take 27 minutes to build, while the dev images take only 7 minutes?

Sometimes the main images build in 7 minutes as well. Does GitHub limit CPU/network speed if I build too many images?

(The Node update took a few minutes off the last dev image build.)

sergiolaverde0 commented 7 months ago

The dev image builds faster because it is only built for x64, which is much faster than building for Arm64; I mentioned this when discussing the need for an action to build a test image, but can't remember where that happened.

And then there is also layer caching. An image is composed of several layers stacked together, where each layer corresponds to a step or command in the Dockerfile. When building a new image, old layers that have not changed are reused, while layers with changes, and all layers that come after them, are rebuilt.
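A simplified Python model of that cache rule. Real Docker cache keys also cover things like file checksums for COPY/ADD and build arguments, which this sketch folds into the step string; the step names used with it are made up.

```python
from typing import List, Sequence


def layers_to_rebuild(old_steps: Sequence[str], new_steps: Sequence[str]) -> List[int]:
    """Indices of new_steps that cannot come from the cache.

    Each string stands for one Dockerfile step plus its inputs. The first step
    that differs from the previous build invalidates itself and every later step.
    """
    for i, step in enumerate(new_steps):
        if i >= len(old_steps) or old_steps[i] != step:
            return list(range(i, len(new_steps)))
    return []  # every step matched, so the whole image comes from the cache
```

This is why slow, rarely changing steps (system packages, dependencies) are usually placed before copying frequently edited source files: an edit to the source then only invalidates the layers after that copy.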

simjanos-dev commented 7 months ago

I'm thinking about modifying the dev environment. I want my containers' names to have -dev at the end, so I can run 2 instances of LinguaCafe: one for development and one for personal use. Right now I have to export and import my database whenever I want to switch between them. I think I will add the container names to the .env file as well. They will have to be readable by both Laravel and Python, because linguacafe-python-service is used in the code as a domain name so the containers can communicate with each other. I think I can simply read the environment variable in both Python and Laravel.

sergiolaverde0 commented 7 months ago

Simply changing their names does not really work as of today: importing an ebook fails, as you might expect. To make sure users don't tweak something that will break their deployment, I won't be adding those options until the changes are applied to the code.

simjanos-dev commented 7 months ago

Is the docker-compose.yml file in the main branch used at all? If not, can I delete it?

I won't be adding those options until the changes are applied to the code.

I think I'll just add it to the docker-compose-dev.yml file, so it won't mess up production. I'll use a CONTAINER_NAMES=dev variable and add it to the dev Dockerfile instead of .env, so it 100% won't mess up anything.
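A Python sketch of reading that variable, assuming CONTAINER_NAMES=dev should turn linguacafe-python-service into linguacafe-python-service-dev; the function name is hypothetical, and Laravel would need the same logic on its side (for example through its env() helper).

```python
import os
from typing import Mapping, Optional

BASE_HOST = "linguacafe-python-service"  # container name the code uses as a host name


def python_service_host(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Python service's host name, suffixed with -<value> when CONTAINER_NAMES is set."""
    env = os.environ if env is None else env
    suffix = env.get("CONTAINER_NAMES", "").strip()
    return f"{BASE_HOST}-{suffix}" if suffix else BASE_HOST
```

Leaving the variable unset keeps production on the original name, which matches the goal of only touching the dev setup.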

Also, I've just realized that Laravel is running in development mode (APP_ENV=local, APP_DEBUG=true in the .env file). I'll change it to production in the future.

simjanos-dev commented 7 months ago

I have modified the dev environment. I've pushed it to dev, built an image and made a fresh install. It works well both in dev environment and with the dev image.

Could you please also take a quick look at it, to make sure I didn't mess up anything?

sergiolaverde0 commented 7 months ago

Everything at commit 865f854 seems fine, but while looking at the diff I noticed that Django has also been running in debug mode, in case you want to turn that off on main. For cases like this, I think you might benefit from creating a new testing branch where you can experiment with things you don't want merged into main, such as running Django and Laravel in debug mode.

simjanos-dev commented 7 months ago

Yes, I've noticed it too. I also removed a line the other day that printed out every tokenized word.

I will fix these issues someday. Also, the Python code should be separated into two files: one for tokenizing and one for importing.

This thing got released a bit too early...

simjanos-dev commented 7 months ago

I have uncommented the spaCy model load lines, and my RAM usage in Docker did not change. I assume it only loads the models into RAM when it needs them. Is 1.8GB a normal amount of RAM to use for running these services?

sergiolaverde0 commented 7 months ago

For all three containers? That sounds more or less reasonable, given that it includes a MySQL container, a Vue front-end, and a double back-end with Laravel and Django running at the same time (I don't know how smart the latter is about not loading unused components).

simjanos-dev commented 7 months ago

I've used Django because I didn't find a simpler HTTP server that handles POST data. It wasn't a problem for personal use, but I'll look around and see if I can find something smaller.

sergiolaverde0 commented 7 months ago

I have been eyeing Flask as a more lightweight replacement for that use case. To be honest I have also thought of unifying it all on Django for a simpler architecture but that's a lot more work.

simjanos-dev commented 7 months ago

I've found Bottle; I think I'm going to test it, and use it if it works. It really doesn't need any advanced functionality, just reading and responding to basic POST and GET requests. It just connects Python-only libraries to Laravel.

I wanted to go with the built-in Python HTTP library, but it couldn't handle POST data. The more lightweight, the better.
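For comparison, the standard library's http.server can receive POST bodies, but only at a low level: the handler has to read Content-Length bytes from rfile and build the response headers itself, which is much of what a microframework like Bottle takes care of. A minimal sketch; the handler name is hypothetical, and `text.split()` is only a placeholder for the real spaCy tokenization.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class TokenizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The body must be read manually, using the Content-Length header.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder: whitespace split instead of a real tokenizer.
        body = json.dumps({"tokens": payload.get("text", "").split()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def start_server(port: int = 0) -> HTTPServer:
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), TokenizeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```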

What do you mean by unifying it on Django? Rewriting the Laravel part in Django?

sergiolaverde0 commented 7 months ago

Yes. Since spaCy introduces Python as a hard dependency, it makes sense to use it for the whole back-end, and that would be a bit easier than coordinating two different frameworks running in separate containers. I also blame Laravel running as its own user for many of the issues we had with Docker.

Don't take this too seriously though, something like this needs serious pondering and most problems have already been dealt with to some degree.

simjanos-dev commented 7 months ago

The Python service is using 1.4GB of memory, while Laravel and the database use less than 400MB together, even after commenting out the spaCy model load lines.

sergiolaverde0 commented 7 months ago

Well that needs some serious attention, I will check what is going on once I get home.

simjanos-dev commented 7 months ago

Thank you! It's not something extremely important to fix, so take your time. I am trying out the Bottle framework now. If it works I'll make a dev image with it and test it for a while.

Docker seems to be stable now finally, and there are no problems with updating. Feels so nice.

simjanos-dev commented 7 months ago

Okay. So after a few hours of playing with Bottle, I got it running. LinguaCafe's RAM usage went down from 1.7GB to 350MB... I've seen it go up to 1.3GB, but after restarting and testing a bit, it now stays at 432MB.

I'll make a commit in dev in a few hours, and test it for a while.

simjanos-dev commented 7 months ago

It still goes up to 1.1GB, but the Python server runs at fixed 430MB now.

simjanos-dev commented 7 months ago

I think I'm going crazy. It is using 1.5GB total now, although Python is still much lower than the original 1.4GB I saw.

It still needs some testing, I will push the new python server to dev tomorrow.

simjanos-dev commented 7 months ago

I've replaced Django with Bottle and added dynamic spaCy model loading. I've tested it for over an hour with every language. Now, with only a few languages in use, the Python container's RAM usage can stay below 400MB. If I import a book or use a language with a larger language model (and dependencies), it starts using more RAM. I've also noticed that the other two containers use more RAM while importing an e-book, and it does not go down afterwards.
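A minimal Python sketch of on-demand model loading, assuming one spaCy pipeline per language is loaded the first time that language is requested and then reused; the class name is hypothetical, and the loader is injected so it can be `spacy.load` in the service or a stub in tests.

```python
import threading
from typing import Any, Callable, Dict, List


class LazyModelCache:
    """Load a model the first time its name is requested and reuse it afterwards."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()  # avoid loading the same model twice concurrently

    def get(self, name: str) -> Any:
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loader(name)
            return self._models[name]

    def unload(self, name: str) -> None:
        """Drop the cache's reference so the model can be garbage-collected."""
        with self._lock:
            self._models.pop(name, None)

    def loaded(self) -> List[str]:
        with self._lock:
            return sorted(self._models)
```

In the service, `LazyModelCache(spacy.load)` would be created once and `cache.get("de_core_news_sm")` called per request. Holding the lock during a load keeps the sketch simple but blocks requests for other languages meanwhile, and memory freed by `unload` may not be returned to the operating system right away.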

I am building the dev image now, but I have only tried it on my dev environment so far, so it's possible that it won't work. I'll test the dev image in a few days. And also will test it again with all language imports before v0.7.

I think I might just add the ~10 missing spaCy languages now, because they will not increase the performance requirements for people who do not use them, and they only increase the Python image size by ~100MB in total.

I've experimented with using a different base image than ubuntu:22.04 for the Python container. I did not replace it this time, but I think Ubuntu is overkill here. I think the Python service could also be moved into the webserver service image.

Edit: For the dev image you also have to replace command: /app/manage.py runserver 0.0.0.0:8678 with command: "python3 /app/tokenizer.py" for it to work.

I've tested the dev image. RAM usage went down, but sadly it did not solve my performance problem: my laptop froze for ~10 minutes after pulling and deploying this image. :/ Someone else also tried it in #98 and said it got better.

sergiolaverde0 commented 7 months ago

At some point I attempted to replace the Ubuntu base with an Alpine one but installing dependencies failed for the bigger models. One of the Debian based Python images might be an improvement, but not a very big one.

simjanos-dev commented 7 months ago

Yeah, I've noticed that the language model dependencies also take up almost 1GB, so there is not much room for improvement. I think the node_modules folder in the PHP image can be deleted after running the prod script, which would save 160MB. Another ~100MB could be saved by removing the unused font variations for Chinese and Japanese. I didn't even notice that they are that big.

sergiolaverde0 commented 7 months ago

I checked StackOverflow, and according to question 53947626, it is safe to delete node_modules. TIL.

simjanos-dev commented 7 months ago

I don't know if it's a smart idea or not, but maybe NodeJS and Composer could also be removed once they did their job? I think all these changes could remove hundreds of MB.

simjanos-dev commented 7 months ago

I removed the unnecessary files and the node_modules directory and made a dev image. The webserver image size went down from 1.12GB to 846.57MB. I'm not sure if we should remove Node/npm and composer as well or not.

The official Laravel Sail docker image is ~670MB with an ubuntu image. I don't think there is much more room left to optimize.

Docker seems to be working well and stable thanks to you, @sergiolaverde0. :)

sergiolaverde0 commented 7 months ago

I'm not sure if we should remove Node/npm and composer as well or not.

Currently the Composer binary gets removed after installing dependencies, so that part is already done. I have been searching whether we can safely remove Node, but I get too much noise in the results, so I might just test it on my own and see if something breaks.

simjanos-dev commented 7 months ago

Thank you! As long as the compiled app.js file is there, nothing should break.

sergiolaverde0 commented 7 months ago

Does Laravel Mix need any of the PHP dependencies to compile app.js? If not, we can use a multi-stage build where we compile the file in a separate build stage and just copy it over to the final image.

simjanos-dev commented 7 months ago

It seems to be built on top of webpack, so I don't think so.

If you have time and want to do it, could you please finish the docker versioning in February? I don't know how much work it is. I will probably release v0.7 around end of February.