invoiceninja / dockerfiles

Docker files for Invoice Ninja
https://hub.docker.com/r/invoiceninja/invoiceninja
GNU General Public License v2.0

Docker in k8s, testing, slow initial page load (including page refresh) #547

Closed: BloodyIron closed this issue 10 months ago

BloodyIron commented 10 months ago

I'm tying up the loose ends of spinning up Invoice Ninja for the first time in kubernetes/k8s/containerisation, and I've overcome most hurdles, except the initial page load.

For TESTING purposes, I have adjusted the running config to remove variables and possible bottlenecks. In the permanent configuration I may want to run my database as a separate deployment (which is how I was aiming to do it initially, also using mariadb btw), but I currently have that mothballed and have spun up a sidecar (I think that's the appropriate term here) of a mysql:5 database within the pod (example source here: https://github.com/invoiceninja/dockerfiles/issues/94#issuecomment-761763006 ). To be clear, I have not blindly copied and pasted all of that script, so I don't have the cron stuff running, for example.
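Roughly, the pod shape looks like this (a simplified sketch; names, tags, and credentials below are placeholders, not my actual manifest):

```yaml
# Sketch of the test Deployment: Invoice Ninja plus a mysql:5 sidecar in one pod.
# All names and credentials are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: invoiceninja-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: invoiceninja-test
  template:
    metadata:
      labels:
        app: invoiceninja-test
    spec:
      containers:
        - name: invoiceninja
          image: invoiceninja/invoiceninja:latest
          env:
            - name: DB_HOST
              value: "127.0.0.1"   # the sidecar shares the pod's network namespace
            - name: DB_DATABASE
              value: ninja
            - name: DB_USERNAME
              value: ninja
            - name: DB_PASSWORD
              value: changeme
        - name: mysql
          image: mysql:5
          # intentionally no volumeMounts/PVC -- see below
          env:
            - name: MYSQL_DATABASE
              value: ninja
            - name: MYSQL_USER
              value: ninja
            - name: MYSQL_PASSWORD
              value: changeme
            - name: MYSQL_ROOT_PASSWORD
              value: changeme
```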

The sidecar mysql does not have any PV/PVC mounted, and this is intentional, as I wanted to eliminate any storage latency in my testing, since I'm trying to identify the initial page load problem. So there is ZERO valuable (production) data in this current running space (that lives in a VM elsewhere, yes, with backups).

The problem is that on the initial page load (login prompt), and whenever I refresh any page after logging in, the waterfall shows a "Waiting for server response" time in the 3-4 second range. But once that happens, everything else loads quickly, unless I need to refresh the page and load again.

The underlying infrastructure here is no slouch, and considering there is effectively zero data in the database used for this test setup, I do not yet see why this is slow.

When I go to a "production" class web service within the same kubernetes cluster, in this case Jellyfin, the "Waiting for server response" value is 59ms... MILLISECONDS. That is literally the same NGINX ingress, the same MetalLB load balancer, the same NAS serving data, etc., that serve the problematic (testing) Invoice Ninja system.

So... what's going on here? I don't have any CPU/RAM limits set in my deployments for invoice ninja, or anything like that. What am I missing?

BloodyIron commented 10 months ago

For the sake of testing I'm grasping at every straw that comes within reach, namely because pinpointing the actual cause is... not making sense.

I just found a possibly relevant detail. So far I've been using invoiceninja:latest as the tag, but I just tried rolling back to 5.0.x (the oldest) and moving forward from there, using the latest (highest) patch version within each minor release...

I just tested with 5.2.19 and the "waiting for server response" at login is 230ms-ish. WAY FASTER! So somewhere along the line (unsure which version) something really slowed it down... I may stick to this version, not yet sure, but the plot thickens... 🤔🤔🤔

Also, once I've logged in, refreshing the page (F5) is similarly faster than before. BY A LOT! (Same waiting time as described in the paragraph immediately above, though.)

BloodyIron commented 10 months ago

Yeah so the fastest one I can find is v5.2.19...

v5.3.highest / v5.4.highest lead to login pages that never work (I think they emit default http hyperlinks when only https is present) and I can never log in.

v5.5 and higher all have the slow loading issue, and this looks to be a poor architectural design choice. I don't know where the issue lies, but the loading times are completely unworkable. Having to wait 2+ seconds just to load the page initially, when earlier versions take literally a fraction of that, is just absurd.

That being said, I could somehow be doing something wrong, sure. But not only is the dockerhub page for these images grossly out of date, I can't for the life of me (I'M ON DAY 4 OF NON-STOP TESTING, BY THE WAY) figure out what I'm doing wrong...

At this point it's not storage performance, be it the database or the underlying storage otherwise; it's not CPU or a lack of resources... the only thing that has had a positive performance impact is switching to invoiceninja:5.2.19... NOTHING ELSE HAS SPED THIS PART UP AT ALL.

So... yeah... I'm going to go with that version until I figure out wtf I'm doing wrong, or something else changes to make the latest versions not suck.

Input welcome.

turbo124 commented 10 months ago

@BloodyIron is this only slow on k8s?

Is this loading the Flutter UI or the React UI?

What is the browser console showing? Is it a delay in TTFB or something else?

For reference, is the loading time comparable to https://demo.invoiceninja.com or https://react.invoicing.co?

BloodyIron commented 10 months ago

@turbo124 I don't have any I-Ninja v5 stuff going on apart from this particular k8s scenario, namely because I've only recently made the effort to upgrade.

As for Flutter / React, I can't reliably tell which it is for each version... from observing discussions on that topic I get the impression the React UI is immature (or something like that), so I didn't make any effort to try one over the other. It's really just the default that was presented.

The issue isn't the loading time per se, it's the whole waiting-for-server time, as that blocks any assets from being downloaded by the browser until the first response is given. That results in a really, really long perceived load time (since nothing visibly changes for well over 2 seconds).

As for your specific URLs to test... (testing in incognito, naturally)

Both of those example URLs are head and shoulders above the out-of-the-box experience I've had with any version higher than v5.2.19.

The browser console (in my testing of versions higher than v5.2.19) did not show any output until the first "waiting for server response" completed. That may constitute the TTFB, but I'm not entirely sure.
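I'll try measuring TTFB directly with curl as well, something like this (hostname is a placeholder for my instance):

```sh
# time_starttransfer is effectively the TTFB: seconds until the first response byte.
curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s  total: %{time_total}s\n" \
  https://invoices.example.com/
```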

turbo124 commented 10 months ago

It sounds like you are not running queues? That would block the UI until all actions are completed.

https://invoiceninja.github.io/en/self-host-installation/#supervisor-for-invoice-ninja-ubuntu-22-04-lts
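The gist of that page is keeping dedicated queue workers alive under supervisor, along these lines (a sketch; paths, counts, and user are illustrative, see the linked docs for the authoritative config):

```ini
; Keep two Laravel queue workers running and restart them if they die.
[program:invoiceninja-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/artisan queue:work --sleep=3 --tries=1
autostart=true
autorestart=true
numprocs=2
user=www-data
```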

BloodyIron commented 10 months ago

@turbo124 that link is for the Ubuntu install, though, and I don't know which aspects are specifically relevant to the Dockerhub image... I have recently set "QUEUE_CONNECTION" to "database" and wiped the permanent data for both invoiceninja and the mariadb (still in the testing step, so no actual data loss; that data is on another server with backups, waiting for me to finalise this and do the migration).
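For reference, the variable is set via the container env in my Deployment, i.e. something like:

```yaml
# Excerpt from the Deployment's container spec, not a full manifest.
env:
  - name: QUEUE_CONNECTION
    value: database
```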

BloodyIron commented 10 months ago

I've also reviewed the video linked in the documentation for docker: https://www.youtube.com/watch?v=xo6a3KtLC2g and I don't see anything about queue stuff being changed or... "needing" changing... so while you may be right, I don't have any actual idea what to do about that aspect.

BloodyIron commented 10 months ago

The image log even shows that queues are running:

2023-11-01 20:34:22,019 INFO success: queue-worker_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

2023-11-01 20:34:22,019 INFO success: queue-worker_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-01 20:34:22,020 INFO success: scheduler entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

And yet it is still slowwwwww for new page loads or refreshes.

BloodyIron commented 10 months ago

The container even shows these two processes running by default:

64 invoicen 0:02 php artisan queue:work --sleep=3 --tries=1 --memory=256 --timeout=3600
65 invoicen 0:02 php artisan queue:work --sleep=3 --tries=1 --memory=256 --timeout=3600
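And with the database driver, pending work should land in the jobs table, so I can rule out a backlog with a quick check like this (host and credentials are placeholders for my test sidecar):

```sh
# With QUEUE_CONNECTION=database, pending jobs accumulate in the `jobs` table;
# a persistently non-zero count would mean the workers aren't keeping up.
mysql -h 127.0.0.1 -u ninja -p ninja -e 'SELECT COUNT(*) AS pending FROM jobs;'
```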

I think there's a DRASTIC lack of knowledge and documentation on the docker image for InvoiceNinja. Not only is the dockerhub page for it many years out of date (it talks about PHP 7.2, for example), but the official documentation links to a video made by someone else that clearly doesn't cover "everything needed" to get a responsive instance running for docker images above v5.2.19, and there's seemingly no documentation for the kubernetes aspect (without helm, as I don't use helm). So yeah, still having to grasp at straws so much here :/

turbo124 commented 10 months ago

Try spinning up the standard docker-compose.yml file in this repo to see how it performs on your hardware compared to your current k8s setup.

You'll also want to compare the env variables in this repository with the ones you are currently using. It sounds like your QUEUE_CONNECTION variable is sync, and not database/redis, which is the most likely explanation for your performance issues.
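For example, the relevant excerpt should look something like this (a sketch, not the repo's full compose file):

```yaml
# docker-compose.yml excerpt: anything other than 'sync' defers queue work
# to the workers instead of blocking the web request.
services:
  app:
    image: invoiceninja/invoiceninja:latest
    environment:
      - QUEUE_CONNECTION=database   # or redis; 'sync' runs jobs inside the request
```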

BloodyIron commented 10 months ago

Can you please re-open this matter? This is NOT resolved. And I do not have a docker-compose environment ready to simulate this at all; that's the whole point of my kubernetes+CI/CD setup, so I can rapidly iterate in there... I really don't think kubernetes is an unacceptable environment to work against here.

QUEUE_CONNECTION is set to database. I have confirmed this in both the running env within the container and the configuration of the app itself. It is the same speed both before (unset) and after (set to database).
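Here's roughly how I verified both (deployment and container names are placeholders):

```sh
# What the running container's environment actually contains:
kubectl exec deploy/invoiceninja -c invoiceninja -- env | grep QUEUE_CONNECTION

# What the app itself resolved (a cached config can differ from the env):
kubectl exec deploy/invoiceninja -c invoiceninja -- \
  php artisan tinker --execute="echo config('queue.default');"
```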

And... again... this issue does not exhibit itself in version v5.2.19.

BloodyIron commented 10 months ago

@turbo124 can we please have this reopened? This is not resolved.

sugarfunk commented 9 months ago

Just to add, I don't think this is an isolated issue. I am also having this. In fact, it seems to have gotten worse over the last 3 weeks. I have Uptime Kuma monitoring the site. I am up to a 60,000ms timeout with 3 tries and still getting notifications that the login page is down.

It can sometimes take up to, and over, 60 seconds (resulting in cloudflare 504/524). However, once it does load, either interface runs fine, even great. It's just the initial loading.

(Running default/latest docker; tried NPM, SWAG, and Traefik proxies; ran composer install, php artisan optimize/migrate (and force), etc.) I'm at a loss at this point. I have been using the app for 4-5 years, I think. White label user, etc. Not sure what else to do.

Oh and the apps do work fine as well. But my staff doesn't always have access.

BloodyIron commented 9 months ago

I have burned an exhaustive amount of time over the last few weeks chasing performance issues with initial page loads for the internal page, as well as the client portals. This is conclusively an application performance issue; as in, bad code or perhaps a bad method. I have other, more complex PHP tools loading oodles faster than Invoice Ninja in Docker form.

Initial page load times are in the realm of 2-5 seconds. And when traversing the Client Portal between sections, each click takes about 2-4 seconds. These load times are completely unacceptable from a modern web application perspective.

Yes, the internal section is very quick once loaded, when switching between sections, but the initial load for both the internal and client areas is not even close to okay.

bavarian-ng commented 6 months ago

Hi @BloodyIron,

I'm not sure what your Kubernetes environment looks like, but we had exactly the same issues as you (the initial loading took ages, and I found your issue during research). I just found out that the cause in our environment is actually not the application but the load balancer. We use EKS with an Amazon Application Load Balancer in front.

What put me on the right track was that there were lots of requests in the nginx logs with status code 499. For the moment, it helped to increase the ALB health check timeouts and to pin the target port to the NodePort used by the Service, via annotations on our Ingress:

alb.ingress.kubernetes.io/healthcheck-interval-seconds: "60"
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "20"
alb.ingress.kubernetes.io/healthcheck-port: "PORT"

This alone improved the initial loading time from >30 seconds to 6 seconds.

At the moment I'm also experimenting with the ALB idle timeout. I'll add more details once I have a final state that works better in our environment.

Maybe this is also the problem on your side and helps you in debugging (I'm happy to hear whether this helps in your environment as well, and to discuss possible improvements).
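For context, the annotations sit on the Ingress roughly like this (host and service names are placeholders; "PORT" stands in for the Service's NodePort):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: invoiceninja
  annotations:
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "60"
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "20"
    alb.ingress.kubernetes.io/healthcheck-port: "PORT"   # the Service's NodePort
spec:
  rules:
    - host: invoices.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: invoiceninja
                port:
                  number: 80
```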

BloodyIron commented 2 months ago


@bavarian-ng ALB is irrelevant here, as this is 100% on-prem, self-hosted on RKE1 nodes. The LB in this environment is currently MetalLB.

BloodyIron commented 2 months ago

@turbo124 this problem persists; please re-open this bug report. Closing it really was unwarranted in the first place.