[BUG] Loading statistics and deploying channels slow on service restart

thorst commented 6 months ago

Describe the bug: I have a Python script that clears the statistics, undeploys the channels, and then restarts the services. When the server comes back up, it hangs on the loading statistics screen for an unacceptable amount of time (more than 5 minutes) before it finally loads.

To Reproduce: I was told previously this was because of statistics, so I added the step to clear current and lifetime stats. I think the issue is that we have 300k pending transactions (the vendor was hacked, so we are just queueing). We use Postgres, so the database should be able to handle a lot of pending transactions (in my head, anyway).

We have 305 channels deployed; they get deployed on system startup. Heap, RAM, disk, everything looks perfect.

Running on Red Hat.

Expected behavior: I believe that logging in, deploying channels, and rendering stats should happen quicker; I'm not sure why it is taking so long. The only thing I know we COULD be doing is setting channel writers to be dependent on channel readers being deployed. We don't do that currently, but I've started discussing it with the team as something we probably should be doing.
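
A minimal sketch of the dependency idea mentioned above, assuming writers should only deploy after the readers they depend on. The channel names are hypothetical, and this only computes an order in plain Python (3.9+); it does not call Mirth or configure its channel dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical channel names. Each key maps to the set of channels
# that must be deployed before it (writers depend on their readers).
dependencies = {
    "ADT_Reader": set(),
    "Lab_Reader": set(),
    "ADT_Writer": {"ADT_Reader"},
    "Lab_Writer": {"Lab_Reader"},
}


def deploy_order(deps):
    """Return channel names so that each channel follows its dependencies."""
    # TopologicalSorter raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(deps).static_order())


order = deploy_order(dependencies)
for name in order:
    print(name)
```

The exact order among independent channels is not fixed; the only guarantee is that each reader appears before the writer that depends on it.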

Actual behavior: I think while it's loading stats, it's still in the process of starting everything up, and the fact that we have a lot of high-volume channels, like ADT, prolongs it even further. I believe the channels are deployed, but just not showing because gathering stats is expensive.

Mirth Version: 4.2.0
openjdk version "22-ea" 2024-03-19
OpenJDK Runtime Environment (Red_Hat-22.0.0.0.36-1) (build 22-ea+36)
OpenJDK 64-Bit Server VM (Red_Hat-22.0.0.0.36-1) (build 22-ea+36, mixed mode, sharing)
Red Hat Enterprise Linux release 8.9 (Ootpa)

ab-mg-23 commented 6 months ago

I'll start with a question: What suggested a potentially 300+ thread software solution would have a fast startup/restart time? Threads aren't cheap.

thorst commented 6 months ago

I believe there may still be room for improvement, but the primary issue was a disk array problem that was causing I/O errors and slowing everything down.

thorst commented 6 months ago

> I'll start with a question: What suggested a potentially 300+ thread software solution would have a fast startup/restart time? Threads aren't cheap.

Is there a "max" recommended number of interfaces per server? Our environment runs with low CPU/RAM usage, but perhaps we have too many interfaces on this server?

pacmano1 commented 6 months ago

Not that I know of, because it depends heavily on what your channels do, message volume, back-end config, etc. Before I started writing all code as multi-tenant, I had engines with 1500 channels.

thorst commented 4 months ago

Part of the issue may be that our DB is 6 TB.

kirbykn2 commented 4 months ago

Does your db need to be that large? Are you pruning messages in a timely manner? Are you storing attachments multiple times?

Storing only what is absolutely necessary in Mirth will help performance. Long-term storage is cheap. Prune content and store it outside of the DB, if needed.
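
Before deciding what to prune, it can help to see which tables account for the space. The following is a rough sketch, assuming a PostgreSQL backend and a DB-API cursor using `%s` placeholders (for example from psycopg2, which is an assumption; the thread does not say which driver is in use). The helper names are hypothetical. The query relies on the standard `pg_statio_user_tables` view and `pg_total_relation_size` function.

```python
# Lists the largest user tables, including their indexes and TOAST data.
LARGEST_TABLES_SQL = """
SELECT relname, pg_total_relation_size(relid) AS total_bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY total_bytes DESC
LIMIT %s
"""


def human_size(num_bytes):
    """Format a byte count using binary units, capped at TiB."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def largest_tables(cursor, limit=20):
    """Return (table_name, readable_size) pairs for the biggest tables."""
    cursor.execute(LARGEST_TABLES_SQL, (limit,))
    return [(name, human_size(total)) for name, total in cursor.fetchall()]
```

With a live connection, `largest_tables(conn.cursor())` would return the top 20 tables by total size. The output should show whether a handful of high-volume channels dominate the 6 TB.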

thorst commented 4 months ago

I agree with this thought @kirbykn2, however the majority of our channels are set to prod mode, with 45-day retention. We would like to keep it at 45 or even more. I just put in a ticket for advanced message storage configuration (#6255); this will help reduce what is stored.

How do others implement long-term storage? In that ticket I documented a couple of thoughts on long-term storage, but I'm not sure what others are doing.

kirbykn2 commented 4 months ago

In general, I try to keep incoming raw messages for 2 weeks in Mirth. In addition, we have a process that saves all messages to S3 as they pass through Mirth. There are flows in our environment where messages pass through 2-3 channels, so it doesn't make sense for those channels to have the same storage options. As an example, if I have a channel whose only purpose is routing to other channels, I will set the message storage to raw (or lower) to reduce overhead and increase performance.
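
As a rough illustration of the "save every message to S3" idea, here is a minimal Python sketch. It is not the process described above: the key layout, function names, and `.txt` suffix are hypothetical, and inside Mirth this logic would live in a JavaScript code template or a dedicated channel instead. The only external call assumed is `put_object(Bucket=..., Key=..., Body=...)`, as exposed by a boto3 S3 client.

```python
from datetime import datetime, timezone


def archive_key(channel_name, message_id, received):
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"{channel_name}/{received:%Y/%m/%d}/{message_id}.txt"


def archive_message(s3_client, bucket, channel_name, message_id, raw,
                    received=None):
    """Upload one raw message and return the key it was stored under.

    s3_client is any object with a boto3-style put_object method,
    for example boto3.client("s3").
    """
    received = received or datetime.now(timezone.utc)
    key = archive_key(channel_name, message_id, received)
    s3_client.put_object(Bucket=bucket, Key=key, Body=raw.encode("utf-8"))
    return key
```

Partitioning keys by channel and date keeps later lookups and lifecycle rules simple. With the archive in place, retention inside Mirth can be shortened without losing the raw messages.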

thorst commented 4 months ago

For the S3 sync, is that a code template or post processor, or do you send the raw message to a separate channel?

kirbykn2 commented 4 months ago

I've seen it done both with a code template and a channel dedicated to writing to S3. I prefer a code template you can call from anywhere.

nextgenhealthcare/connect, issue #6186