rbeckman-nextgen / test-mc3

0 stars 0 forks source link

Administrator times out on its first few requests after being idle for a while #3112

Closed rbeckman-nextgen closed 4 years ago

rbeckman-nextgen commented 4 years ago

Only after a short time of inactivity (as little as 20-30 min), MC times out and takes several minutes to reconnect to the server and be responsive to be usable.

I've observed the issue in MC 3.x on multiple Mirth appliances of different customers such as QHNPRODMC1, QHNPRODMC2, HECPRODMC1, etc. This issue is not present in MC 2.x versions.

See below errors related to the timeout:

java.net.SocketException: Operation timed out com.mirth.connect.client.core.ServerConnection.executePostMethod(ServerConnection.java:118) com.mirth.connect.client.core.Client.getChannelTags(Client.java:204) com.mirth.connect.client.ui.Frame.retrieveAllChannelTags(Frame.java:4795) com.mirth.connect.client.ui.Frame.doShowDashboard(Frame.java:1786) sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) sun.reflect.misc.Trampoline.invoke(MethodUtil.java:57) sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260) java.beans.EventHandler.invokeInternal(EventHandler.java:463) java.beans.EventHandler.access$000(EventHandler.java:261) java.beans.EventHandler$1.run(EventHandler.java:411) java.security.AccessController.doPrivileged(Native Method) java.beans.EventHandler.invoke(EventHandler.java:409) com.sun.proxy.$Proxy8.actionPerformed(Unknown Source) org.jdesktop.swingx.action.BoundAction.actionPerformed(BoundAction.java:231) javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2028) org.jdesktop.swingx.JXHyperlink.fireActionPerformed(JXHyperlink.java:264) javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2351) javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387) javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242) javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:236) java.awt.AWTEventMulticaster.mouseReleased(AWTEventMulticaster.java:272) java.awt.Component.processMouseEvent(Component.java:6414) javax.swing.JComponent.processMouseEvent(JComponent.java:3275) java.awt.Component.processEvent(Component.java:6179) java.awt.Container.processEvent(Container.java:2084) java.awt.Component.dispatchEventImpl(Component.java:4776) java.awt.Container.dispatchEventImpl(Container.java:2142) java.awt.Component.dispatchEvent(Component.java:4604) java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4618) java.awt.LightweightDispatcher.processMouseEvent(Container.java:4279) java.awt.LightweightDispatcher.dispatchEvent(Container.java:4209) java.awt.Container.dispatchEventImpl(Container.java:2128) java.awt.Window.dispatchEventImpl(Window.java:2492) java.awt.Component.dispatchEvent(Component.java:4604) java.awt.EventQueue.dispatchEventImpl(EventQueue.java:717) java.awt.EventQueue.access$400(EventQueue.java:82) java.awt.EventQueue$2.run(EventQueue.java:676) java.awt.EventQueue$2.run(EventQueue.java:674) java.security.AccessController.doPrivileged(Native Method) java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:86) java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:97) java.awt.EventQueue$3.run(EventQueue.java:690) java.awt.EventQueue$3.run(EventQueue.java:688) java.security.AccessController.doPrivileged(Native Method) java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:86) java.awt.EventQueue.dispatchEvent(EventQueue.java:687) java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:296) java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211) java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201) java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196) java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188) java.awt.EventDispatchThread.run(EventDispatchThread.java:122)


java.net.SocketException: Operation timed out com.mirth.connect.client.core.ServerConnection.executePostMethodAsync(ServerConnection.java:222) com.mirth.connect.client.core.Client.getChannelStatusList(Client.java:963) com.mirth.connect.client.ui.Frame$17.doInBackground(Frame.java:2300) com.mirth.connect.client.ui.Frame$17.doInBackground(Frame.java:2296) javax.swing.SwingWorker$1.call(SwingWorker.java:277) java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) java.util.concurrent.FutureTask.run(FutureTask.java:138) javax.swing.SwingWorker.run(SwingWorker.java:316) java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) java.lang.Thread.run(Thread.java:695)


Imported Issue. Original Details: Jira Issue Key: MIRTH-3221 Reporter: chrisr Created: 2014-04-08T09:51:26.000-0700

rbeckman-nextgen commented 4 years ago

These are from back in February:

{quote}4:24:34 PM Chris Ro: DanS told me he's been having the same issue with MC 3.x 5:07:20 PM Nick Rupley: Hmm, I haven't noticed that, what version? 5:07:34 PM Chris Ro: 3.0.1.7051 5:10:52 PM Nick Rupley: I haven't seen that, at least not that it happens any more often than 2.x 5:10:59 PM Nick Rupley: How often is "a while"? 5:11:19 PM Chris Ro: I think as short as an hour of inactivity 5:11:30 PM Chris Ro: i get it every morning with MC 3.x instances 5:11:44 PM Chris Ro: I click something, and it hangs until the error pops up 5:11:49 PM Chris Ro: and then it operates normally 5:12:53 PM Nick Rupley: That may be expected, if you leave it overnight or something 5:13:17 PM Chris Ro: I go through this typically a couple of times a day because I don't attend to all MC instance 5:13:21 PM Nick Rupley: When you come back to it and start doing something again, it has to create new HTTPS connections, because the ones you were using are no longer valid 5:13:30 PM Chris Ro: Overnight is fine, but not during the same afternoon 5:13:51 PM Nick Rupley: Yeah, that should be fine… hmm 5:14:03 PM Chris Ro: just wondering what the timeout value is set to 5:20:44 PM Nick Rupley: I think the connection timeout is 10 seconds, but 5:20:58 PM Nick Rupley: The max idle time for HTTPS connections is set to 24 hours 5:21:02 PM Nick Rupley: However, 5:21:12 PM Nick Rupley: If there are at least 200 connections on the box 5:21:45 PM Nick Rupley: Then that idle time is reduced to a maximum of 200 seconds 5:23:42 PM Chris Ro: I am certain that I'm the only one accessing the MC, and there's only one channel (which is a DB reader) that is active 5:24:01 PM Chris Ro: I don't think it's related to having 200 connections in this case 5:25:39 PM Chris Ro: but anyway, just wanted to bring this up because I've been having this issue consistently since 3.x, and thought maybe this could be addressed in a rather simpler way. 5:25:52 PM Chris Ro: I don't have this issue with 2.x instances 5:26:01 PM Chris Ro: I'll keep a close watch 5:26:29 PM Nick Rupley: Hmm, does that even happen with a local MC instance? 5:27:31 PM Chris Ro: good question. I can't remember. 5:28:35 PM Chris Ro: I'll check tomorrow. I can't confirm right now 5:28:41 PM Nick Rupley: Hmm, okay 5:29:47 PM Chris Ro: I'll let you know tomorrow 5:30:00 PM Chris Ro: but I'm definitely having the issue with remote 3.x instances 5:30:08 PM Chris Ro: in fact, I just had it again just now 5:30:12 PM Chris Ro: for another 3.x instance{quote}

{quote}3:03:33 PM Chris Ro: by the way, regarding the MC timeout issue, I don't have it with my local MC 3.0.1 instance 3:03:41 PM Chris Ro: only with the remote instances 3:03:57 PM Chris Ro: at least I didn't have it with my local instance today 3:04:27 PM Chris Ro: but I did have the same errors multiple times with remote instances 3:26:06 PM Nick Rupley: Hmm okay 3:26:11 PM Nick Rupley: Just talked with Wayne about it 3:27:20 PM Chris Ro: what have you found out? 3:27:52 PM Chris Ro: It could be related to certain hosted appliances or environments 3:28:02 PM Nick Rupley: We're not quite sure what could be the cause yet 3:28:20 PM Nick Rupley: Do you have an appliance or something that you can reliably reproduce it on? 3:28:21 PM Chris Ro: oh you were able to reproduce the issue? 3:28:29 PM Chris Ro: yea [REDACTED] appliances 3:29:00 PM Nick Rupley: I wasn't able to reproduce it 3:29:12 PM Nick Rupley: But we'll try with whatever appliance you're able to reproduce it on 3:29:43 PM Nick Rupley: How long do you have to let the Administrator sit before it happens? 3:29:45 PM Chris Ro: I'm seeing the issue daily on this box for example: 3:29:47 PM Chris Ro: [REDACTED] 3:30:02 PM Chris Ro: I don't know to be exact. I'm guessing maybe 2 hrs? 3:30:05 PM Nick Rupley: And "daily" as in first thing in the morning when you first try to access it? 3:30:14 PM Nick Rupley: Or several times throughout the day 3:30:36 PM Chris Ro: first thing in the morning. and then at least a couple more times throughout the day 3:31:08 PM Nick Rupley: And when you let it sit, you're not like, putting your workstation on standby/sleep/hibernate are you? 3:31:19 PM Nick Rupley: Because if you do, that could definitely cause connections to expire 3:31:50 PM Chris Ro: no. other than the screensaver 3:32:16 PM Nick Rupley: Mkay, well we'll try it out and see 3:33:03 PM Chris Ro: ok. it's not urgent. just a bit of inconvenience. take a look when you have some time. 3:33:15 PM Nick Rupley: Will do, thanks{quote}

Imported Comment. Original Details: Author: narupley Created: 2014-04-08T11:06:11.000-0700

rbeckman-nextgen commented 4 years ago

Still haven't been able to reproduce this. I left the Administrator (connected to one of the servers reported to have the issue) open all night. Came in this morning, everything worked perfectly fine.

Imported Comment. Original Details: Author: narupley Created: 2014-04-09T08:01:07.000-0700

rbeckman-nextgen commented 4 years ago

Chris can you give a little more information about what page/view you're on when it happens? Like, do you leave it on the Dashboard view (so that it's continually refreshing)? Or are you somewhere else like the Channels view? Or does the issue happen no matter where you're at?

Imported Comment. Original Details: Author: narupley Created: 2014-04-10T11:53:13.000-0700

rbeckman-nextgen commented 4 years ago

I typically leave MC on the Channels view, or on the Code Templates view. When I leave for half hour and come back and click on any button, typically "Channels" button to go to the Channels view, I see MC being unresponsive until these 'Operation timed out' pop up errors.

This afternoon, I left MC on the Dashboard view intentionally and came back about an hour later, and I did not see the issue happen. Could this be because of the continual auto refresh on the Dashboard view, keeping the MC client connected to the server without timing out?

Imported Comment. Original Details: Author: chrisr Created: 2014-04-10T15:54:23.000-0700

rbeckman-nextgen commented 4 years ago

So I was actually able to reproduce this locally. I took network captures on both sides while it was happening, and there's something odd going on though. When the issue occurs, I can see multiple SYN packets being pushed from the client side. The packet just keep retransmitting because no ACK is coming back from the server (while this is going on, nothing is coming back from the server). On the server side, while this was going on, absolutely nothing was being received (was monitoring via tcpdump).

Imported Comment. Original Details: Author: narupley Created: 2014-04-18T07:37:19.000-0700

rbeckman-nextgen commented 4 years ago

It's definitely the lack of the idle connection timeout thread that's causing this issue to happen in 3.0.1 (and 3.0.2). I'm connecting to a 3.0.1 server (from Windows to an appliance) two ways:

It may be that for MIRTH-3070, all we actually needed to do was to increase the idle timeout on the server side. Leaving connections open on the client side may not have been one of the root issues. I also did the same test, this time using 3.0.0 instead of 3.0.1:

Doing network captures at the time, I can see that in both of the above cases (with 3.0.0), the NBNS packets (basically digs) are being sent to the server. Then the server is responding over ICMP (type/code 3/3 "unreachable"), with the original UDP payload attached, so that we know it's corresponding to the original NetBIOS query. This sequence of events is happening regardless of whether that idle connection timeout thread is being used on the client or not.

So that lends support to the idea that the timeout thread didn't necessarily directly cause the original reverse DNS lookup issue (though it made it more pronounced because new connections had to be created more often). Back in the 3.0.1 testing, those reverse DNS lookups are not being done at all, regardless of whether the client is using the timeout thread.

Imported Comment. Original Details: Author: narupley Created: 2014-04-22T16:49:06.000-0700

rbeckman-nextgen commented 4 years ago

Interesting. Can you test trunk and add the idle connection thread back to the client, then lower the idle timeout of the server in trunk and see if the reverse dns issue directly corresponds to the server's idle timeout.

Imported Comment. Original Details: Author: wayneh Created: 2014-04-22T18:18:00.000-0700

rbeckman-nextgen commented 4 years ago

With 3.1.0 we've switched to using HttpComponent instead of HttpClient (MIRTH-3264). It's possible that this alone could have fixed the problem. Since we're still having trouble reproducing the problem, I'm pushing it from 3.1.0. If anyone runs into this issue on 3.1.0+, please let us know in this ticket.

Imported Comment. Original Details: Author: wayneh Created: 2014-09-11T18:29:44.000-0700

rbeckman-nextgen commented 4 years ago

Has anyone had this issue after upgrading to 3.1.0+?

Imported Comment. Original Details: Author: wayneh Created: 2014-12-01T11:25:17.000-0800

rbeckman-nextgen commented 4 years ago

Yes, I observed the same issue persisting in 3.1.0 and 3.1.1.

Imported Comment. Original Details: Author: chrisr Created: 2014-12-01T11:43:56.000-0800

rbeckman-nextgen commented 4 years ago

Revisions 8179/8180: Brought back the idle connection monitor thread, but this time the idle timeout is much longer (5 minutes versus 5 seconds). The server-side idle time hasn't changed, so this will only affect the Client object, not any other calls. The reverse DNS issue on Windows will always remain, but hopefully 5 minutes is a reasonable enough window to help bridge the divide between that and the Operation Timed Out issue that others run into.

Imported Comment. Original Details: Author: narupley Created: 2017-02-11T17:32:21.000-0800