schwabe / ics-openvpn

OpenVPN for Android
3.36k stars 1.2k forks source link

App reconnects every 2 minutes after update from 0.7.36 to 0.7.37 #1486

Closed andronov-alexey closed 2 years ago

andronov-alexey commented 2 years ago

General information

  1. Android Version: 12
  2. Version of the app (version number/play store version/self-built): GooglePlay 0.7.37
  3. Server: SoftEther

Hello,

After latest app update to version 0.7.37 app started reconnecting to server every 2 minutes. The issue goes away when I install previous version 0.7.36

Here's the client logs:

[Server] Inactivity timeout (--ping-restart), restarting SIGUSR1[soft,ping-restart] received, process restarting

So it seems like new app version (0.7.37) sets some kind of reconnect timeout, for example:

ping-restart 0
keepalive 10 120

Regards

schwabe commented 2 years ago
andronov-alexey commented 2 years ago

Thanks for the answer

Is that on an existing profile or a freshly imported profile?

It was an existing profile

are those reconnect timeouts thing you see in the generated config or that you speculate?

It's my guess based on timestamps from UI log

While I was saving a log I found out it was more descriptive than the one shown in app UI There (see client_log.txt) I found the following entry:

PUSH: Received control message: 'PUSH_REPLY,ping 3,ping-restart 10, ...'

So turns out server pushes timeout setting and server is to blame To solve the issue I added the following setting to vpn config on the client: pull-filter ignore "ping-restart"

But the issue still there although now intervals became ~3 minutes (see client_log ignore ping-restart.txt):

2022-05-16 23:40:17 [Finland] Inactivity timeout (--ping-restart), restarting 2022-05-16 23:42:52 [Finland] Inactivity timeout (--ping-restart), restarting 2022-05-16 23:46:58 [Finland] Inactivity timeout (--ping-restart), restarting 2022-05-16 23:50:32 [Finland] Inactivity timeout (--ping-restart), restarting 2022-05-16 23:53:03 [Finland] Inactivity timeout (--ping-restart), restarting

And additionally there two run-time exceptions: 1.

java.lang.NoSuchMethodException: com.android.org.bouncycastle.asn1.x509.X509Name.getInstance [class java.lang.Object] at java.lang.Class.getMethod(Class.java:2103) at java.lang.Class.getMethod(Class.java:1724) at de.blinkt.openvpn.core.X509Utils.getCertificateFriendlyName(X509Utils.java:136) at de.blinkt.openvpn.core.X509Utils.getCertificateFriendlyName(X509Utils.java:82) at de.blinkt.openvpn.views.FileSelectLayout.setData(FileSelectLayout.java:155) at de.blinkt.openvpn.fragments.Settings_Basic.loadPreferences(Settings_Basic.java:192) at de.blinkt.openvpn.fragments.KeyChainSettingsFragment.onStart(KeyChainSettingsFragment.kt:208) at androidx.fragment.app.Fragment.performStart(Fragment.java:3053) at androidx.fragment.app.FragmentStateManager.start(FragmentStateManager.java:587) at androidx.fragment.app.FragmentStateManager.moveToExpectedState(FragmentStateManager.java:279) at androidx.fragment.app.FragmentStore.moveToExpectedState(FragmentStore.java:113) at androidx.fragment.app.FragmentManager.moveToState(FragmentManager.java:1374) at androidx.fragment.app.FragmentManager.dispatchStateChange(FragmentManager.java:2841) at androidx.fragment.app.FragmentManager.dispatchStart(FragmentManager.java:2791) at androidx.fragment.app.FragmentController.dispatchStart(FragmentController.java:273) at androidx.fragment.app.FragmentActivity.onStart(FragmentActivity.java:486) at androidx.appcompat.app.AppCompatActivity.onStart(AppCompatActivity.java:246) at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1467) at android.app.Activity.performStart(Activity.java:8079) at android.app.ActivityThread.handleStartActivity(ActivityThread.java:3732) at android.app.servertransaction.TransactionExecutor.performLifecycleSequence(TransactionExecutor.java:221) at android.app.servertransaction.TransactionExecutor.cycleToPath(TransactionExecutor.java:201) at android.app.servertransaction.TransactionExecutor.executeLifecycleState(TransactionExecutor.java:173) at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:97) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2253) at android.os.Handler.dispatchMessage(Handler.java:106) at android.os.Looper.loopOnce(Looper.java:201) at android.os.Looper.loop(Looper.java:288) at android.app.ActivityThread.main(ActivityThread.java:7870) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)

2)

java.lang.NoSuchMethodException: com.android.org.bouncycastle.asn1.x509.X509Name.getInstance [class java.lang.Object] at java.lang.Class.getMethod(Class.java:2103) at java.lang.Class.getMethod(Class.java:1724) at de.blinkt.openvpn.core.X509Utils.getCertificateFriendlyName(X509Utils.java:136) at de.blinkt.openvpn.core.X509Utils.getCertificateFriendlyName(X509Utils.java:82) at de.blinkt.openvpn.views.FileSelectLayout.setData(FileSelectLayout.java:155) at de.blinkt.openvpn.fragments.Settings_Basic.loadPreferences(Settings_Basic.java:192) at de.blinkt.openvpn.fragments.KeyChainSettingsFragment.onStart(KeyChainSettingsFragment.kt:208) at androidx.fragment.app.Fragment.performStart(Fragment.java:3053) at androidx.fragment.app.FragmentStateManager.start(FragmentStateManager.java:587) at androidx.fragment.app.FragmentStateManager.moveToExpectedState(FragmentStateManager.java:279) at androidx.fragment.app.FragmentManager.executeOpsTogether(FragmentManager.java:1840) at androidx.fragment.app.FragmentManager.removeRedundantOperationsAndExecute(FragmentManager.java:1758) at androidx.fragment.app.FragmentManager.execSingleAction(FragmentManager.java:1670) at androidx.fragment.app.BackStackRecord.commitNowAllowingStateLoss(BackStackRecord.java:323) at androidx.fragment.app.FragmentPagerAdapter.finishUpdate(FragmentPagerAdapter.java:249) at androidx.viewpager.widget.ViewPager.populate(ViewPager.java:1244) at androidx.viewpager.widget.ViewPager.populate(ViewPager.java:1092) at androidx.viewpager.widget.ViewPager.onMeasure(ViewPager.java:1622) at android.view.View.measure(View.java:25833) at android.view.ViewGroup.measureChildWithMargins(ViewGroup.java:6980) at android.widget.LinearLayout.measureChildBeforeLayout(LinearLayout.java:1552) at android.widget.LinearLayout.measureVertical(LinearLayout.java:842) at android.widget.LinearLayout.onMeasure(LinearLayout.java:721) at android.view.View.measure(View.java:25833) at android.view.ViewGroup.measureChildWithMargins(ViewGroup.java:6980) at android.widget.FrameLayout.onMeasure(FrameLayout.java:194) at androidx.appcompat.widget.ContentFrameLayout.onMeasure(ContentFrameLayout.java:145) at android.view.View.measure(View.java:25833) at android.view.ViewGroup.measureChildWithMargins(ViewGroup.java:6980) at androidx.appcompat.widget.ActionBarOverlayLayout.onMeasure(ActionBarOverlayLayout.java:496) at android.view.View.measure(View.java:25833) at android.view.ViewGroup.measureChildWithMargins(ViewGroup.java:6980) at android.widget.FrameLayout.onMeasure(FrameLayout.java:194) at android.view.View.measure(View.java:25833) at android.view.ViewGroup.measureChildWithMargins(ViewGroup.java:6980) at android.widget.LinearLayout.measureChildBeforeLayout(LinearLayout.java:1552) at android.widget.LinearLayout.measureVertical(LinearLayout.java:842) at android.widget.LinearLayout.onMeasure(LinearLayout.java:721) at android.view.View.measure(View.java:25833) at android.view.ViewGroup.measureChildWithMargins(ViewGroup.java:6980) at android.widget.FrameLayout.onMeasure(FrameLayout.java:194) at com.android.internal.policy.DecorView.onMeasure(DecorView.java:758) at android.view.View.measure(View.java:25833) at android.view.ViewRootImpl.performMeasure(ViewRootImpl.java:3676) at android.view.ViewRootImpl.measureHierarchy(ViewRootImpl.java:2460) at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:2739) at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:2179) at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:8787) at android.view.Choreographer$CallbackRecord.run(Choreographer.java:1037) at android.view.Choreographer.doCallbacks(Choreographer.java:845) at android.view.Choreographer.doFrame(Choreographer.java:780) at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:1022) at android.os.Handler.handleCallback(Handler.java:938) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loopOnce(Looper.java:201) at android.os.Looper.loop(Looper.java:288) at android.app.ActivityThread.main(ActivityThread.java:7870) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)

So I'm wondering: 1) Why old 0.7.36 version worked with "ping-restart 10" option from server? 2) Why new 0.7.37 still reconnects with option pull-filter ignore "ping-restart"? 3) Should I create new issues for run-time exceptions?

schwabe commented 2 years ago

The runtime expections are red herrings. Ignore them for this ticket.

The default is to still restart a connection, so ignoring ping-restart just gives you the default which is still USR1. The ping restart of 3 10 is very agressive. That means that if there is any 10s hickup, the client will disconnect. And yes for UDP there is a 120s default. You can going back to 0.7.36 but this looks more like some other problem like bad server connection or mismatched control parameter being negotiated so pings are not getting through. So a complete log to look for those problems would be good.

andronov-alexey commented 2 years ago

Okay, I'll keep using the previous version then Thanks

schwabe commented 2 years ago

@andronov-alexey well, if you using the old version, this bug will probably not be fixed since I have no idea what is happening in your setup.

andronov-alexey commented 2 years ago

I looked at server logs and found out that SoftEther server breaks the connection after 20 seconds and then the client drops it too after another 10 seconds. In the following example connection established at 06:23:19, server breaks it at 06:23:42 and client breaks it at 10:23:51:

Server:

06:23:18.415 Connection "CID-23390": Successfully authenticated as user "user".
06:23:18.415 Connection "CID-23390": The new session "SID-USER-[OPENVPN_L3]-2914" has been created. (IP address: 111.111.111.111, Port number: 15074, Physical underlying protocol: "Legacy VPN - OPENVPN_L3")
06:23:18.415 Session "SID-USER-[OPENVPN_L3]-2914": The parameter has been set. Max number of TCP connections: 1, Use of encryption: Yes, Use of compression: No, Use of Half duplex communication: No, Timeout: 20 seconds.
06:23:19.452 OpenVPN Session 4285 (111.111.111.111:15074 -> 111.111.111.112:222) Channel 0: The full strings replied: "PUSH_REPLY,ping 3,ping-restart 10,ifconfig 192.168.30.29 192.168.30.30,dhcp-option DOMAIN server.net.domain,dhcp-option DNS 94.140.14.14,dhcp-option DNS 94.140.15.15,route-gateway 192.168.30.30,redirect-gateway def1"
06:23:42.514 Session "SID-USER-[OPENVPN_L3]-2914": The session has been terminated. The statistical information is as follows: Total outgoing data size: 4775 bytes, Total incoming data size: 5779 bytes.
06:23:42.544 Connection "CID-23390" terminated by the cause "The VPN session has been deleted. It is possible that either the administrator disconnected the session or the connection from the client to the VPN Server has been disconnected." (code 11).
06:23:42.544 Connection "CID-23390" has been terminated.

Client:

10:23:19 MANAGEMENT: >STATE:1652854999,CONNECTED,SUCCESS,192.168.30.29,111.111.111.112,222,,
10:23:51 [Finland] Inactivity timeout (--ping-restart), restarting

So I guess the server cant ping the client and drops the connection after timeout. I tried to disable firewall on server to no avail.

Also "OpenVPN Connect" is working fine with the same config file

schwabe commented 2 years ago

Can you post a whole log? There is something fishy going on but I have no idea what it is. Does it reproduce if you use OpenVPN3 core under general settings?

andronov-alexey commented 2 years ago

I provided two full logs at the earlier reply They are all pretty similar (look for "Inactivity timeout (--ping-restart), restarting" lines)

With OpenVPN3 enabled I can't connect at all with the error: Client exception in transport_recv: crypto_alg: AES-128-CBC: bad cipher for data channel use

andronov-alexey commented 2 years ago

I can also send you my ".ovpn" config file if that would help

yan12125 commented 2 years ago

I believe I got the same issue. I tried building the APK, and git bisect identified https://github.com/schwabe/openvpn/commit/53560170b95ec99dcd9f27031515f11a23370e3f as the first bad commit. With https://github.com/schwabe/openvpn/commit/6013396d09b8f1ff5d98d90200bf67ecb299bb81, I don't get any restart after several minutes.

Here are some general information:

General information

  1. Android version: 9
  2. Android Vendor/Custom ROM: HTC stock ROM + Magisk
  3. Device: HTC U11
  4. (version number/play store version/self-built): F-Droid 0.7.37. The issue also appears with self-built APK

Description of the issue

TCP/UDP connections are restarted after roughly 10s/30s, respectively

Logs (from self-built APK)

Configuration file

ics-openvpn-config.txt (exported from the app; key & cert removed)

Additional information

schwabe commented 2 years ago

If that commit in OpenVPN is really the offending one, Softether does not implement the OpenVPN protocol correctly. Having repeated ACKs should not trigger a complete meltdown in the protocol

schwabe commented 2 years ago

Can you capture a tcpdump of the connection as pcap?

yan12125 commented 2 years ago

Can you capture a tcpdump of the connection as pcap?

Sure!

ics-openvpn-tcp-filtered.pcap.gz ics-openvpn-udp-filtered.pcap.gz

Both are captured on the Wi-Fi router using WireShark (I'm more familiar with it than tcpdump) and filtered to keep traffic to/from the VPN server only (ip.src == 140.109.17.11 || ip.dst == 140.109.17.11). Please tell me if you need complete traffic.

schwabe commented 2 years ago

Yeah, I found the problem. Basically Softether never implemented the protocol correctly:

In its code (Proto_OpenVPN.c) there is a code fragement:

    if (ret->NumAck > 4)
    {
        goto LABEL_ERROR;
    }

while OpenVPN always had up to 8 acks per P_ACK_V1 packet.

So basically Softether ignores valid OpenVPN ACK packets and then drops the connection after 2 minutes.

JackieKu commented 2 years ago

Tried https://github.com/SoftEtherVPN/SoftEtherVPN/pull/1610, works for me.

yan12125 commented 2 years ago

Thank you very much for the fix! It may take some time for a new release of SoftEther VPN and for network administrators at our department to upgrade, so I need a workaround for now. Besides using an older commit, is limiting the number of ACKs a valid workaround?

diff --git a/src/openvpn/reliable.h b/src/openvpn/reliable.h
index 0f3caa34..5455d42e 100644
--- a/src/openvpn/reliable.h
+++ b/src/openvpn/reliable.h
@@ -41,7 +41,7 @@
  *  @{ */

-#define RELIABLE_ACK_SIZE 8     /**< The maximum number of packet IDs
+#define RELIABLE_ACK_SIZE 4     /**< The maximum number of packet IDs
                                  *   waiting to be acknowledged which can
                                  *   be stored in one \c reliable_ack
                                  *   structure. */
schwabe commented 2 years ago

No. That is absolutley not a good workaround. That would make OpenVPN misbehave like Softether and break with other OpenVPN servers.

yan12125 commented 2 years ago

Thank you very much for clarafication! I will switch to 0.7.36 instead.

schwabe commented 2 years ago

@yan12125 try this:

diff --git a/src/openvpn/ssl_pkt.c b/src/openvpn/ssl_pkt.c
index e86fbc1b..d6e5c323 100644
--- a/src/openvpn/ssl_pkt.c
+++ b/src/openvpn/ssl_pkt.c
@@ -177,6 +177,15 @@ write_control_auth(struct tls_session *session,
 {
     uint8_t header = ks->key_id | (opcode << P_OPCODE_SHIFT);

+    /* Workaround for Softether servers. Softether has a bug that it only
+     * allows 4 ACks in packets and drops packets if more ACKs are contained
+     * in a packet (see commit a14d812dcbc in Softether) */
+    if (session->tls_wrap.mode == TLS_WRAP_NONE && !session->opt->server
+            && !(session->opt->crypto_flags & CO_USE_TLS_KEY_MATERIAL_EXPORT))
+    {
+            max_ack = min_int(max_ack, 4);
+    }
+
     ASSERT(link_socket_actual_defined(&ks->remote_addr));
     ASSERT(reliable_ack_write
                (ks->rec_ack, ks->lru_acks, buf, &ks->session_id_remote,
yan12125 commented 2 years ago

This works for both TCP and UDP connections, thanks!

schwabe commented 2 years ago

Closed by commit 02eabb0e9a