passbolt / mobile-passbolt-android

Android mobile app for Passbolt, the open source password manager for teams.
https://play.google.com/store/apps/details?id=com.passbolt.mobile.android&hl=en&gl=US
GNU Affero General Public License v3.0
42 stars 11 forks source link

Login impossible when 30s delay because server announces AAAA record without really supporting IPv6 #8

Closed doegox closed 1 year ago

doegox commented 1 year ago

Login impossible on "slow" networks

What you did / What happened

The issue is reproducible, whenever I tried using the mobile app when abroad it failed, or when the first step finally succeeded (then getting the TOTP prompt) it fails on OTP verification because again too slow.

What you expected to happen

Login with OTP should work even on "slow" networks

The issue might be in the mobile app or in the server, I can't tell...

marcin-michalek-miquido commented 1 year ago

Hello @doegox,

thank you for the feedback.

Firstly I'd like to say that currently, the sign-in allows up to 10 seconds time difference between the mobile app and the backend - if the time difference delta (so either forward or behind) is in that range the mobile app compensates that difference locally by adjusting time in the crypto operations.

Secondly, I'd like to add that 4G is not so slow - to test your scenario I configured the proxy with throttling to 4G, 3G, and even much lower speeds and still the sign-in took about 3-5 seconds. (That it also how we set the limit mentioned in the first point). So in my opinion long time of the response (you mentioned up to 30s) is caused not only by the speed but also the quality (i.e. packet loss) of the connection to your backend.

Then going to the core issue - in my opinion if you can see "Server and client time is out of sync, please contact your administrator." snackbar it should happen only if the mobile and backend clocks are out of sync by more then 10 seconds. This is validated even before the sign-in process. It may be possible that when you are abroad as mentioned the phone takes time from a different time provider which has a slightly bigger time delta with your server. Next time when experiencing the issue could you please provide us with logs so we can analyze and debug the timing more closely? (How to enable and share logs?)

Cheers

doegox commented 1 year ago

Sure, I can reproduce immediately from home, I just need to deactivate Wi-Fi.

Over Wi-Fi:

20:28:38 Checking biometry state
20:28:38 Biometry ready
20:28:38 --> GET https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg http/1.1
20:28:38 <-- 200 OK https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg (81ms, unknown-length body)
20:28:38 --> GET https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg http/1.1
20:28:39 <-- 200 OK https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg (120ms, unknown-length body)
20:28:40 Passphrase cache cleared
20:28:40 Passphrase cached
20:28:40 Getting server pgp and rsa keys
20:28:41 --> GET https://censored-passbolt-server/auth/verify.json http/1.1
20:28:41 <-- 200 OK https://censored-passbolt-server/auth/verify.json (80ms, unknown-length body)
20:28:41 --> GET https://censored-passbolt-server/auth/jwt/rsa.json http/1.1
20:28:41 <-- 200 OK https://censored-passbolt-server/auth/jwt/rsa.json (88ms, unknown-length body)
20:28:41 Getting server pgp and rsa keys succeeded
20:28:41 Checking if time adjustment is needed
20:28:41 Local time sync needed. Adjusted: 0

Over 4G:

20:30:04 Checking biometry state
20:30:04 Biometry ready
20:30:06 Passphrase cache cleared
20:30:06 Passphrase cached
20:30:06 Getting server pgp and rsa keys
20:30:34 --> GET https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg http/1.1
20:30:34 <-- 200 OK https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg (101ms, unknown-length body)
20:30:34 --> GET https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg http/1.1
20:30:34 <-- 200 OK https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg (92ms, unknown-length body)
20:30:36 --> GET https://censored-passbolt-server/auth/verify.json http/1.1
20:30:37 <-- 200 OK https://censored-passbolt-server/auth/verify.json (109ms, unknown-length body)
20:30:37 --> GET https://censored-passbolt-server/auth/jwt/rsa.json http/1.1
20:30:37 <-- 200 OK https://censored-passbolt-server/auth/jwt/rsa.json (70ms, unknown-length body)
20:30:37 Getting server pgp and rsa keys succeeded
20:30:37 Checking if time adjustment is needed
20:30:37 Time delta to big for sync. Showing error.

So I see there is a 30s delay not explained in the logs between these 2 messages when on 4G:

20:30:06 Getting server pgp and rsa keys
20:30:34 --> GET https://censored-passbolt-server/avatars/view/4b2a3025-d463-494e-8c05-5b6ddd7bdb00/medium.jpg http/1.1
doegox commented 1 year ago

I can send more complete and uncensored logs if you provide me an email, but I'd rather not post them online :)

doegox commented 1 year ago

Because in the previous log it seems it was the avatar fecth that takes 30s to start, and because I suspected maybe some delay in a DNS request, I tried the following:

marcin-michalek-miquido commented 1 year ago

Thank you for the detailed logs and comments. The key lines here, for me, are these ones:

20:54:09 Getting server pgp and rsa keys
20:54:39 --> GET https://censored-passbolt-server/auth/verify.json http/1.1

If you take a look here we can see that immediately after Getting server pgp and rsa keys log in the next line the app fetches the keys from the verify.json endpoint. This means that the next GET... line should start roughly in the same second but in your case, it starts about 30 seconds later.

Moreover, once the first request is sent the response time and next ones are very fast:

20:54:39 --> GET https://censored-passbolt-server/auth/verify.json http/1.1
20:54:39 <-- 200 OK https://censored-passbolt-server/auth/verify.json (93ms, unknown-length body)
20:54:39 --> GET https://censored-passbolt-server/auth/jwt/rsa.json http/1.1
20:54:39 <-- 200 OK https://censored-passbolt-server/auth/jwt/rsa.json (83ms, unknown-length body)

both sent and received in the same second.

So it seems like only the first call is taking a long time to initiate (and initiation time is part of the request so it's also measured in the time delta adjustment mechanism).

From what I found this could be happening due to some DNS config. I will additionally consult with the support team as this may potentially be a configuration issue.

doegox commented 1 year ago

Thank you!! It is indeed a problem linked to AAAA record in DNS (IPv6). I'm not using IPv6 on the server but my domain was a CNAME to the OVH VPS attributed name, and they advertise an IPv6 on it (strange as none of my older OVH VPS have a AAAA and there is no way to disable IPv6 on OVH console, but ok whatever.)

Now I replaced the CNAME record with a A record so once it is propagated it should fix the issue as no AAAA will be announced. That explains why I got the 30s timeout before falling back on IPv4 address of the server when using networks supporting ipv6 (like my 4G connection) and no problem on other networks (such as my wifi).

It is already working fine on my phone when I enable Warp (1.1.1.1 DNS) or Google One VPN, as these were faster to catch the DNS change than my telecom provider.

doegox commented 1 year ago

I changed the title so if it happens to someone else, he'll get a chance to find the info

doegox commented 1 year ago

BTW this means there was a first GET request sent in IPv6 that does not appear in the passbolt logs, it would have been easier to debug if it was present. And is time was really measured across the working GET and its response we wouldn't have the misleading out-of-sync error.

marcin-michalek-miquido commented 1 year ago

I am glad that all is working fine now :) for the networking logs, we actually use an existing solution from the library - I believe that there is a GET ... log only after DNS is resolved (that is why the "GET ..." log appeared 30 seconds after "Getting server pgp and rsa keys". Additionally, I found out that there is a possibility to log more detailed events including dnsStart and dnsStop using this. Could be of help at least for the sign-in request if more people rise this scenario.

doegox commented 1 year ago

GET ... log only after DNS is resolved

Technically it is not the DNS that takes time but sth like:

But the first GET attempt does not appear in the logs.