mozilla / application-services

Firefox Application Services
https://mozilla.github.io/application-services/
Other
609 stars 225 forks source link

Fenix doesn't Sync with the stage server: Missing server timestamp header in request #3379

Closed vladikoff closed 3 years ago

vladikoff commented 4 years ago

With this configuration in fenix:

<img src=https://user-images.githubusercontent.com/128755/87977298-872fff00-ca9c-11ea-82bc-222b3b70599f.png width=350 />

ADB Logcat says:

 sync15::sync_multiple  D  Declined engines list after state machine set to: Some([])
                         W  sync failed: Missing server timestamp header in request, final status=OtherError
                         W  Backtrace: None
  sync_manager::manager  I  Sync finished with status OtherError
             SyncWorker  E  'Other' error :(

┆Issue is synchronized with this Jira Task ┆Epic: a-s consumer issues ┆Sprint: Backlog

rfk commented 4 years ago

If this was desktop I'd say "you've missing the /1.0/sync/1.5 suffix on your tokenserver URL" but I don't recall whether that's needed with the rust components.

vladikoff commented 4 years ago

If this was desktop I'd say "you've missing the /1.0/sync/1.5 suffix on your tokenserver URL" but I don't recall whether that's needed with the rust components.

I tried it with https://token.services.mozilla.com/ without the versioning business and it synced properly!

vladikoff commented 4 years ago

Omitting the custom sync server results in the same error:

 sync15::token  D  Fetching token from https://token.stage.mozaws.net/1.0/sync/1.5
     ResolverController  E  No valid NAT64 prefix (123, <unspecified>/0)
                SELinux  E  avc:  denied  { find } for interface=vendor.qti.hardware.servicetracker::IServicetracker sid=u:r:system_server:s0 pid=1298 scontext=u:r:system_server:s0 tcontext=u:object_r:default_android_hwservice:s0 
                            tclass=hwservice_manager permissive=0
                         E  avc:  denied  { find } for interface=vendor.qti.hardware.servicetracker::IServicetracker sid=u:r:system_server:s0 pid=1298 scontext=u:r:system_server:s0 tcontext=u:object_r:default_android_hwservice:s0 
                            tclass=hwservice_manager permissive=0
                  Icing  I  Indexing com.android.chrome-internal.3p:WebPage from com.android.chrome using seqno 93612
                SELinux  E  avc:  denied  { find } for interface=vendor.qti.hardware.servicetracker::IServicetracker sid=u:r:system_server:s0 pid=1298 scontext=u:r:system_server:s0 tcontext=u:object_r:default_android_hwservice:s0 
                            tclass=hwservice_manager permissive=0
     ResolverController  E  No valid NAT64 prefix (123, <unspecified>/0)
                SELinux  E  avc:  denied  { find } for interface=vendor.qti.hardware.servicetracker::IServicetracker sid=u:r:system_server:s0 pid=1298 scontext=u:r:system_server:s0 tcontext=u:object_r:default_android_hwservice:s0 
                            tclass=hwservice_manager permissive=0
                 chatty  I  uid=1000(system) hwservicemanage identical 1 line
                SELinux  E  avc:  denied  { find } for interface=vendor.qti.hardware.servicetracker::IServicetracker sid=u:r:system_server:s0 pid=1298 scontext=u:r:system_server:s0 tcontext=u:object_r:default_android_hwservice:s0 
                            tclass=hwservice_manager permissive=0
  sync15::sync_multiple  D  Declined engines list after state machine set to: Some([])
                         W  sync failed: Missing server timestamp header in request, final status=OtherError
                         W  Backtrace: None
  sync_manager::manager  I  Sync finished with status OtherError
             SyncWorker  E  'Other' error :(
rfk commented 4 years ago

Does it work if you use the prod FxA server with the stage tokenserver?

vladikoff commented 4 years ago

No, doesn't work. End up in the circuit breaker.

sync15::token  D  Fetching token from https://token.stage.mozaws.net/1
                            .0/sync/1.5
     ResolverController  E  No valid NAT64 prefix (123, <unspecified>/0)
          sync15::token  W  Non-success status when fetching token: 401
  sync15::sync_multiple  D  Declined engines list after state machine set to: So
                            me([])
                         W  sync failed: HTTP status 401 when requesting a token
                             from the tokenserver, final status=AuthenticationEr
                            ror
rfk commented 4 years ago

So this sync failed: Missing server timestamp header in request is very strange, and seems to indicate that you're getting a 200 OK from tokenserver that somehow doesn't include the X-Timestamp header. This should in theory be impossible (which is one of the reasons we error on it) because there's a "tween" in the tokenserver server code that sets this header on every response. So, I'm not entirely sure what's happening here.

mhammond commented 4 years ago

Note that it may not be the tokenserver - sync15's client.rs has the same check as token.rs. The default logs should be tweaked because they aren't giving enough info, but, I see these logs in this order:

07-21 15:19:03.098 27623 27674 I sync15::sync_multiple: Advancing state machine to ready (full)
07-21 15:19:03.098 27623 27674 D sync15::token: Fetching token from https://token.stage.mozaws.net/1.0/sync/1.5
07-21 15:19:04.002 27623 27674 D sync15::sync_multiple: Declined engines list after state machine set to: Some([])
07-21 15:19:04.003 27623 27674 W sync15::sync_multiple: sync failed: Missing server timestamp header in request, final status=OtherError

If it was the token server responding without that header, it's not clear why we'd still log about 'Declined engines' in between.

(I'm not saying that it's not the token server - just that it might not be :)

vasilicatamas commented 4 years ago

The sync continues to fail on stage using Fenix and iOS. I’ve tried the following values for Custom Sync server and the result is the same:

  1. Empty Custom Sync server field - logs: logcat - 1.txt

  2. https://token.stage.mozaws.net - logs: logcat - 2.txt

  3. https://token.services.mozilla.com/ - logs: logcat - 3.txt

This issue is a real blocker for sync stage testing from a QA perspective.

mhammond commented 4 years ago

sorry, I was looking at the oauth token fetch on one and sync token on the other. Both have the same headers, including the case.

rfk commented 4 years ago

Empty Custom Sync server field - logs: logcat - 1.txt https://token.stage.mozaws.net - logs: logcat - 2.txt

Both of these are showing the sync failed: Missing server timestamp header in request. I tried to reproduce this using my own account on stage, but wasn't able to yet.

https://token.services.mozilla.com/ - logs: logcat - 3.txt

This log shows Fenix triggering its "auth retry loop circuit-breaker" and surfacing an error to the user, which I believe is expected when using the production sync stack with a stage FxA stack.

rfk commented 4 years ago

I was able to reproduce this issue using the vlad2@restmail.net account; luckily I've seen enough of vlad's demos that I was able to guess the corresponding account password ;-)

With additional debugging logs:

2020-08-04 11:36:32.109 12267-12457/org.mozilla.fenix.debug D/sync15::token: TS HEADERS Headers { headers: [Header { name: HeaderName("content-length"), value: "752" }, Header { name: HeaderName("access-control-allow-methods"), value: "OPTIONS" }, Header { name: HeaderName("x-timestamp"), value: "1596504992" }, Header { name: HeaderName("access-control-allow-credentials"), value: "true" }, Header { name: HeaderName("access-control-max-age"), value: "1728000" }, Header { name: HeaderName("connection"), value: "keep-alive" }, Header { name: HeaderName("x-content-type-options"), value: "nosniff" }, Header { name: HeaderName("content-type"), value: "application/json" }, Header { name: HeaderName("date"), value: "04 Aug 2020 01:36:32 GMT" }, Header { name: HeaderName("access-control-allow-headers"), value: "X-Conditions-Accepted" }, Header { name: HeaderName("strict-transport-security"), value: "max-age=31536000" }] }
2020-08-04 11:36:32.110 12267-12457/org.mozilla.fenix.debug W/lla.fenix.debu: Accessing hidden method Lsun/misc/Unsafe;->putInt(Ljava/lang/Object;JI)V (greylist, linking, allowed)
2020-08-04 11:36:32.110 12267-12457/org.mozilla.fenix.debug W/lla.fenix.debu: Accessing hidden method Lsun/misc/Unsafe;->putObject(Ljava/lang/Object;JLjava/lang/Object;)V (greylist, linking, allowed)
2020-08-04 11:36:32.777 12267-12457/org.mozilla.fenix.debug W/lla.fenix.debu: Accessing hidden method Lsun/misc/Unsafe;->getObject(Ljava/lang/Object;J)Ljava/lang/Object; (greylist, linking, allowed)
2020-08-04 11:36:32.777 12267-12457/org.mozilla.fenix.debug W/lla.fenix.debu: Accessing hidden method Lsun/misc/Unsafe;->getObject(Ljava/lang/Object;J)Ljava/lang/Object; (greylist, linking, allowed)
2020-08-04 11:36:32.778 12267-12457/org.mozilla.fenix.debug D/sync15::client: SYNC HEADERS Headers { headers: [Header { name: HeaderName("x-firefox-spdy"), value: "h2" }, Header { name: HeaderName("access-control-allow-headers"), value: "X-Conditions-Accepted" }, Header { name: HeaderName("date"), value: "04 Aug 2020 01:36:33 GMT" }, Header { name: HeaderName("alt-svc"), value: "clear" }, Header { name: HeaderName("server"), value: "openresty/1.15.8.2" }, Header { name: HeaderName("via"), value: "1.1 google" }, Header { name: HeaderName("x-weave-timestamp"), value: "1596504993.16" }, Header { name: HeaderName("content-type"), value: "application/json" }, Header { name: HeaderName("access-control-allow-methods"), value: "OPTIONS" }, Header { name: HeaderName("access-control-max-age"), value: "1728000" }, Header { name: HeaderName("access-control-allow-credentials"), value: "true" }, Header { name: HeaderName("content-length"), value: "185" }] }
2020-08-04 11:36:32.778 12267-12457/org.mozilla.fenix.debug D/sync15::sync_multiple: Declined engines list after state machine set to: Some([])
2020-08-04 11:36:32.778 12267-12457/org.mozilla.fenix.debug W/sync15::sync_multiple: sync failed: Missing server timestamp header in request, final status=OtherError

So as @mhammond suggested, the error here actually appears to be in handling the response from the sync server. In this case it's looking for the x-last-modified header which is not present in the request.

I think this is a bug with the durable sync storage node not returning X-Last-Modified on some responses. It's not clear to me why we see this in stage but not production.

vladikoff commented 4 years ago

I was able to reproduce this issue using the vlad2@restmail.net account; luckily I've seen enough of vlad's demos that I was able to guess the corresponding account password ;-)

😄

rfk commented 4 years ago

I added some additional logging to find the failing request, and:

2020-08-04 11:54:01.932 15765-15943/org.mozilla.fenix.debug D/sync15::client: SYNC HEADERS for "/1.5/6384015/info/configuration" = Headers { headers: [Header { name: HeaderName("access-control-allow-headers"), value: "X-Conditions-Accepted" }, Header { name: HeaderName("alt-svc"), value: "clear" }, Header { name: HeaderName("x-weave-timestamp"), value: "1596506042.42" }, Header { name: HeaderName("x-firefox-spdy"), value: "h2" }, Header { name: HeaderName("access-control-allow-methods"), value: "OPTIONS" }, Header { name: HeaderName("server"), value: "openresty/1.15.8.2" }, Header { name: HeaderName("access-control-max-age"), value: "1728000" }, Header { name: HeaderName("content-length"), value: "185" }, Header { name: HeaderName("via"), value: "1.1 google" }, Header { name: HeaderName("date"), value: "04 Aug 2020 01:54:02 GMT" }, Header { name: HeaderName("access-control-allow-credentials"), value: "true" }, Header { name: HeaderName("content-type"), value: "application/json" }] }

I think it's totally reasonable for the /info/configuration request to not return an X-Last-Modified header, TBH. Let me see what other servers do in this case...

rfk commented 4 years ago

Here's the troublesome request, captured on desktop to get more info:

XHRGEThttps://stage.sync.nonprod.cloudops.mozgcp.net/1.5/6384015/info/configuration
[HTTP/2 200 OK 184ms]

GET
    https://stage.sync.nonprod.cloudops.mozgcp.net/1.5/6384015/info/configuration
Status200
OK
VersionHTTP/2
Transferred685 B (185 B size)

    access-control-allow-credentials
        true
    access-control-allow-headers
        DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,X-Conditions-Accepted
    access-control-allow-methods
        GET, POST, OPTIONS
    access-control-max-age
        1728000
    alt-svc
        clear
    content-length
        185
    content-type
        application/json
    date
        Tue, 04 Aug 2020 02:01:31 GMT
    server
        openresty/1.15.8.2
    via
        1.1 google
    X-Firefox-Spdy
        h2
    x-weave-timestamp
        1596506491.81

For comparison, here's the request I see on my production sync account, which does include the X-Last-Modified header:

XHRGEThttps://sync-1-us-west1-g.sync.services.mozilla.com/1.5/137697556/info/configuration
[HTTP/2 200 OK 201ms]

GET
    https://sync-1-us-west1-g.sync.services.mozilla.com/1.5/137697556/info/configuration
Status200
OK
VersionHTTP/2
Transferred697 B (165 B size)

    access-control-allow-credentials
        true
    access-control-allow-headers
        DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,X-Conditions-Accepted
    access-control-allow-methods
        GET, POST, OPTIONS
    access-control-max-age
        1728000
    alt-svc
        clear
    content-length
        165
    content-type
        application/json
    date
        Tue, 04 Aug 2020 01:58:31 GMT
    server
        openresty/1.15.8.2
    via
        1.1 google
    X-Firefox-Spdy
        h2
    x-last-modified
        1596506183.96
    x-weave-timestamp
        1596506311.77

I also tried using a new account in production, thinking this might cause an error of Fenix is the first client to sync to a new store. But in that case the production service returns X-Last-Modified: 0.00.

rfk commented 4 years ago

Stage sync storage claims to be running syncstorage-rs version 0.5.3, while prod claims to be running version 0.4.2. I can only assume this difference in behaviour is a relatively new change in the durable sync server code.

rfk commented 4 years ago

I filed https://github.com/mozilla-services/syncstorage-rs/issues/759 to fix this on the sync server.

I think we should also consider changing the client code here to avoid erroring out if the /info/configuration response is missing X-Last-Modified, because that resource doesn't really have a sensible "last modified" timestamp the way that user-uploaded data does. But given that we've got shipping clients that have the current behaviour, a server-side fix seems worthwhile.

isabelrios commented 4 years ago

Hi @rfk! I see that bug #759 closed, but still the issue on Android/iOS happening, should it be fixed right away or we need a a-s update? just wondering about the timing to have this fixed, thanks!

rfk commented 4 years ago

I think we need to fix to be deployed to https://stage.sync.nonprod.cloudops.mozgcp.net/, I'll ask over in the bug.

rfk commented 4 years ago

(This should not require any new client-side release in order to be fixed, we just need to get the right bits on the server)

pjenvey commented 4 years ago

The fix for mozilla-services/syncstorage-rs#759 is now deployed on both stage and production.