googleapis / python-firestore

Apache License 2.0
215 stars 74 forks source link

Collection on_snapshot incorrectly reports everything removed on internet reconnect #367

Closed elongberg closed 2 years ago

elongberg commented 3 years ago

Environment details

Steps to reproduce

  1. Setup a collection snapshot listener while connected to internet
  2. Disconnect from internet, wait a few seconds
  3. On a different device, remove a document from the collection
  4. Reconnect to internet, wait a few seconds
  5. Snapshot listener gets called indicating that every item has been removed, not just whatever has actually been removed

Code example

from google.cloud import firestore

def on_snapshot(snapshot, changes, read_time):
    print(f"Snapshot: [{', '.join(s.id for s in snapshot)}]")
    for change in changes:
        print(f"    {change.type.name}: {change.document.id}")

db = firestore.Client()
collection = db.collection("testCollection")
collection.on_snapshot(on_snapshot)
while True:
    pass

Other Info

I thought I had seen the issue occur even when skipping step 3 above, but have been unable to replicate that scenario.

My situation requires a degree of "offline support", which is when this came up. Individual document listeners don't seem to have any issues, but I'm expecting hundreds or thousands of documents and setting up a listener on each one seems overkill (and well beyond the recommended 100 max listeners outlined at https://firebase.google.com/docs/firestore/best-practices#realtime_updates).

tseaver commented 2 years ago

Working here on reproducing.

tseaver commented 2 years ago

Hmm, I was only able to reproduce the "show all deleted" case once, over multiple tries. Once I saw it, I tried configuring logging at DEBUG level, but couldn't reproduce, even after leaving the laptop in airplane mode long enough to trigger a whole bunch of logs for the client trying to restart the stream.

tseaver commented 2 years ago

@elongberg If you can still reproduce this behavior reliably, can you please configure logging as follows and post the output around where the bogus "delete all" occurs?

import logging

logging.basicConfig(level=logging.DEBUG)
elongberg commented 2 years ago

I am still able to consistently reproduce it. Also just tried it with the latest code for good measure (version 2.3.4).

First, I need to disable the network and wait until these logs about attempting to reconnect start spitting out endlessly (usually only takes a few seconds). If I reconnect before this happens, then the issue doesn't occur.

DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x000001E157A51490>> caused <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Stream removed"
    debug_error_string = "{"created":"@1637272608.714000000","description":"Error received from peer ipv4:142.250.189.202:443","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"Stream removed","grpc_status":2}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x000001E157A51490>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x000001E157A51490>> caused <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1637272608.778000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5420,"referenced_errors":[{"created":"@1637272608.778000000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x000001E157A51490>>.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:Stream was already re-established.

For comparison in a moment, this is what gets logged when I reconnect without doing a delete while offline.

DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x000001DDBE91D250>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.ADD
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: ADD
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:walk over add_changes
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: filter update
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.CURRENT
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: CURRENT
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:walk over add_changes
DEBUG:google.api_core.bidi:waiting for recv.

If I instead delete a document while offline, the following gets logged. (I don't have convenient access to another device for doing the delete right now, so I did that part on my phone through the firebase web console.)

DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x000001DDBE91D250>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.ADD
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: ADD
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:walk over add_changes
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: filter update
DEBUG:google.cloud.firestore_v1.watch:resetting documents
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.CURRENT
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: CURRENT
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:walk over add_changes
Snapshot: []
    REMOVED: IUYbPlKICWLUpKBIx98U
    REMOVED: XAPVTRfghjD5ZUkN1Gsc
    REMOVED: cPOqs0RI7vn8J92vQKrP
    REMOVED: dtVirrJCRemCF4YP8Pk5
    REMOVED: g493gvL1elcmzPVoySeg
DEBUG:google.api_core.bidi:waiting for recv.

(The ending is just indicating the few documents in my collection all being supposedly removed, when only one actually was.) Doing a diff between these shows just one line that seems potentially important:

DEBUG:google.cloud.firestore_v1.watch:resetting documents
tseaver commented 2 years ago

@elongberg Thanks for supplying the logging info. I believe that the "flush everything" you are seeing is due to these lines, where the watch checks that the filter.count matches the watch's own current net document count. Because of the missed remove, the two values are not in sync, and so the watch flushes any pending changes and marks all known documents as removed. The expectation is that they will be repopulated in a subsequent push from the server.