Closed: NixBiks closed this issue 3 years ago
Just noticed this, which seems like the same issue: https://github.com/firebase/firebase-admin-python/issues/294. It concerns me that it has been open since May last year; isn't this considered a core feature?
Btw. I always start two processes running in the background. Both listen to the same news but handle it differently. In the end, both processes stopped receiving news at the same time.
@crwilcox or @tseaver Any ideas here about what could be going on as far as the python code is concerned? I've not yet heard about this happening in other languages.
@mr-bjerre About how frequent are the updates before you stop receiving them? Also, could you try enabling logging in your script and report the output of the grpc-related entries near the time when you cease getting updates? E.g.:
import logging
logging.basicConfig(filename='python-firestore-18.log', level=logging.DEBUG)
It seems to be very random when I stop receiving them. It has happened after 20 minutes sometimes, and after 7 hours other times. I've created a log like you suggested, but I don't see any grpc-related entries?
Oh wait, I did find something in my logs where it broke (FYI I have four processes running; not entirely sure if each process writes to the log :-S)
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:plx_python.util.old_firestore:Parsing news changes at 2020-03-04 09:33:56+00:00
DEBUG:plx_python.util.old_firestore:ADDED: 707ab56fa5f2f6f4
INFO:plx_python.util.thirdparty:Received 707ab56fa5f2f6f4
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f7185ff6940>> caused <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Received RST_STREAM with error code 0"
debug_error_string = "{"created":"@1583314580.826689957","description":"Error received from peer ipv4:108.177.15.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f7185ff6940>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/token HTTP/1.1" 200 208
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:recved response.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7fa6f4ad1340>> caused <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Received RST_STREAM with error code 0"
debug_error_string = "{"created":"@1583314582.251496277","description":"Error received from peer ipv4:74.125.133.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7fa6f4ad1340>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>> caused <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Received RST_STREAM with error code 0"
debug_error_string = "{"created":"@1583314582.657813506","description":"Error received from peer ipv4:74.125.140.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>> caused <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Transport closed"
debug_error_string = "{"created":"@1583314582.799235377","description":"Error received from peer ipv4:74.125.140.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Transport closed","grpc_status":14}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
and then the log just continues like that
I am experiencing the same issue! Wasn't able to solve it yet.
I was directed to this thread by a google employee through my own Firestore IssueTracker question #152867838. Re: "I've not yet heard about this happening in other languages": I am experiencing the same issue with the Node.js firebase-admin SDK. My snapshot listener stops listening silently, without throwing any errors, after anywhere from 2 to 24 hours.
Same issue here, still looking for a solution
@MaticConradi @r-hoeve @juancruzmartino could you share your library versions, python version, and where the code experiencing the failure is executing?
Python version: 3.7.5
google-cloud-firestore: 1.7.0
google-api-core[grpc]: 1.17.0
Snapshots are created with:
app = App()
app.init_listeners()

def init_listeners(self):
    database.collection("accounts").on_snapshot(self.update_account_properties)
    # and 5 other snapshots

def update_account_properties(self, settings, changes, timestamp):
    for change in changes:
        # processing changes
        ...
All code executes just fine for a while, but at some point update_account_properties doesn't get called anymore, and no exceptions are thrown.
I'm experiencing the same problem with Python 3.8.3 and google-cloud-firestore 1.6.1. The callback method specified as the parameter of on_snapshot() stops being called, without an exception thrown or error message, after an arbitrary amount of time, usually a few hours.
I'm working this week on a reproducer app for this bug in Python: it will run on Cloud Run, with a listener process and several different "mutator" processes making writes at different intervals.
@mr-bjerre Working on a reproducer script, I've come across a couple of questions:
RST_STREAM frames with error code 0x0 seem to be handled "correctly" in the log you posted (the stream is recreated). Do you see any correlation of RST_STREAM log entries with your listener ceasing to receive updates?
I haven't noticed any correlation with anything, really. But I don't think I qualify to really answer that; I'm not exactly sure what to look for. Maybe some of the other users here can comment on that.
Regarding the architecture: you are asking how many processes write to the collection that I listen to? If that's the case, then the answer is a few.
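For anyone wanting to check that correlation mechanically rather than by eye, a rough sketch like the following could scan a debug log for RST_STREAM errors and check whether any "recved response" entries still appear after the last one (i.e. whether the stream kept delivering data after reconnecting). The helper name and the idea of grepping these two strings are mine, not from the library:

```python
def scan_log(lines):
    """Return (index, line) pairs for RST_STREAM errors, plus a flag that is
    True if a 'recved response' entry appears after the last error (meaning
    the stream still delivered data after the reconnect)."""
    errors = [(i, line) for i, line in enumerate(lines) if "RST_STREAM" in line]
    if not errors:
        return errors, True
    last_error_index = errors[-1][0]
    recovered = any(
        "recved response" in line for line in lines[last_error_index + 1:]
    )
    return errors, recovered
```

Run over the full log file (`scan_log(open("python-firestore-18.log").read().splitlines())`), a final `recovered == False` would suggest the stream went silent right after an RST_STREAM.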
Testing with code in PR #158: watching a single document while updating it every 10 seconds from another single process has shown no errors after ~18 hours. E.g.:
$ venv/bin/python listen_document.py
INFO:listen_document:Watching: repro_gcf_18/watch_me
INFO:listen_document:Waiting...
INFO:listen_document:Notified: 2020-08-20T03:15:34+00:00
INFO:listen_document:Notified: 2020-08-20T03:15:37+00:00
INFO:listen_document:Change: ChangeType.ADDED
INFO:listen_document:Notified: 2020-08-20T03:15:46+00:00
INFO:listen_document:Change: ChangeType.MODIFIED [ 1]
...
INFO:listen_document:Notified: 2020-08-20T20:13:19+00:00
INFO:listen_document:Change: ChangeType.MODIFIED [ 610]
$ venv/bin/python update_document.py --count=864 --interval=100
INFO:update_document:Updating: repro_gcf_18/watch_me [ 1/864]
INFO:update_document:Sleeping: 99.849
...
INFO:update_document:Updating: repro_gcf_18/watch_me [ 610/864]
INFO:update_document:Sleeping: 100.172
I’ve had a workaround for a long time now but I can test my setup next week. Can you maybe share your simple scripts?
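For reference, the kind of workaround described in this thread is usually a watchdog: track when the callback last fired and tear down and re-create the watch if it goes silent. A minimal sketch of the tracking part (the class and its names are my own; in real code a background thread polling `is_stale()` would call `watch.unsubscribe()` and re-register `on_snapshot()`):

```python
import threading
import time


class ListenerWatchdog:
    """Tracks callback activity; reports the listener stale if no snapshot
    has arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._lock = threading.Lock()
        self._last_seen = time.monotonic()

    def mark_alive(self):
        """Call this from inside the on_snapshot callback."""
        with self._lock:
            self._last_seen = time.monotonic()

    def is_stale(self):
        """True if no callback activity was seen within the timeout."""
        with self._lock:
            return time.monotonic() - self._last_seen > self.timeout
```

Note this only detects silence; it cannot distinguish "no writes happened" from "the stream broke", so the timeout has to be longer than the longest expected quiet period.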
I have had the same problem for a few months and have not been able to solve it: my snapshot listener stops working randomly; sometimes it works for two days, other times it stops after 20 minutes. I hope the Google engineers have a solution.
Next repro attempt: I just tried with the listen_collection.py script from PR #158. E.g.:
$ venv/bin/python listen_document.py
INFO:listen_document:Watching: repro_gcf_18/watch_me
INFO:listen_document:Waiting...
INFO:listen_document:Notified: 2020-08-21T17:48:33.845741+00:00
INFO:listen_document:Change: ChangeType.ADDED
INFO:listen_document:Notified: 2020-08-21T17:49:12.715310+00:00
INFO:listen_document:Change: ChangeType.MODIFIED [ 1]
...
INFO:listen_collection:Notified: 2020-08-27T13:25:16.420286+00:00
INFO:listen_collection:Change: ChangeType.MODIFIED [ 239]
^CTraceback (most recent call last):
File "listen_collection.py", line 50, in <module>
main(parsed)
File "listen_collection.py", line 41, in main
callback_done.wait()
File "/opt/Python-3.8.1/lib/python3.8/threading.py", line 558, in wait
signaled = self._cond.wait(timeout)
File "/opt/Python-3.8.1/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
KeyboardInterrupt
$ venv/bin/python update_document.py --count=100 --interval=10
INFO:update_document:Updating: repro_gcf_18/watch_me [ 1/100]
INFO:update_document:Sleeping: 9.080
...
INFO:update_document:Updating: repro_gcf_18/watch_me [ 100/100]
$ venv/bin/python update_document.py --count=100 --interval=7
INFO:update_document:Updating: repro_gcf_18/watch_me [ 1/100]
INFO:update_document:Sleeping: 7.919
...
INFO:update_document:Updating: repro_gcf_18/watch_me [ 100/100]
This one updates a new, separate document, in three batches of ten updates:
$ venv/bin/python ./update_document.py --document waaa --count=10
INFO:update_document:Updating: repro_gcf_18/waaa [ 1/10]
INFO:update_document:Sleeping: 10.344
...
INFO:update_document:Updating: repro_gcf_18/waaa [ 10/10]
$ venv/bin/python ./update_document.py --document waaa --count=10
INFO:update_document:Updating: repro_gcf_18/waaa [ 1/10]
INFO:update_document:Sleeping: 10.368
...
INFO:update_document:Updating: repro_gcf_18/waaa [ 10/10]
$ venv/bin/python ./update_document.py --document waaa --count=10
INFO:update_document:Updating: repro_gcf_18/waaa [ 1/10]
INFO:update_document:Sleeping: 9.737
...
INFO:update_document:Updating: repro_gcf_18/waaa [ 10/10]
The "off-by-one" in the listener count happens on the first run only, because the waaa document doesn't yet exist in the collection. There is a corresponding INFO:listen_document:Change: ChangeType.ADDED entry in the listener log.
Next repro attempt: I just tried with the listen_query.py script from PR #158. E.g.:
INFO:listen_query:Watching: ('repro_gcf_18',) ('now >= 2020-08-27T13:45:50.359648
INFO:listen_query:Waiting...
INFO:listen_query:Notified: 2020-08-27T13:45:50.769171+00:00
INFO:listen_query:Notified: 2020-08-27T13:45:58.116840+00:00
INFO:listen_query:Change: ChangeType.ADDED
INFO:listen_query:Notified: 2020-08-27T13:46:00.364694+00:00
INFO:listen_query:Change: ChangeType.MODIFIED [ 1]
INFO:listen_query:Notified: 2020-08-27T13:46:06.793776+00:00
INFO:listen_query:Change: ChangeType.MODIFIED [ 2]
INFO:listen_query:Notified: 2020-08-27T13:46:08.215903+00:00
INFO:listen_query:Change: ChangeType.MODIFIED [ 3]
INFO:listen_query:Notified: 2020-08-27T13:46:08.795545+00:00
INFO:listen_query:Change: ChangeType.ADDED
INFO:listen_query:Notified: 2020-08-27T13:46:13.402602+00:00
INFO:listen_query:Change: ChangeType.MODIFIED [ 4]
^CTraceback (most recent call last):
File "listen_query.py", line 54, in <module>
main(parsed)
File "listen_query.py", line 45, in main
callback_done.wait()
File "/opt/Python-3.8.1/lib/python3.8/threading.py", line 558, in wait
signaled = self._cond.wait(timeout)
File "/opt/Python-3.8.1/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
KeyboardInterrupt
$ venv/bin/python update_document.py --count=100 --interval=10
INFO:update_document:Updating: repro_gcf_18/watch_me [ 1/100]
INFO:update_document:Sleeping: 9.952
...
INFO:update_document:Updating: repro_gcf_18/watch_me [ 100/100]
$ venv/bin/python update_document.py --count=10 --interval=7
INFO:update_document:Updating: repro_gcf_18/watch_me [ 1/10]
INFO:update_document:Sleeping: 6.303
...
INFO:update_document:Updating: repro_gcf_18/watch_me [ 10/10]
$ venv/bin/python update_document.py --count=90 --interval=7
INFO:update_document:Updating: repro_gcf_18/watch_me [ 1/90]
INFO:update_document:Sleeping: 7.138
INFO:update_document:Updating: repro_gcf_18/watch_me [ 90/90]
$ venv/bin/python ./update_document.py --document waaa --count=10
INFO:update_document:Updating: repro_gcf_18/waaa [ 1/10]
INFO:update_document:Sleeping: 10.805
...
INFO:update_document:Updating: repro_gcf_18/waaa [ 10/10]
Again, the off-by-two count in the listener is due to the documents not (initially) matching the query, with the corresponding ChangeType.ADDED entries as they are updated to match.
@mr-bjerre I have added a --debug command line option for the listen_*.py scripts, which turns on the DEBUG-level log entries, e.g.:
$ venv/bin/python listen_query.py --debug
DEBUG:google.auth._default:Checking /home/tseaver/projects/agendaless/Google/abstract-disk-227418-b7d417dbf6af.json for explicit credentials as part of auth process...
DEBUG:google.auth._default:Checking /home/tseaver/projects/agendaless/Google/abstract-disk-227418-b7d417dbf6af.json for explicit credentials as part of auth process...
INFO:listen_query:Watching: ('repro_gcf_18',) ('now >= 2020-08-27T14:21:33.863320
DEBUG:google.api_core.bidi:Started helper thread Thread-ConsumeBidirectionalStream
INFO:listen_query:Waiting...
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.ADD
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: ADD
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.CURRENT
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: CURRENT
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:walk over add_changes
INFO:listen_query:Notified: 2020-08-27T14:21:34.108883+00:00
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: TargetChangeType.NO_CHANGE
DEBUG:google.cloud.firestore_v1.watch:on_snapshot: target change: NO_CHANGE
DEBUG:google.api_core.bidi:waiting for recv.
...
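The flag itself amounts to something like the following (a sketch of the pattern, not the exact PR #158 code):

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse command line options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Watch a Firestore target.")
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Turn on DEBUG-level log entries (gRPC / bidi stream internals).",
    )
    return parser.parse_args(argv)


def configure_logging(debug):
    """Configure root logging at DEBUG or INFO; return the chosen level."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(level=level)
    return level
```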
So it seems to have been resolved?
I've just had my own application running with version google-cloud-firestore==1.8.1 for a couple of days without the listener breaking. I see that 1.9.0 was just released, so I'll try to redeploy with that and let you know if I see the listener breaking during the weekend.
@mr-bjerre I don't know of any recent changes which would have fixed the issue. One possible candidate landed in google-cloud-firestore 1.6.2 (2020-02-03).
Hmmm, I had version 1.6.2 before, though, where I had the issue; so I implemented a workaround at the end of March. I can try to deploy with 1.6.2 again now without any workaround and see if the issue is there. I guess I'll let it run with 1.9.0 for now and see if I see the bug in the next few days.
I guess I'll let it run with 1.9.0 for now and see if I see the bug in the next few days.
Thanks! Doing the "bisect" hunt would be helpful once we think we have the issue fixed, given that we don't believe it necessarily ever worked prior to 1.6.1.
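Once there is a reliable pass/fail check per release, that bisect hunt can be mechanized. A generic sketch (the `is_bad` predicate would wrap "pip install google-cloud-firestore==X, run the reproducer, report failure"; here it is just a stand-in):

```python
def first_bad_version(versions, is_bad):
    """Binary-search an ordered list of releases for the first one exhibiting
    the bug.  Assumes a monotone predicate: every version at or after the
    first bad one is also bad.  Returns None if no version is bad."""
    lo, hi = 0, len(versions) - 1
    result = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            result = versions[mid]   # mid is bad; look for an earlier bad one
            hi = mid - 1
        else:
            lo = mid + 1             # mid is good; the first bad one is later
    return result
```

The monotonicity assumption is the weak point here: with an intermittent bug like this one, each `is_bad` check would need to run long enough (days, per the reports above) to be trustworthy.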
@AntonioLule What version of google-cloud-firestore are you using when you experience the issue? Can you upgrade to 1.9.0 and see if that helps?
@ikendra, @MaticConradi, @juancruzmartino Can you please try updating to google-cloud-firestore 1.9.0 and see if recent changes have fixed the issue?
Running 1.9.0. It seems like the issue is (almost?) entirely fixed. I think I saw this happen again a few days ago, but I'm not entirely sure, so I'll keep an eye on this.
Thanks @tseaver. I just upgraded to google-cloud-firestore==1.9.0 and I'm now running the version of my application which was displaying the problem consistently with google-cloud-firestore==1.6.1. Allow me a day or two, please; I will update you on my observations then.
Hi @tseaver, I unfortunately still see the same issue with google-cloud-firestore==1.9.0: after some time, the callback method specified as the parameter of the on_snapshot() method stops being called. Everything worked fine for ca. 20 hours before the problem started.
I checked everything I could to rule out a mistake on my side.
I have now added some extra logging to try and understand better what's going on.
The same issue with google-cloud-firestore==1.9.0 again, this time after ca. 30 hours. The only difference compared to google-cloud-firestore==1.6.1 appears to be that the problem in my setup occurs after a considerably longer time (<10 hours with 1.6.1 vs. 20/30 hours with 1.9.0, with the same code and a very similar amount of datastore operations per hour).
I've had my application running for 4 days and the listener is still working (though there is no way of telling if it was broken at some point). I'm on google-cloud-firestore==1.9.0.
@ikendra Can you post the output from running pip list in your environment? I'm wondering if you might have an older version of a dependency, or something. For comparison, here is the output in the environment where I've been trying to reproduce this bug:
$ venv/bin/pip list
Package Version Location
------------------------ --------- -------------------------------------------------------------
cachetools 4.1.1
certifi 2020.6.20
chardet 3.0.4
google-api-core 1.22.1
google-auth 1.20.1
google-cloud-core 1.4.1
google-cloud-firestore 2.0.0.dev1 /path/to/python-firestore
googleapis-common-protos 1.52.0
grpcio 1.31.0
idna 2.10
libcst 0.3.10
mypy-extensions 0.4.3
pip 20.2.2
proto-plus 1.7.1
protobuf 3.13.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pytz 2020.1
PyYAML 5.3.1
requests 2.24.0
rsa 4.6
setuptools 49.6.0
six 1.15.0
typing-extensions 3.7.4.2
typing-inspect 0.6.0
urllib3 1.25.10
wheel 0.35.1
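Comparing two such listings by hand is error-prone; a small sketch to diff the package tables could help (it parses the name/version columns of pip list output, skipping the two-line header, and reports mismatches; the helper names and sample data below are invented for illustration):

```python
def parse_pip_list(text):
    """Parse `pip list` output into {package: version}, skipping the
    two-line header (column names and dashes)."""
    packages = {}
    for line in text.strip().splitlines()[2:]:
        parts = line.split()
        if len(parts) >= 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_environments(a_text, b_text):
    """Return {package: (version_in_a, version_in_b)} for packages whose
    versions differ, or that exist in only one environment (None for the
    missing side)."""
    a, b = parse_pip_list(a_text), parse_pip_list(b_text)
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs
```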
I'm afraid my listener is broken now as well. It survived 4 days, but not 5.
@mr-bjerre Can you provide the output of pip list for that environment?
Package Version Location
---------------------------- --------- -------------
attrs 19.3.0
beautifulsoup4 4.8.2
blis 0.4.1
CacheControl 0.12.6
cachetools 4.1.1
catalogue 1.0.0
certifi 2020.6.20
cffi 1.14.2
chardet 3.0.4
click 7.1.2
cymem 2.0.3
fastapi 0.61.0
firebase-admin 4.3.0
google-api-core 1.22.1
google-api-python-client 1.10.0
google-auth 1.20.1
google-auth-httplib2 0.0.4
google-cloud-core 1.4.1
google-cloud-error-reporting 0.34.0
google-cloud-firestore 1.9.0
google-cloud-logging 1.15.1
google-cloud-pubsub 1.7.0
google-cloud-storage 1.31.0
google-crc32c 1.0.0
google-resumable-media 1.0.0
googleapis-common-protos 1.52.0
grpc-google-iam-v1 0.12.3
grpcio 1.31.0
gunicorn 20.0.4
h11 0.9.0
html5lib 1.0.1
httplib2 0.18.1
httptools 0.1.1
idna 2.10
jsonschema 3.2.0
lxml 4.5.0
more-itertools 8.4.0
msgpack 1.0.0
murmurhash 1.0.2
numpy 1.18.1
packaging 20.4
pandas 1.0.1
pip 20.2.2
plac 1.1.3
pluggy 0.13.1
plx-util 0.0.1 /opt/plx_util
preshed 3.0.2
protobuf 3.13.0
py 1.9.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydantic 1.4
pyparsing 2.4.7
pyrsistent 0.16.0
pytest 5.3.5
python-dateutil 2.8.1
pytz 2020.1
requests 2.24.0
rsa 4.6
setuptools 49.3.1
six 1.15.0
soupsieve 2.0.1
spacy 2.3.2
srsly 1.0.2
starlette 0.13.6
thinc 7.4.1
tqdm 4.48.2
typer 0.0.8
uritemplate 3.0.1
urllib3 1.25.10
uvicorn 0.11.8
uvloop 0.14.0
wasabi 0.7.1
wcwidth 0.2.5
webencodings 0.5.1
websockets 8.1
wheel 0.34.2
xxhash 1.4.3
Yup, my listener also breaks, though it does take longer before it happens.
`pip list`:
Package Version
---------------------------- -----------
aiodns 1.1.1
aiohttp 3.6.2
async-timeout 3.0.1
attrs 20.1.0
base58 2.0.1
bitarray 1.2.2
cachetools 4.1.1
ccxt 1.33.93
certifi 2020.6.20
cffi 1.14.2
chardet 3.0.4
click 7.1.2
cryptography 3.1
cytoolz 0.10.1
dblpy 0.3.4
decorator 4.4.2
discord.py 1.4.1
eth-abi 2.1.1
eth-account 0.5.3
eth-hash 0.2.0
eth-keyfile 0.5.1
eth-keys 0.3.3
eth-rlp 0.2.0
eth-typing 2.2.2
eth-utils 1.9.5
futures 3.1.1
google-api-core 1.22.1
google-assistant-grpc 0.2.1
google-assistant-sdk 0.6.0
google-auth 1.21.0
google-auth-oauthlib 0.4.1
google-cloud-core 1.4.1
google-cloud-error-reporting 1.0.0
google-cloud-firestore 1.9.0
google-cloud-logging 1.15.1
google-cloud-storage 1.31.0
google-cloud-vision 1.0.0
google-crc32c 1.0.0
google-resumable-media 1.0.0
googleapis-common-protos 1.52.0
grpcio 1.31.0
hexbytes 0.2.1
idna 2.10
iexfinance 0.4.3
ipfshttpclient 0.6.1
jsonschema 3.2.0
libcst 0.3.10
lru-dict 1.1.6
multiaddr 0.0.9
multidict 4.7.6
mypy-extensions 0.4.3
netaddr 0.8.0
numpy 1.19.1
oauthlib 3.1.0
pandas 1.1.1
parsimonious 0.8.1
pathlib2 2.3.5
pip 20.2.2
proto-plus 1.9.0
protobuf 3.13.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycares 3.1.1
pycoingecko 1.3.0
pycparser 2.20
pycryptodome 3.9.8
pyOpenSSL 19.1.0
pyrsistent 0.16.0
python-dateutil 2.8.1
python-telegram-bot 12.8
pytz 2020.1
PyYAML 5.3.1
pyzmq 19.0.2
ratelimiter 1.2.0.post0
requests 2.24.0
requests-oauthlib 1.3.0
rlp 2.0.0a1
rsa 4.6
rusty-rlp 0.1.15
setuptools 49.1.2
six 1.15.0
sounddevice 0.3.15
stripe 2.51.0
tenacity 4.12.0
toolz 0.10.0
tornado 6.0.4
typing-extensions 3.7.4.3
typing-inspect 0.6.0
urllib3 1.25.10
varint 1.0.2
web3 5.12.1
websockets 8.1
wheel 0.34.2
yarl 1.5.1
It may be a tall ask, but do you have logs to match the failure? That has been the sticking point for fixing this: we know something is causing failures for users, but we aren't aware of what that failure is.
@crwilcox The general consensus, I believe, has been that this issue produces no logs whatsoever. I have all my function code enclosed in try/except blocks with Google Error Reporting integrated, and nothing related to it comes up, not even network-related errors, nothing.
It also looks like the issue has been coming up more frequently for me lately, but this could be because of the higher number of changes my listeners have been handling.
@tseaver is your repro WIP PR capturing the failure at this point?
@MaticConradi If you have the ability, could you try `2.0.0dev1`? I am hoping to release it in the next ~30 days, but it is in dev as it also brings along a new async surface, and I wanted a chance to experiment first.
The underlying protobuf code is also generated differently, so it is possible that has an effect.
@crwilcox sure, I can look into migrating to v2 and seeing if that fixes it. Is there a list of known issues (if any) with existing functionality that I should be on the lookout for?
I am not aware of any issues, but there are some breaking changes, particularly in the way parameters are passed to some methods, which protects against future API changes causing usability issues/breaks.
We have some scripts you can run on a codebase to move to this as well: https://github.com/googleapis/python-firestore/tree/master/scripts
Depending on what paths you use this could vary in the amount of change.
Doesn't look like the issue is fixed in `2.0.0dev1`; I can confirm the snapshot listeners break after a few hours with no error messages.
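In the meantime, since the listener dies silently, I'm considering a process-level watchdog along these lines. This is just a sketch, not part of the library: `start_listener` is a hypothetical factory that creates the watch (e.g. by calling `query.on_snapshot(callback)`) and returns an object with `.unsubscribe()`, and the snapshot callback is expected to call `heartbeat()` on every delivery.

```python
import threading
import time


class ListenerWatchdog:
    """Recreate a listener when no events arrive within `stale_after` seconds.

    `start_listener` is a hypothetical factory: it creates the watch and
    returns an object exposing `.unsubscribe()`. The snapshot callback must
    call `heartbeat()` on every delivery so the watchdog knows the stream
    is alive.
    """

    def __init__(self, start_listener, stale_after=600.0, poll_seconds=5.0):
        self._start = start_listener
        self._stale_after = stale_after
        self._poll = poll_seconds
        self._lock = threading.Lock()
        self._last_event = time.monotonic()
        self._stop = threading.Event()
        self.restarts = 0  # number of restarts performed, for observability

    def heartbeat(self):
        # Record that the stream delivered something.
        with self._lock:
            self._last_event = time.monotonic()

    def run(self):
        watch = self._start()
        while not self._stop.wait(self._poll):
            with self._lock:
                stale = (time.monotonic() - self._last_event) > self._stale_after
            if stale:
                try:
                    watch.unsubscribe()  # best effort; the stream may already be dead
                except Exception:
                    pass
                watch = self._start()
                self.restarts += 1
                self.heartbeat()  # reset the clock after restarting
        watch.unsubscribe()

    def stop(self):
        self._stop.set()
```

Obviously this can't distinguish "listener is dead" from "no writes happened", so `stale_after` has to be chosen well above the expected gap between documents; it's a crude mitigation, not a fix.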
Btw, I imagine that this is related: https://github.com/googleapis/nodejs-firestore/issues/1023#issuecomment-702840701
@crwilcox
> Is your repro WIP PR capturing the failure at this point?
Nope.
@BenWhitehead, @crwilcox I'm not sure what is to be done here, beyond constructing the scripts to (attempt) reproducing this issue in #158.
@tseaver I also don't know what can be done to get a more firm repro. Anyone who is experiencing this, if you could try installing the candidate builds for v2.0.0 it is possible the work done to overhaul things a bit may have squashed this bug.
Also, if anyone has suggestions on how to reproduce the failure we would be interested to try to track this down.
Closing this issue because it looks like v2.0.0 (released Nov 10, 2020) may have fixed it. If anyone experiences this behavior with v2.0 or above going forward, please open a new issue. Thanks!
Environment details
`google-cloud-firestore` version: 1.6.1

Steps to reproduce
I have a Python script that sets up a listener for new documents in a Firestore collection using `on_snapshot` on a query. To keep the script alive I have an infinite while loop at the end. Everything works fine, but after several hours it stops receiving new documents (or at least the callback is not executed); yesterday it was after ~7 hours.
In an earlier version of `google-cloud-firestore` (1.5.0) there was a related issue where the listener got disconnected after 1 hour. I made my own ugly hack back then, but at least it worked.
Code example (sample)
Since the 1 hour issue was fixed I now do
and then execute a script like
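(The original code blocks are missing from this copy. As a hedged illustration of the pattern described above, not the author's actual code, the shape is roughly the following; the collection name, field check, and `keep_alive` helper are all hypothetical, and the Firestore calls are commented out because they require credentials.)

```python
import threading


def on_snapshot(col_snapshot, changes, read_time):
    # Hypothetical callback: react to newly added documents in the query.
    for change in changes:
        if change.type.name == "ADDED":
            print("ADDED:", change.document.id)


def keep_alive(stop_event, poll_seconds=1.0):
    """Block until stop_event is set, waking periodically.

    Replaces a busy `while True: pass` loop at the end of the script.
    """
    while not stop_event.wait(poll_seconds):
        pass


# from google.cloud import firestore
# client = firestore.Client()
# watch = client.collection("news").on_snapshot(on_snapshot)

stop = threading.Event()
# keep_alive(stop)  # blocks the main thread until stop.set() is called
```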
Stack trace
Unfortunately no stack trace since the listener just stops receiving messages.
I see this as a core feature of Firestore, so hopefully it will get the right attention right away (some of our core business is triggering Python applications on news from Firestore, so we are extremely dependent on this).