googleapis / python-firestore

Apache License 2.0
218 stars 75 forks source link

Connection to firestore is getting lost after sometime with a message about "invalid authentication credentials". #130

Closed skywritergr closed 2 years ago

skywritergr commented 4 years ago

Environment details

Steps to reproduce

This is a very weird one to reproduce. What is happening is that I deploy my docker container to Google Cloud Run and for some time everything works fine. I do successfully connect to the Firestore and retrieve data. After some point though things stop working and I get a Exception: 401 Request had invalid authentication credentials.

So very broad steps would be

  1. Deploy a container in Google Cloud run
  2. Execute some requests to Firestore. They should be successful
  3. Wait some time (the container will go into hibernate as that point if you don't have enough traffic)
  4. Throw some more requests to Firestore. Watch it fail.

Code example

from firebase_admin import firestore, initialize_app, credentials

cred = credentials.ApplicationDefault()
initialize_app(cred, {
  'projectId': 'my-project-id-1234',
})

db = firestore.client()

def function_A:
 ...
 data = db.collection(u'data').where(u'title', u'==', name).stream()
 ...

def function_B:
 ....

Stack trace

Exception: 401 Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.

I have been trying to get to the bottom of this but I honestly don't know where the error is. Some assumptions:

I would really appreciate any pointers.

crwilcox commented 4 years ago

Hi @skywritergr

When you say 'some time' before the credentials cease working, how long is that? Minutes, an hour, a day?

skywritergr commented 4 years ago

In the latest example I can see a request working as expected at: 2020-09-07 10:26:24.804 BST and then in the logs I can see the first batch of failures at 2020-09-07 17:13:07.161 BST. So I would say definitely hours.

tseaver commented 4 years ago

@skywritergr Can you try updating google-cloud-firestore to version 1.9.0? Also, can you show the output of pip list in your environment?

skywritergr commented 4 years ago

Thanks for your reply @tseaver, I have now updated to version 1.9.0.

Here is what I get after doing pip3 list (for what is worth, and not sure if it changes anything, I use poetry to handle the dependencies of my project)

Package            Version   
------------------ ----------
astroid            2.3.3     
attrs              19.3.0    
autopep8           1.4.4     
CacheControl       0.12.6    
cachy              0.3.0     
certifi            2020.4.5.1
chardet            3.0.4     
cleo               0.7.6     
clikit             0.4.3     
docutils           0.16      
html5lib           1.0.1     
idna               2.9       
importlib-metadata 1.1.3     
isort              4.3.4     
jsonschema         3.2.0     
keyring            20.0.1    
lazy-object-proxy  1.4.3     
lockfile           0.12.2    
mccabe             0.6.1     
msgpack            1.0.0     
pastel             0.2.0     
pexpect            4.8.0     
pip                20.0.2    
pkginfo            1.5.0.1   
poetry             1.0.5     
ptyprocess         0.6.0     
pycodestyle        2.5.0     
pylev              1.3.0     
pylint             2.4.3     
pyparsing          2.4.7     
pyrsistent         0.14.11   
requests           2.23.0    
requests-toolbelt  0.8.0     
rstcheck           3.3.1     
setuptools         46.0.0    
shellingham        1.3.2     
six                1.13.0    
tomlkit            0.5.11    
typed-ast          1.4.0     
urllib3            1.25.9    
virtualenv         16.2.0    
webencodings       0.5.1     
wheel              0.34.2    
wrapt              1.11.2    
zipp               3.1.0 
tseaver commented 4 years ago

Hmm, pip3 list doesn't show anything related to google-cloud-* at all. Can you show the equivalent under (maybe poetry export -f requirements.txt? I'm not familiar with it).

skywritergr commented 4 years ago

The pyproject.toml, which is the equivalent of the package.json for the frontend, has the following dependancies.

[tool.poetry.dependencies]
python = "^3.7"
tmdbsimple = "^2.2.0"
flask = "^1.1.2"
requests = "^2.23.0"
beautifulsoup4 = "^4.9.0"
flask_cors = "^3.0.8"
requests_cache = "^0.5.2"
simplejson = "^3.17.0"
firebase-admin = "^4.4.0"
uwsgi = "^2.0.18"

[tool.poetry.dev-dependencies]
pytest = "^4.6"
flake8 = "^3.7.9"
mypy = "^0.770"
black = "^19.10b0"

The requirements.txt exports gives me the following list: https://pastebin.com/hLCRTFpY

skywritergr commented 4 years ago

I thought that the problem might have solved itself by updating the library. It was working fine for a few days now. Today it happened again. Same error as before.

What I was doing:

Exception: 401 Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
zerotrickpony commented 3 years ago

I also see this problem with the python client. I have several long-running download scripts, and they fail after one hour with this error. I assume that the token is expiring? I am going to try automatically expiring and re-creating the client every 30 minutes and see if that helps...

paulharter commented 3 years ago

I have a wrapper I use around the Firestore client that automatically does the oauth refresh dance.

It does a bunch of other stuff you probably don't want/need but you can see the auth workflow here

https://github.com/paulharter/fam/blob/master/src/fam/database/firestore.py

crwilcox commented 3 years ago

One thing stood out: Cloud Run has a max time of 3600 seconds (an hour), and @skywritergr you state it was fine for an hour. But I'd imagine the request would time out at that point. Or are the requests smaller requests over the time period? The sample included is a streaming call I assumed was very large.

Also, when docker pause was tried local, how long did you leave it paused?

skywritergr commented 3 years ago

I admit that I paused it for maybe a couple of minutes. Not longer than that. Also the request isn't very large at all. The data are actually the IMDB file document ingested into firebase. So the Python request is a streaming call into firebase that searches for a movie.

It's the dataset from here: https://www.imdb.com/interfaces/ (the ratings one I think)

crwilcox commented 3 years ago

@skywritergr is that how long it is pausing in cloud run? Or is the pause there >60 minutes?

skywritergr commented 3 years ago

Oh definitely more than 60 minutes on cloud run. This is a very low traffic website that might get 3-4 searches a day.

crwilcox commented 3 years ago

I think auth needs to be refreshed every 60 minutes, and I believe what you are observing are the credentials used by the stream going stale.

skywritergr commented 3 years ago

I saw your message in the other issue. Is the working assumption that the token refresh is happening in the middle of a streaming event and that fails to complete? I can see it happening somewhere in the beginning.

busunkim96 commented 3 years ago

Hi @skywritergr if it's not too much trouble, could you enable gRPC logging and let us know what you see?

GRPC_TRACE=plugin_credentials
GRPC_VERBOSITY=DEBUG

https://github.com/grpc/grpc/blob/master/TROUBLESHOOTING.md https://github.com/grpc/grpc/blob/master/doc/environment_variables.md

This was suggested by a gRPC engineer for tracking down the root cause: https://groups.google.com/g/grpc-io/c/5rOhCcqyHng

busunkim96 commented 3 years ago

Hi - could the original author (@skywritergr) or a repo maintainer look this over please? I am not very familiar with Firestore and may have ended up making some incorrect assumptions.

Re-reading the original post it sounds like each end user request results in a new, separate call to firestore. Is this correct, or does the .stream() call stay alive even when the container goes idle?

If this sounds like the correct flow of events, this looks very similar to https://github.com/googleapis/python-api-core/issues/223.

Notes for reproducing:

To make sure the container goes into "idle mode" instead of shutting down, min-instances must be at least 1. (see docs).

tseaver commented 2 years ago

@skywritergr, a couple of observations.

meredithslota commented 2 years ago

We don't have enough info to continue debugging this, so I'm going to close it out. If the original author or anyone else is seeing an issue like this, can you open a new issue with repro steps? Thank you!