Open · tswast opened this issue 3 years ago
Here's a stacktrace from a Googler who tried to reproduce this on their own project.
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-75-e46c7b68e71a>", line 12, in load_data
job = load_job.result() # Waits for the job to complete.
File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/job/base.py", line 679, in result
return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/google/api_core/future/polling.py", line 134, in result
raise self._exception
google.api_core.exceptions.Forbidden: 403 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas
"""
The above exception was the direct cause of the following exception:
Forbidden Traceback (most recent call last)
<ipython-input-77-bef363ce70e2> in <module>
1 with multiprocessing.Pool() as pool:
----> 2 pool.map(load_data, args)
/opt/conda/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):
/opt/conda/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
Forbidden: 403 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas
Indeed, the exception does throw from result(). It might be nice to see the structured error data to help with our retry predicate, though.
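For reference, that structured data is already reachable on the exception today. A minimal sketch, assuming a placeholder URI and table ID, and assuming the usual BigQuery error dicts ("reason", "message", ...) end up in exc.errors:

```python
# Minimal sketch: inspect the structured error payload on the raised exception.
# The URI and table ID are placeholders, not anything from this issue.
from google.api_core.exceptions import Forbidden
from google.cloud import bigquery

client = bigquery.Client()
load_job = client.load_table_from_uri(
    "gs://example-bucket/data.csv",
    "example-project.example_dataset.example_table",
)

try:
    load_job.result()
except Forbidden as exc:
    # exc.errors is a list of dicts parsed from the error response, e.g.
    # [{"reason": "rateLimitExceeded", "message": "Exceeded rate limits: ..."}]
    for error in exc.errors or []:
        print(error.get("reason"), error.get("message"))
```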
Having this exact problem in a Cloud Function triggered when data is uploaded to a Cloud Storage bucket. Having a job_retry argument on load_table_from_uri will definitely be very useful.
Right now I'm considering the Cloud Function retry option, but I plan to add monitoring on top of the Cloud Function and want to keep its logs clean for that, even if a retry was successful. So for now I'm implementing exponential backoff when the exception is raised.
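A rough sketch of that kind of caller-side backoff, with a placeholder URI, table ID, and retry parameters (the specific values are illustrative only):

```python
# Rough sketch of a caller-side exponential backoff, as described above.
# The URI, table ID, attempt count, and delays are illustrative only.
import random
import time

from google.api_core.exceptions import Forbidden
from google.cloud import bigquery

client = bigquery.Client()

def load_with_backoff(uri, table_id, max_attempts=5):
    delay = 1.0
    for attempt in range(max_attempts):
        load_job = client.load_table_from_uri(uri, table_id)
        try:
            return load_job.result()  # Waits for the job to complete.
        except Forbidden:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with a little jitter before the next attempt.
            time.sleep(delay + random.uniform(0, 1))
            delay *= 2

load_with_backoff(
    "gs://example-bucket/data.csv",
    "example-project.example_dataset.example_table",
)
```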
In internal issue 195911158, a customer is struggling to retry jobs that fail with "403 Exceeded rate limits: too many table update operations for this table". One can encounter this exception by attempting to run hundreds of load jobs in parallel.
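The stack trace above points at a pattern roughly like the following reconstruction; the URIs and table ID are placeholders, not the customer's actual code:

```python
# Reconstruction of the failing pattern: many parallel load jobs against the
# same table, which trips the per-table update quota. Placeholder names only.
import multiprocessing

from google.cloud import bigquery

TABLE_ID = "example-project.example_dataset.example_table"

def load_data(uri):
    client = bigquery.Client()
    load_job = client.load_table_from_uri(uri, TABLE_ID)
    job = load_job.result()  # Waits for the job to complete.
    return job.output_rows

if __name__ == "__main__":
    args = ["gs://example-bucket/part-{}.csv".format(i) for i in range(500)]
    with multiprocessing.Pool() as pool:
        pool.map(load_data, args)
```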
Thoughts:

- Does the exception throw from result() or load_table_from_uri()? If result(), continue with job_retry; otherwise, see if we can modify the default retry predicate for load_table_from_uri() to recognize this rate-limiting reason and retry (a rough sketch of such a predicate follows after the notes below).
- If result(), modify load jobs (or more likely the base class) to retry if job_retry is set, similar to what we do for query jobs.

Notes:

- We'll likely want a separate default job_retry object for load_table_from_uri(), as the retryable reasons will likely be different than what we have for queries.
- Not all load_table_from_* methods are as retryable as load_table_from_uri(), since they would require rewinding file objects, which isn't always possible. We'll probably want to consider adding job_retry to those load job methods in the future, but for now load_table_from_uri is what's needed.
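As a rough illustration of the retry-predicate idea in the first thought above, here's a minimal caller-side sketch; the matched reason, the deadline, and the wrapper function are assumptions, not a decided default:

```python
# Sketch of a load-job retry predicate plus a caller-side wrapper, pending a
# real job_retry parameter on load_table_from_uri. The matched reason and the
# deadline are illustrative assumptions, not a proposed default.
from google.api_core.exceptions import Forbidden
from google.api_core.retry import Retry
from google.cloud import bigquery

def _is_table_update_rate_limit(exc):
    # Retry only the 403s whose error reason indicates rate limiting.
    if not isinstance(exc, Forbidden):
        return False
    reasons = {error.get("reason") for error in (exc.errors or [])}
    return "rateLimitExceeded" in reasons

load_job_retry = Retry(predicate=_is_table_update_rate_limit, deadline=600.0)

def load_table_with_retry(client, uri, table_id):
    def _attempt():
        # Start a fresh job on each attempt, since a failed job can't be
        # rerun under the same job ID.
        return client.load_table_from_uri(uri, table_id).result()

    return load_job_retry(_attempt)()
```

A built-in job_retry for load jobs would presumably bake something like this predicate into the client's defaults instead of requiring a wrapper like the one above.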