Closed ethan-oro closed 5 months ago
Thanks for reporting this issue! Indeed, we made a recent perf-related change, https://github.com/googleapis/python-bigquery/pull/1942, where we skip a get-job call if `page_size` or `max_results` is set and directly get the first page of results. I think the infinite loop was caused by this change: the client repeatedly retrieves the first page even when `start_index` is set. I will fix it when I'm back to work tomorrow.
But just out of curiosity, could you tell me more about why you are using a while loop to get the query result, instead of just `for row in result`? This was indeed a use case we didn't account for.
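To illustrate why ignoring `start_index` turns an offset-based `while` loop into an infinite one, here is a toy model in plain Python. It does not use or reproduce the library's code: `ROWS`, `fetch_correct`, `fetch_buggy`, and `count_calls` are all hypothetical names made up for this sketch, and the reporter's actual loop is not shown in the issue, so the loop shape here is an assumption.

```python
from typing import Callable, List

# Stand-in for a query result with 25 rows (hypothetical data).
ROWS = list(range(25))


def fetch_correct(start_index: int, page_size: int) -> List[int]:
    """Honours start_index: returns the page that begins at that offset."""
    return ROWS[start_index:start_index + page_size]


def fetch_buggy(start_index: int, page_size: int) -> List[int]:
    """Ignores start_index and always returns the first page."""
    return ROWS[:page_size]


def count_calls(
    fetch: Callable[[int, int], List[int]],
    page_size: int = 10,
    max_calls: int = 100,
) -> int:
    """Advance start_index until an empty page comes back.

    max_calls is a safety guard so the buggy variant terminates here;
    without it, the loop would never see an empty page.
    """
    start_index = 0
    calls = 0
    while calls < max_calls:
        page = fetch(start_index, page_size)
        calls += 1
        if not page:
            break
        start_index += len(page)
    return calls


print(count_calls(fetch_correct))  # stops once the offset passes the last row
print(count_calls(fetch_buggy))    # only stops because of the max_calls guard
```

In the buggy variant the page is never empty, so a loop that waits for an empty page as its exit condition has nothing to stop it.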
thank you so much for the quick reply! i think we wanted to limit the number of records read into memory at a given time for big tables (table has millions of rows, wanted to keep no more than 10k in memory at a time). that being said, this can also be achieved just using `page_size` and then looping `for row in result` in the manner you suggested above and yielding results every `page_size` rows. if that is the recommended approach for a use-case like this, we can switch our implementation over to use that logic instead
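A minimal sketch of the approach described above: iterate the result once and yield fixed-size batches, so that no more than one batch is held in memory at a time. The helper `iter_batches` is a hypothetical name, not part of google-cloud-bigquery; it works on any iterable, which is why the commented usage assumes it can be fed the iterator returned by `query_job.result(page_size=...)`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items from rows, in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(rows)
    # islice pulls the next batch_size items; an empty list means we're done.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Assumed usage with BigQuery (not executed here; needs credentials):
#
#   rows = query_job.result(page_size=10_000)
#   for batch in iter_batches(rows, 10_000):
#       handle(batch)  # handle() is a placeholder for your own processing

# Stand-alone demonstration with a plain range instead of query rows.
for batch in iter_batches(range(25), 10):
    print(len(batch))
```

Matching `batch_size` to `page_size` is a design choice rather than a requirement: `page_size` controls how many rows each API response carries, while `batch_size` controls how many rows your own code holds at once.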
Yeah, I feel like `while(True)` is generally avoided except in some very specific use cases (but I guess judgements like this are always debatable). Regardless, there's still a bug to be fixed :)
Environment details
- `python --version`: 3.11.6
- `pip --version`: 23.3.1
- `google-cloud-bigquery` version (`pip show google-cloud-bigquery`): 3.24.0
Steps to reproduce
- `query.result(...)` called with `page_size` and `start_index` loops infinitely

i think `3.24.0` introduced this, as when we pinned `google-cloud-bigquery` to `3.23.1`, the issue abated.

for what it's worth, with `3.24.0` each call to `query_job.result` returns much faster than in `3.23.1` -- not sure if that's perf related or because it's not actually making the network call

Code example
Logs
expect output is