googleapis / python-bigquery

Apache License 2.0
746 stars 306 forks source link

query result pagination caught in infinite loop #1950

Closed ethan-oro closed 5 months ago

ethan-oro commented 5 months ago

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Please run down the following list and make sure you've tried the usual "quick fixes":

If you are still having issues, please be sure to include as much information as possible:

Environment details

Steps to reproduce

  1. run query
  2. query.result(...) called with page_size and start_index loops infinitely

i think 3.24.0 introduced this, as when we pinned google-cloud-bigquery to 3.23.1, the issue abated.

for what it's worth, with 3.24.0 each call to query_job.result returns much faster than in 3.23.1 -- not sure if that's perf related or because it's not actually making the network call

Code example

from typing import List
import logging

from google.cloud import bigquery

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    client = bigquery.Client()

    result_size = 100
    page_size = 10

    query_str = (f"SELECT * FROM `project-id.dataset-id.table-id` LIMIT {result_size}",)

    job_config = bigquery.QueryJobConfig(use_query_cache=True)
    job_config.query_parameters = []

    query_job = client.query(
        query=query_str,
        location="US",
        job_config=job_config,
    )

    start_index = 0

    while True:
        page_rows: bigquery.table.RowIterator = query_job.result(
            max_results=page_size, start_index=start_index
        )
        logging.info(
            "Retrieved result set from query page of size [%d] starting at index [%d]",
            page_size,
            start_index,
        )

        num_rows_read = 0
        processed_rows: List[bigquery.table.Row] = []
        for row in page_rows:
            num_rows_read += 1
            processed_rows.append(row)

        logging.info(
            "Processed [%d] rows from query page starting at index [%d]",
            num_rows_read,
            start_index,
        )
        if num_rows_read == 0:
            break

        if start_index >= 1000:
            break

        start_index += num_rows_read
        logging.info("Processed [%d] rows...", start_index)

Logs

INFO:root:Retrieved result set from query page of size [10] starting at index [0]
INFO:root:Processed [10] rows from query page starting at index [0]
INFO:root:Processed [10] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [10]
INFO:root:Processed [10] rows from query page starting at index [10]
INFO:root:Processed [20] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [20]
INFO:root:Processed [10] rows from query page starting at index [20]
INFO:root:Processed [30] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [30]
INFO:root:Processed [10] rows from query page starting at index [30]
INFO:root:Processed [40] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [40]
INFO:root:Processed [10] rows from query page starting at index [40]
INFO:root:Processed [50] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [50]
INFO:root:Processed [10] rows from query page starting at index [50]
INFO:root:Processed [60] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [60]
INFO:root:Processed [10] rows from query page starting at index [60]
INFO:root:Processed [70] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [70]
INFO:root:Processed [10] rows from query page starting at index [70]
INFO:root:Processed [80] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [80]
INFO:root:Processed [10] rows from query page starting at index [80]
INFO:root:Processed [90] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [90]
INFO:root:Processed [10] rows from query page starting at index [90]
INFO:root:Processed [100] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [100]
INFO:root:Processed [10] rows from query page starting at index [100]
INFO:root:Processed [110] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [110]
INFO:root:Processed [10] rows from query page starting at index [110]
INFO:root:Processed [120] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [120]
INFO:root:Processed [10] rows from query page starting at index [120]
INFO:root:Processed [130] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [130]
INFO:root:Processed [10] rows from query page starting at index [130]
INFO:root:Processed [140] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [140]
INFO:root:Processed [10] rows from query page starting at index [140]
INFO:root:Processed [150] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [150]
INFO:root:Processed [10] rows from query page starting at index [150]
INFO:root:Processed [160] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [160]
INFO:root:Processed [10] rows from query page starting at index [160]
INFO:root:Processed [170] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [170]
INFO:root:Processed [10] rows from query page starting at index [170]
INFO:root:Processed [180] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [180]
INFO:root:Processed [10] rows from query page starting at index [180]
INFO:root:Processed [190] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [190]
INFO:root:Processed [10] rows from query page starting at index [190]
INFO:root:Processed [200] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [200]
INFO:root:Processed [10] rows from query page starting at index [200]
INFO:root:Processed [210] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [210]
INFO:root:Processed [10] rows from query page starting at index [210]
INFO:root:Processed [220] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [220]
INFO:root:Processed [10] rows from query page starting at index [220]
INFO:root:Processed [230] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [230]
INFO:root:Processed [10] rows from query page starting at index [230]
INFO:root:Processed [240] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [240]
INFO:root:Processed [10] rows from query page starting at index [240]
INFO:root:Processed [250] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [250]
INFO:root:Processed [10] rows from query page starting at index [250]
INFO:root:Processed [260] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [260]
INFO:root:Processed [10] rows from query page starting at index [260]
INFO:root:Processed [270] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [270]
INFO:root:Processed [10] rows from query page starting at index [270]
INFO:root:Processed [280] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [280]
INFO:root:Processed [10] rows from query page starting at index [280]
INFO:root:Processed [290] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [290]
INFO:root:Processed [10] rows from query page starting at index [290]
INFO:root:Processed [300] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [300]
INFO:root:Processed [10] rows from query page starting at index [300]
INFO:root:Processed [310] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [310]
INFO:root:Processed [10] rows from query page starting at index [310]
INFO:root:Processed [320] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [320]
INFO:root:Processed [10] rows from query page starting at index [320]
INFO:root:Processed [330] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [330]
INFO:root:Processed [10] rows from query page starting at index [330]
INFO:root:Processed [340] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [340]
INFO:root:Processed [10] rows from query page starting at index [340]
INFO:root:Processed [350] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [350]
INFO:root:Processed [10] rows from query page starting at index [350]
INFO:root:Processed [360] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [360]
INFO:root:Processed [10] rows from query page starting at index [360]
INFO:root:Processed [370] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [370]
INFO:root:Processed [10] rows from query page starting at index [370]
INFO:root:Processed [380] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [380]
INFO:root:Processed [10] rows from query page starting at index [380]
INFO:root:Processed [390] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [390]
INFO:root:Processed [10] rows from query page starting at index [390]
INFO:root:Processed [400] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [400]
INFO:root:Processed [10] rows from query page starting at index [400]
INFO:root:Processed [410] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [410]
INFO:root:Processed [10] rows from query page starting at index [410]
INFO:root:Processed [420] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [420]
INFO:root:Processed [10] rows from query page starting at index [420]
INFO:root:Processed [430] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [430]
INFO:root:Processed [10] rows from query page starting at index [430]
INFO:root:Processed [440] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [440]
INFO:root:Processed [10] rows from query page starting at index [440]
INFO:root:Processed [450] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [450]
INFO:root:Processed [10] rows from query page starting at index [450]
INFO:root:Processed [460] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [460]
INFO:root:Processed [10] rows from query page starting at index [460]
INFO:root:Processed [470] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [470]
INFO:root:Processed [10] rows from query page starting at index [470]
INFO:root:Processed [480] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [480]
INFO:root:Processed [10] rows from query page starting at index [480]
INFO:root:Processed [490] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [490]
INFO:root:Processed [10] rows from query page starting at index [490]
INFO:root:Processed [500] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [500]
INFO:root:Processed [10] rows from query page starting at index [500]
INFO:root:Processed [510] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [510]
INFO:root:Processed [10] rows from query page starting at index [510]
INFO:root:Processed [520] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [520]
INFO:root:Processed [10] rows from query page starting at index [520]
INFO:root:Processed [530] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [530]
INFO:root:Processed [10] rows from query page starting at index [530]
INFO:root:Processed [540] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [540]
INFO:root:Processed [10] rows from query page starting at index [540]
INFO:root:Processed [550] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [550]
INFO:root:Processed [10] rows from query page starting at index [550]
INFO:root:Processed [560] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [560]
INFO:root:Processed [10] rows from query page starting at index [560]
INFO:root:Processed [570] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [570]
INFO:root:Processed [10] rows from query page starting at index [570]
INFO:root:Processed [580] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [580]
INFO:root:Processed [10] rows from query page starting at index [580]
INFO:root:Processed [590] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [590]
INFO:root:Processed [10] rows from query page starting at index [590]
INFO:root:Processed [600] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [600]
INFO:root:Processed [10] rows from query page starting at index [600]
INFO:root:Processed [610] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [610]
INFO:root:Processed [10] rows from query page starting at index [610]
INFO:root:Processed [620] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [620]
INFO:root:Processed [10] rows from query page starting at index [620]
INFO:root:Processed [630] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [630]
INFO:root:Processed [10] rows from query page starting at index [630]
INFO:root:Processed [640] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [640]
INFO:root:Processed [10] rows from query page starting at index [640]
INFO:root:Processed [650] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [650]
INFO:root:Processed [10] rows from query page starting at index [650]
INFO:root:Processed [660] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [660]
INFO:root:Processed [10] rows from query page starting at index [660]
INFO:root:Processed [670] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [670]
INFO:root:Processed [10] rows from query page starting at index [670]
INFO:root:Processed [680] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [680]
INFO:root:Processed [10] rows from query page starting at index [680]
INFO:root:Processed [690] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [690]
INFO:root:Processed [10] rows from query page starting at index [690]
INFO:root:Processed [700] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [700]
INFO:root:Processed [10] rows from query page starting at index [700]
INFO:root:Processed [710] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [710]
INFO:root:Processed [10] rows from query page starting at index [710]
INFO:root:Processed [720] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [720]
INFO:root:Processed [10] rows from query page starting at index [720]
INFO:root:Processed [730] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [730]
INFO:root:Processed [10] rows from query page starting at index [730]
INFO:root:Processed [740] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [740]
INFO:root:Processed [10] rows from query page starting at index [740]
INFO:root:Processed [750] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [750]
INFO:root:Processed [10] rows from query page starting at index [750]
INFO:root:Processed [760] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [760]
INFO:root:Processed [10] rows from query page starting at index [760]
INFO:root:Processed [770] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [770]
INFO:root:Processed [10] rows from query page starting at index [770]
INFO:root:Processed [780] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [780]
INFO:root:Processed [10] rows from query page starting at index [780]
INFO:root:Processed [790] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [790]
INFO:root:Processed [10] rows from query page starting at index [790]
INFO:root:Processed [800] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [800]
INFO:root:Processed [10] rows from query page starting at index [800]
INFO:root:Processed [810] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [810]
INFO:root:Processed [10] rows from query page starting at index [810]
INFO:root:Processed [820] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [820]
INFO:root:Processed [10] rows from query page starting at index [820]
INFO:root:Processed [830] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [830]
INFO:root:Processed [10] rows from query page starting at index [830]
INFO:root:Processed [840] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [840]
INFO:root:Processed [10] rows from query page starting at index [840]
INFO:root:Processed [850] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [850]
INFO:root:Processed [10] rows from query page starting at index [850]
INFO:root:Processed [860] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [860]
INFO:root:Processed [10] rows from query page starting at index [860]
INFO:root:Processed [870] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [870]
INFO:root:Processed [10] rows from query page starting at index [870]
INFO:root:Processed [880] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [880]
INFO:root:Processed [10] rows from query page starting at index [880]
INFO:root:Processed [890] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [890]
INFO:root:Processed [10] rows from query page starting at index [890]
INFO:root:Processed [900] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [900]
INFO:root:Processed [10] rows from query page starting at index [900]
INFO:root:Processed [910] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [910]
INFO:root:Processed [10] rows from query page starting at index [910]
INFO:root:Processed [920] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [920]
INFO:root:Processed [10] rows from query page starting at index [920]
INFO:root:Processed [930] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [930]
INFO:root:Processed [10] rows from query page starting at index [930]
INFO:root:Processed [940] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [940]
INFO:root:Processed [10] rows from query page starting at index [940]
INFO:root:Processed [950] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [950]
INFO:root:Processed [10] rows from query page starting at index [950]
INFO:root:Processed [960] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [960]
INFO:root:Processed [10] rows from query page starting at index [960]
INFO:root:Processed [970] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [970]
INFO:root:Processed [10] rows from query page starting at index [970]
INFO:root:Processed [980] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [980]
INFO:root:Processed [10] rows from query page starting at index [980]
INFO:root:Processed [990] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [990]
INFO:root:Processed [10] rows from query page starting at index [990]
INFO:root:Processed [1000] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [1000]
INFO:root:Processed [10] rows from query page starting at index [1000]
...
<many more lines here>
...
INFO:root:Processed [10] rows from query page starting at index [17350]
INFO:root:Processed [17360] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17360]
INFO:root:Processed [10] rows from query page starting at index [17360]
INFO:root:Processed [17370] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17370]
INFO:root:Processed [10] rows from query page starting at index [17370]
INFO:root:Processed [17380] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17380]
INFO:root:Processed [10] rows from query page starting at index [17380]
INFO:root:Processed [17390] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17390]
INFO:root:Processed [10] rows from query page starting at index [17390]
INFO:root:Processed [17400] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17400]
INFO:root:Processed [10] rows from query page starting at index [17400]
INFO:root:Processed [17410] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17410]
INFO:root:Processed [10] rows from query page starting at index [17410]
INFO:root:Processed [17420] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17420]
INFO:root:Processed [10] rows from query page starting at index [17420]
INFO:root:Processed [17430] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17430]
INFO:root:Processed [10] rows from query page starting at index [17430]
INFO:root:Processed [17440] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [17440]
<this is where ctrl-c'd>

expect output is

INFO:root:Retrieved result set from query page of size [10] starting at index [0]
INFO:root:Processed [10] rows from query page starting at index [0]
INFO:root:Processed [10] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [10]
INFO:root:Processed [10] rows from query page starting at index [10]
INFO:root:Processed [20] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [20]
INFO:root:Processed [10] rows from query page starting at index [20]
INFO:root:Processed [30] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [30]
INFO:root:Processed [10] rows from query page starting at index [30]
INFO:root:Processed [40] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [40]
INFO:root:Processed [10] rows from query page starting at index [40]
INFO:root:Processed [50] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [50]
INFO:root:Processed [10] rows from query page starting at index [50]
INFO:root:Processed [60] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [60]
INFO:root:Processed [10] rows from query page starting at index [60]
INFO:root:Processed [70] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [70]
INFO:root:Processed [10] rows from query page starting at index [70]
INFO:root:Processed [80] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [80]
INFO:root:Processed [10] rows from query page starting at index [80]
INFO:root:Processed [90] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [90]
INFO:root:Processed [10] rows from query page starting at index [90]
INFO:root:Processed [100] rows...
INFO:root:Retrieved result set from query page of size [10] starting at index [100]
INFO:root:Processed [0] rows from query page starting at index [100]
Linchin commented 5 months ago

Thanks for reporting this issue! Indeed we made a recent perf related change https://github.com/googleapis/python-bigquery/pull/1942, where we skip a get job call if page_size or max_results is set and directly get the first page result. I think the infinite loop was cause by this change - the client will repeatedly retrieve the first page despite start_index is set. I will fix it when I'm back to work tomorrow.

But just out of curiosity, could you tell me more about why you are using a while loop to get the query result, instead of just for row in result? This was indeed a use case we didn't account for.

ethan-oro commented 5 months ago

thank you so much for the quick reply! i think we wanted to limit the number of records read into memory at a given time for big tables (table has millions of rows, wanted to keep no more than 10k in memory at a time). that being said, this can also be achieved just using page_size and then looping for row in result in the manner you suggested above and yielding results every page_size. if that is the recommended approach for a use-case like this, we can switch our implementation over to use that logic instead

Linchin commented 5 months ago

Yeah, I feel like while(True) is generally avoided except some very specific use cases (but I guess judgements like this are always debatable). Regardless there's still a bug to be fixed :)