nithinmurali / pygsheets

Google Sheets Python API v4
https://pygsheets.readthedocs.io/en/latest
Other
1.51k stars 220 forks source link

Question: Is this batching pattern correct? #501

Closed cuchoi closed 3 years ago

cuchoi commented 3 years ago

Hi! First of all, thank you for this amazing library :) — It is a life saver!

I have been having some issue batching requests and found a pattern that it seems to work, but wanted to confirm this is the right way of batching value and formatting updates.

I am modifying thousands of cells with this pattern:

  1. Create cell using worksheet object
  2. Unlink cell
  3. Modify cell object (both formatting and its value)
  4. Add cell to a list [Repeat steps 1 to 4 for thousands of cells]
  5. Use the wks.update_cells function on the list

Here is the code of the same pattern:

    headers = ["Example1", "Example2", "Example3"]
    current_row = 1
    for h in headers:
        cell = wks.cell(f"A{current_row}")
        cell.unlink()
        cell.color = (0.5, 0.5 , 0.5, 1)
        cell = cell.set_text_format('bold', True)
        cell = cell.set_value(h)

        cells_to_update.append(cell)
        current_row += 1

     wks.update_cells(cells_to_update)

I tried using set_batch_mode but I run into random issues. Is this because I am updating values and the batch mode indicates that it "(...) only caches sheetUpdate requests not value updates or clear requests"?

Is my current pattern an efficient way of batching requests?

Thanks in advance!

nithinmurali commented 3 years ago

Hi, This would work and should give the same performance as using batches. But batching should also work here. example code

    headers = ["Example1", "Example2", "Example3"]
    current_row = 1
   gc.set_batch_mode(True)

    for h in headers:
        cell = Cell((1,current_row))  # directly inited cells are unlined by default, you can send tuple(row ,col) for address
        cell.color = (0.5, 0.5 , 0.5, 1)
        cell.set_text_format('bold', True)  # no need to assign to cell
        cell.set_value(h)
        cell.link(wks)
        cell.update()  # I should propably allow linking directly here by taking wks as a param
        current_row += 1

     gc.run_batch()

Ideally batching should work without unlinking. But it's not mature yet. it's in the roadmap.

cuchoi commented 3 years ago

Thank you! Now I am curious: What does the comment "(...) only caches sheetUpdate requests not value updates or clear requests" in the set_batch_mode function mean?

nithinmurali commented 3 years ago

It means currently

gc.set_batch_mode(True)

cell = wks.cell('A1')
cell.color = (0.5, 0.5 , 0.5, 1) # this is batched
cell.set_text_format('bold', True)  # this is batched
cell.set_value(h) # this wont be batched

gc.run_batch()

The setting cell value calls a different api, so is not batched. In the code I provided, we are circumventing it by using cell.update(). similarly, wks.update_values, wks.update_row, wks.clear are not batched.

nithinmurali commented 3 years ago

As you can see thats what stopping from using batching without unlinking cells. Once it's implemented unlinking will be deprecated.

cuchoi commented 3 years ago

That makes sense! Thank you. Quick question: On the first code that you shared you call cell.update(), wouldn't that execute a request as soon as it is called? Wouldn't using "wks.update_cells()" would be better?

nithinmurali commented 3 years ago

since the batch _mode is set to true cell.update() wont be executed and will only be executed when the batch is run. For update_cells you have to maintain a list of cells, here you can avoid that. (though internally both have same effect). your code looks good, I just provided a sample using batch_mode for any future users. Both will have the same run time.