Open petervandivier opened 9 months ago
Back-of-the-envelope math suggests ~4 GB batch sizes if you want to target 15-minute run times per batch.
The predictor function should allow user input to set a custom run time (remembering the 60-minute hard cap, with a buffer).
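A minimal sketch of what that predictor could look like, assuming we've already measured export throughput in bytes per minute. The function name, the 10-minute buffer, and the throughput parameter are all illustrative, not anything settled in this issue:

```python
# Hypothetical sketch of the batch-size predictor. The 60-minute value is the
# hard cap mentioned above; the 10-minute buffer is an assumed safety margin.

HARD_CAP_MINUTES = 60   # service-side hard cap per batch
BUFFER_MINUTES = 10     # assumed safety margin below the cap

def predict_batch_bytes(throughput_bytes_per_min: float,
                        target_minutes: float = 15.0) -> int:
    """Return a batch size (bytes) expected to finish within target_minutes."""
    if target_minutes <= 0:
        raise ValueError("target runtime must be positive")
    # Never let a user-supplied target get closer than BUFFER_MINUTES
    # to the hard cap.
    capped_minutes = min(target_minutes, HARD_CAP_MINUTES - BUFFER_MINUTES)
    return int(throughput_bytes_per_min * capped_minutes)
```

So a user asking for a 120-minute target would silently be clamped to 50 minutes of predicted work rather than risking the cap.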
Maybe steer clear of `estimate_data_size()` and stick to `.show extents`: `estimate_data_size(*)` appears to read the entire table into memory, which isn't super surprising in retrospect :sweat_smile:
Uneven data distribution hurts queue throughput (#5) and can cause batch failure if a batch exceeds the 60-minute runtime cap.
Get a baseline export size and use it to predict appropriate batch sizes for a given data range, using `estimate_data_size()` or similar.
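One way to turn that baseline into a concrete batching scheme is to convert it into a per-batch date-range width. This is a hypothetical sketch (the helper name and parameters are mine), and it assumes roughly even distribution across days, which per #5 above is not always true:

```python
# Hypothetical sketch: derive how many days of data to put in one batch,
# given a measured baseline export (baseline_bytes over baseline_days) and
# a target batch size. Assumes even per-day distribution (see issue #5).

def days_per_batch(baseline_bytes: int, baseline_days: int,
                   target_batch_bytes: int) -> int:
    """Return the number of days of data expected to fit in one batch."""
    bytes_per_day = baseline_bytes / baseline_days
    # Always batch at least one day, even if a single day overshoots the
    # target; skewed days would need finer-grained splitting.
    return max(1, int(target_batch_bytes // bytes_per_day))
```

For example, a 10-day baseline of 10,000 bytes gives 1,000 bytes/day, so a 3,000-byte target yields 3-day batches.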