Kayrnt closed 2 months ago
Yes, I addressed this problem in the documentation. The current approach is to either isolate on a specific dataset, or you can even use a direct scan on tables without ATTACH-ing. I'm not sure if a full lazy approach is practical or even doable, as many operations already require the table schema to be set before query execution. Finally, as there are no complaints yet, I would rather keep it simple here ;)
> I'm not sure if a full lazy approach is practical or even doable, as many operations already require the table schema to be set before query execution.
I'd cache whatever I could, read as little as possible from the APIs, and parallelize the calls 😅
> Finally, as there are no complaints yet, I would rather keep it simple here ;)
I guess hardly anyone is aware of the extension yet 😉 But I'm already complaining if you need any excuse 😄
Right now, if we want to query my_project.my_dataset1.my_table2, the current behavior triggers these remote calls:
- my_project (1 single remote call)
- my_dataset1 (N remote calls for a dataset with N tables)
All those calls are sequential, so it can be pretty slow (like a few minutes for datasets with hundreds of tables).
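To illustrate the parallelization point, here is a minimal Python sketch (not the extension's actual code) that issues the per-table metadata calls concurrently instead of one after another. The names `load_schemas_parallel` and `fake_fetch_schema` are hypothetical; `fake_fetch_schema` only stands in for whatever remote call returns a table's schema.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_schemas_parallel(
    table_names: List[str],
    fetch_schema: Callable[[str], dict],
    max_workers: int = 16,
) -> Dict[str, dict]:
    """Fetch one schema per table, running the remote calls concurrently."""
    if not table_names:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each name with its schema.
        schemas = list(pool.map(fetch_schema, table_names))
    return dict(zip(table_names, schemas))


def fake_fetch_schema(table_name: str) -> dict:
    """Hypothetical stand-in for a remote metadata call (assumption, no network)."""
    return {"table": table_name, "columns": ["id", "value"]}
```

With N tables and enough workers, the wall-clock time should be closer to the slowest single call than to the sum of all N calls, although API rate limits could cap the gain in practice.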
A lazy approach could be to read only the entries required, by doing a single GetTable call on each table involved.
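A rough Python sketch of what I mean, combining the lazy lookup with a cache. This is only an illustration under assumptions: `LazyTableCatalog` and `fake_get_table` are hypothetical names, and `fake_get_table` stands in for a real GetTable request.

```python
from typing import Callable, Dict


class LazyTableCatalog:
    """Resolve table metadata on first use and cache it afterwards."""

    def __init__(self, get_table: Callable[[str], dict]) -> None:
        self._get_table = get_table
        self._cache: Dict[str, dict] = {}
        self.remote_calls = 0  # counts how many lookups actually went remote

    def resolve(self, qualified_name: str) -> dict:
        # Only tables referenced by a query ever reach this method,
        # so untouched tables in the dataset cost nothing.
        if qualified_name not in self._cache:
            self.remote_calls += 1
            self._cache[qualified_name] = self._get_table(qualified_name)
        return self._cache[qualified_name]


def fake_get_table(qualified_name: str) -> dict:
    """Hypothetical stand-in for a single GetTable call (assumption, no network)."""
    project, dataset, table = qualified_name.split(".")
    return {"project": project, "dataset": dataset, "table": table}
```

Querying my_project.my_dataset1.my_table2 through such a catalog would cost one remote call, however many tables the dataset holds, and repeated queries on the same table would hit the cache.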