A casual dataset flows view that lists about 10 flows takes ~1.36s and performs highly inefficient repository access operations.
(see Grafana trace)
There are over 7000 spans, including numerous calls to get_active_polling_source for the very same dataset (the only one). Internally, this causes a lot of metadata chain iteration activity: multiple S3 files are read, after which the cached version is reused.
Possible solutions:
- general improvement of SetPollingSource access (via database materialization or summary extensions)
- improving how flow GraphQL objects are organized, so that the dataset query is issued only once for N flows
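
The second option can be illustrated with a request-scoped memo shared by all flow objects in one query. This is only a minimal, language-agnostic sketch written in Python, not the project's actual code: `PollingSourceCache`, `resolve_flows`, and `fake_chain_scan` are hypothetical names, and the loader merely stands in for the expensive metadata chain iteration behind get_active_polling_source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PollingSourceCache:
    """Request-scoped memo: at most one chain scan per dataset (hypothetical)."""

    # Hypothetical loader standing in for the expensive metadata chain scan.
    load: Callable[[str], Optional[dict]]
    _cache: Dict[str, Optional[dict]] = field(default_factory=dict)
    loads: int = 0  # counts how many times the expensive loader actually ran

    def get_active_polling_source(self, dataset_id: str) -> Optional[dict]:
        # Only the first request for a dataset reaches the loader;
        # later requests for the same dataset reuse the stored result.
        if dataset_id not in self._cache:
            self.loads += 1
            self._cache[dataset_id] = self.load(dataset_id)
        return self._cache[dataset_id]


def resolve_flows(flows: List[dict], cache: PollingSourceCache) -> List[dict]:
    # Every flow object resolves its polling source through the shared cache
    # instead of querying the dataset on its own.
    return [
        {
            "flow_id": flow["flow_id"],
            "polling_source": cache.get_active_polling_source(flow["dataset_id"]),
        }
        for flow in flows
    ]


def fake_chain_scan(dataset_id: str) -> dict:
    # Placeholder for iterating the metadata chain (assumption, not real API).
    return {"dataset_id": dataset_id, "kind": "SetPollingSource"}


cache = PollingSourceCache(load=fake_chain_scan)
flows = [{"flow_id": i, "dataset_id": "dataset-1"} for i in range(10)]
views = resolve_flows(flows, cache)
```

With 10 flows pointing at the same dataset, the loader in this sketch runs once rather than once per flow. A batching data loader at the GraphQL layer would achieve a similar effect; which mechanism fits the codebase best is left open here.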
In addition, the same trace in Grafana uncovered the need for #850