This integration collects telemetry from Databricks (including Spark on Databricks) and/or Spark telemetry from any Spark deployment. See the Features section for supported telemetry types.
When listing all clusters at the start of the Spark collection, we use `ListAll` instead of `List`. `ListAll` does not allow for pagination, so we are missing clusters; we need to use `List` with the `listing.Iterator` instead. Making this change will significantly increase the time `PollMetrics` takes, but we have to pull the full list on every poll in case clusters have changed state, been terminated, or new ones have started. How should we account for this? Can we pass a filter when we call `List`?
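The paginated approach can be sketched as below. This is a self-contained mock, not the real databricks-sdk-go client: `fakeAPI`, `listPage`, and the `state` filter parameter are illustrative stand-ins for the SDK's `Clusters.List` call, its `listing.Iterator`, and a `FilterBy` on the list request.

```go
package main

import "fmt"

// cluster is a stand-in for the SDK's compute.ClusterDetails.
type cluster struct {
	ID    string
	State string
}

// fakeAPI simulates a paginated ListClusters endpoint: each call returns
// one page plus a token for the next page (more == false means done).
type fakeAPI struct {
	clusters []cluster
	pageSize int
}

func (a *fakeAPI) listPage(token int) (page []cluster, next int, more bool) {
	end := token + a.pageSize
	if end >= len(a.clusters) {
		return a.clusters[token:], 0, false
	}
	return a.clusters[token:end], end, true
}

// listClusters walks every page, optionally filtering by cluster state.
// With the real SDK, the filter would instead be expressed on the list
// request so the server does the filtering.
func listClusters(a *fakeAPI, state string) []cluster {
	var out []cluster
	token, more := 0, true
	for more {
		var page []cluster
		page, token, more = a.listPage(token)
		for _, c := range page {
			if state == "" || c.State == state {
				out = append(out, c)
			}
		}
	}
	return out
}

func main() {
	api := &fakeAPI{
		clusters: []cluster{
			{"c1", "RUNNING"}, {"c2", "TERMINATED"}, {"c3", "RUNNING"},
			{"c4", "RUNNING"}, {"c5", "TERMINATED"},
		},
		pageSize: 2,
	}
	// Iterating pages sees all five clusters, not just the first page.
	fmt.Println(len(listClusters(api, "")))
	// A state filter would cut the set we poll metrics for.
	fmt.Println(len(listClusters(api, "RUNNING")))
}
```

Filtering to running clusters server-side would offset some of the extra latency from paging, since terminated clusters have no Spark metrics to poll anyway.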
We could also add an option to specify a list of cluster IDs instead of relying on automatic discovery.
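A minimal sketch of that option, assuming a hypothetical config field (`ClusterIDs` and `clustersToPoll` are illustrative names, not the integration's real config):

```go
package main

import "fmt"

// sparkConfig is a hypothetical config: when ClusterIDs is non-empty, the
// collector polls only those clusters and skips the List call entirely;
// when empty, it falls back to automatic discovery.
type sparkConfig struct {
	ClusterIDs []string
}

// clustersToPoll resolves the set of cluster IDs for one poll cycle.
func clustersToPoll(cfg sparkConfig, discover func() []string) []string {
	if len(cfg.ClusterIDs) > 0 {
		return cfg.ClusterIDs // explicit list: no discovery needed
	}
	return discover() // automatic discovery path
}

func main() {
	discover := func() []string { return []string{"a", "b", "c"} }
	fmt.Println(clustersToPoll(sparkConfig{ClusterIDs: []string{"x"}}, discover))
	fmt.Println(clustersToPoll(sparkConfig{}, discover))
}
```

Besides avoiding the listing cost entirely, an explicit list also sidesteps the staleness question, at the price of not picking up newly created clusters.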