newrelic-experimental / newrelic-databricks-integration

This integration collects telemetry from Databricks (including Spark on Databricks) and/or Spark telemetry from any Spark deployment. See the Features section for supported telemetry types.
Apache License 2.0

Not reporting on all Spark clusters #36

Open sdewitt-newrelic opened 2 months ago

sdewitt-newrelic commented 2 months ago

When listing all clusters at the start of the Spark collection, we call ListAll instead of List. ListAll does not allow for pagination, so we are missing clusters. We need to use List with the listing.Iterator instead.
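For illustration, here is a minimal sketch of the drain loop that iterating with List implies. It does not import the SDK: the `Iterator` interface is a local stand-in that assumes the `HasNext`/`Next` shape of `listing.Iterator`, and `Cluster`, `pagedIterator`, `collectAll`, and `demoClusterIDs` are hypothetical names, not code from this repository.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Cluster is a hypothetical stand-in for the SDK's cluster details type.
type Cluster struct {
	ID    string
	State string
}

// Iterator is a local stand-in. It ASSUMES the HasNext/Next shape of
// the SDK's listing.Iterator; it is not the SDK type itself.
type Iterator[T any] interface {
	HasNext(ctx context.Context) bool
	Next(ctx context.Context) (T, error)
}

// pagedIterator simulates a paginated API by serving clusters page by page.
type pagedIterator struct {
	pages [][]Cluster
	page  int // index of the current page
	idx   int // index of the next item within the current page
}

// HasNext advances past exhausted or empty pages and reports whether
// another cluster is available.
func (p *pagedIterator) HasNext(_ context.Context) bool {
	for p.page < len(p.pages) && p.idx >= len(p.pages[p.page]) {
		p.page++
		p.idx = 0
	}
	return p.page < len(p.pages)
}

// Next returns the next cluster, crossing page boundaries as needed.
func (p *pagedIterator) Next(ctx context.Context) (Cluster, error) {
	if !p.HasNext(ctx) {
		return Cluster{}, errors.New("no more clusters")
	}
	c := p.pages[p.page][p.idx]
	p.idx++
	return c, nil
}

// collectAll drains an iterator so that no page is skipped.
func collectAll[T any](ctx context.Context, it Iterator[T]) ([]T, error) {
	var out []T
	for it.HasNext(ctx) {
		item, err := it.Next(ctx)
		if err != nil {
			return nil, err
		}
		out = append(out, item)
	}
	return out, nil
}

// demoClusterIDs runs collectAll over three simulated pages (one empty).
func demoClusterIDs() []string {
	it := &pagedIterator{pages: [][]Cluster{
		{{ID: "c-1", State: "RUNNING"}, {ID: "c-2", State: "TERMINATED"}},
		{},
		{{ID: "c-3", State: "RUNNING"}},
	}}
	clusters, err := collectAll[Cluster](context.Background(), it)
	if err != nil {
		return nil
	}
	ids := make([]string, 0, len(clusters))
	for _, c := range clusters {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	fmt.Println(demoClusterIDs())
}
```

In the integration itself, the iterator would presumably come from the List call rather than from a simulated slice of pages; the loop shape is the part that carries over.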

Doing this will significantly increase the time PollMetrics takes. But we have to pull the list on every poll, because clusters may have changed state or been terminated, and new ones may have started. How should we account for this? Can we pass a filter when we call List?

We could also add the option to specify a list of cluster IDs instead of automatic discovery.
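One possible shape for that option, as a rough sketch only. `Config`, its `ClusterIDs` field, and `resolveClusterIDs` are hypothetical names chosen for illustration; nothing here reflects the integration's actual configuration schema. The idea is that an explicit list bypasses discovery entirely, and an empty list falls back to it.

```go
package main

import "fmt"

// Config is hypothetical; ClusterIDs is an illustrative option name,
// not an existing setting of the integration.
type Config struct {
	ClusterIDs []string
}

// resolveClusterIDs returns the explicitly configured cluster IDs when any
// are set, and otherwise falls back to automatic discovery via discover.
func resolveClusterIDs(
	cfg Config,
	discover func() ([]string, error),
) ([]string, error) {
	if len(cfg.ClusterIDs) == 0 {
		ids, err := discover()
		if err != nil {
			return nil, fmt.Errorf("cluster discovery failed: %w", err)
		}
		return ids, nil
	}

	// Drop blanks and duplicates while preserving the configured order.
	seen := make(map[string]struct{}, len(cfg.ClusterIDs))
	ids := make([]string, 0, len(cfg.ClusterIDs))
	for _, id := range cfg.ClusterIDs {
		if id == "" {
			continue
		}
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	// discover stands in for the paginated List call.
	discover := func() ([]string, error) {
		return []string{"discovered-1"}, nil
	}

	explicit, _ := resolveClusterIDs(Config{ClusterIDs: []string{"a", "b", "a"}}, discover)
	auto, _ := resolveClusterIDs(Config{}, discover)
	fmt.Println(explicit, auto)
}
```

With an explicit list, the cost of each poll no longer depends on how many clusters exist in the workspace, at the price of not picking up new clusters automatically.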