paypal / load-watcher

Load watcher is a cluster-wide aggregator of metrics, developed for Trimaran: Real Load Aware Scheduler in Kubernetes.

Cannot use the LibraryClient with the watcher #18

Closed wangchen615 closed 3 years ago

wangchen615 commented 3 years ago

The current design of libraryClient in the release version includes watcher as follows:

// Client for Watcher APIs as a library
type libraryClient struct {
    fetcherClient watcher.MetricsProviderClient
    watcher       *watcher.Watcher
}

The watcher itself should only be run by load-watcher. Because libraryClient is also used by scheduler plugins, creating a libraryClient should not also start the watcher.

// Creates a new watcher client when using watcher as a library
func NewLibraryClient(opts watcher.MetricsProviderOpts) (Client, error) {
    var err error
    client := libraryClient{}
    switch opts.Name {
    case watcher.PromClientName:
        client.fetcherClient, err = metricsprovider.NewPromClient(opts)
    case watcher.SignalFxClientName:
        client.fetcherClient, err = metricsprovider.NewSignalFxClient(opts)
    default:
        client.fetcherClient, err = metricsprovider.NewMetricsServerClient()
    }
    if err != nil {
        return client, err
    }

    client.watcher = watcher.NewWatcher(client.fetcherClient)
    client.watcher.StartWatching()
    return client, nil
}

The LibraryClient fails to get the latest metrics when it is used in plugins without a blocking select {}.

zorro786 commented 3 years ago

The issue is with the fetches below from windowWatcher(), which run in separate goroutines:

    for _, duration := range durations {
        go windowWatcher(duration)
    }

Because these goroutines have not necessarily completed their first fetch when StartWatching() returns, the metrics may be unavailable for a brief moment at startup when load watcher is used as a library. To fix this, the first fetch should be made synchronously, and subsequent fetches should use goroutines. Thanks for reporting, @wangchen615.