open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Add BatchSpanProcessor option to set CPU affinity on the worker thread #1822

Open bjosv opened 1 year ago

bjosv commented 1 year ago

Is your feature request related to a problem? We would like to use the batch processor in a system where other parts require low latency and high throughput. Today this is achieved by scheduling the processes strictly and pinning threads to specific CPUs.

When using the batch processor, it starts a worker thread that may run on any CPU, which degrades performance.

Describe the solution you'd like We would prefer to use the provided batch processor, and one proposal would be to add a CPU-affinity-mask option to BatchSpanProcessorOptions.

When this option is configured, we fetch the worker thread's handle via std::thread::native_handle() on worker_thread_ and then set the mask via pthread_setaffinity_np(). This could possibly be a Linux-only feature, although Windows has something similar via SetThreadAffinityMask().
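For illustration, the mechanism could look roughly like the sketch below. This is Linux-only (pthread_setaffinity_np is a glibc extension), and SetThreadAffinity is a hypothetical helper for this issue, not existing SDK code:

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the SDK): pin an existing std::thread
// to the CPU indices in `cpus` using its native pthread handle.
// Returns true if the affinity mask was applied successfully.
bool SetThreadAffinity(std::thread &t, const std::vector<int> &cpus)
{
  cpu_set_t set;
  CPU_ZERO(&set);
  for (int cpu : cpus)
  {
    CPU_SET(cpu, &set);
  }
  return pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &set) == 0;
}
```

The BatchSpanProcessor would call something like this on worker_thread_ right after starting it, when the option is set.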

Describe alternatives you've considered

Would a PR for a feature like this possibly be accepted? Are there other options that I haven't yet considered?

lalitb commented 1 year ago

Thanks for raising the issue. In general, it would be nice to avoid native platform-specific API calls within the SDK, but feel free to raise a PR to discuss further. Another option could be to optionally pass a thread-factory handle through BatchSpanProcessorOptions and let it create the worker thread with the required affinity.

lalitb commented 1 year ago

Probably I was not very clear about the thread-factory / thread-pool approach. In its simplest form, for illustration, it could be something like the code below, and the application can then pass a custom thread-pool implementation by inheriting from the ThreadPool class. In the real world it would be a bit more complex, as we may have to manage synchronization, results from threads, etc.

#include <functional>
#include <memory>
#include <thread>
#include <vector>

class ThreadPool
{
public:
  virtual ~ThreadPool() = default;
  virtual void Schedule(std::function<void(void)> func) = 0;
};

class DefaultThreadPool : public ThreadPool
{
public:
  void Schedule(std::function<void(void)> func) override
  {
    threads_.push_back(std::thread(std::move(func)));
  }

  ~DefaultThreadPool() override
  {
    for (auto &thread : threads_)
    {
      if (thread.joinable())
      {
        thread.join();
      }
    }
  }

private:
  std::vector<std::thread> threads_;
};

struct BatchSpanProcessorOptions
{
  std::unique_ptr<ThreadPool> pool = std::unique_ptr<ThreadPool>(new DefaultThreadPool);
};

class BatchSpanProcessor
{
public:
  BatchSpanProcessor(const BatchSpanProcessorOptions &options)
  {
    options.pool->Schedule([this]() { this->DoBackgroundWork(); });
  }

  void DoBackgroundWork()
  {
    // Consume and export batched spans here.
  }
};

This is a more generalized solution that supports applications with their own thread pools, while also addressing the current issue. Still, it would be good to have a draft PR with the affinity changes against the current implementation to discuss and compare further.
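Combining the two ideas, an application could inherit from the ThreadPool interface sketched above and pin each scheduled thread itself, keeping the platform-specific call out of the SDK. A hypothetical, Linux-only sketch (AffinityThreadPool is application code, not part of opentelemetry-cpp):

```cpp
#include <pthread.h>
#include <sched.h>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// The interface the SDK would expose (same shape as in the comment above).
class ThreadPool
{
public:
  virtual ~ThreadPool() = default;
  virtual void Schedule(std::function<void(void)> func) = 0;
};

// Application-side pool that pins every scheduled worker to one CPU.
// Best-effort: if pthread_setaffinity_np fails, the thread runs unpinned.
class AffinityThreadPool : public ThreadPool
{
public:
  explicit AffinityThreadPool(int cpu) : cpu_(cpu) {}

  void Schedule(std::function<void(void)> func) override
  {
    std::thread t(std::move(func));
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu_, &set);
    pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &set);
    threads_.push_back(std::move(t));
  }

  ~AffinityThreadPool() override
  {
    for (auto &t : threads_)
    {
      if (t.joinable())
      {
        t.join();
      }
    }
  }

private:
  int cpu_;
  std::vector<std::thread> threads_;
};
```

The application would then construct BatchSpanProcessorOptions with an AffinityThreadPool instead of the default pool.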

bjosv commented 1 year ago

Thanks for the example. This seems like a much better approach than what I had in mind. I will do some experiments with it and hopefully prepare a draft PR for further discussion.

github-actions[bot] commented 1 year ago

This issue was marked as stale due to lack of activity.

lalitb commented 1 year ago

This needs to be supported.