prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.04k stars 5.37k forks source link

PrestoExchangeSource HttpClient histogram metric is blocking query #23338

Open karteekmurthys opened 3 months ago

karteekmurthys commented 3 months ago

PrometheusStatsReporter updates worker stats synchronously. This is fine as most of the stats are updated through PeriodicTaskManager in a separate thread periodically without blocking queries.

But this specific metric is updated in the query path and is updated way too frequently.

        RECORD_HISTOGRAM_METRIC_VALUE(
            kCounterHttpClientPrestoExchangeOnBodyBytes, bufferBytes);
      });

The Q67 of SF10K TPCDS workload takes several hours to finish since we merged PrometheusStatsReporter. We are working on how to improve this in the stats reporter but we should also avoid reporting metrics in the query path if possible.

Request to review if presto_cpp.http.client.presto_exchange_source.on_body_bytes can be removed or made optional for tracking.

We tested Q67 at SF10K by removing this metric and the query completes successfully.

We also have a workaround of updating histogram asynchronously in PrometheusStatsReporter. This solution was tested successfully as well against Q67

karteekmurthys commented 3 months ago

cc: @majetideepak @amitkdutta

karteekmurthys commented 3 months ago

@pramodsatya gathered some stats on total time spent at different locations in Presto CPP where RECORD_HISTOGRAM is called. The Q67 query was run for over an hour and PrestoExchangeSource.cpp:117 took total of 3256278906 us in an hour of query run.

reg/add   function                               line num
Add       addRootPool                            Memory.cpp:248                          76
          clearThread                            Driver.h:153                        166326
          operator()                             ExchangeClient.cpp:163               87166
                                                 ExchangeClient.cpp:167               61760
                                                 ExchangeClient.cpp:170               31759
                                                 FileHandle.cpp:64                       45
                                                 PrestoExchangeSource.cpp:117    3256278906
          runInternal                            Driver.cpp:522                      348599
          updatePrestoExchangeSourceMemoryStats  PeriodicTaskManager.cpp:268          18196
          ~MemoryPoolImpl                        MemoryPool.cpp:462                      45
          ~ScopedArbitration                     SharedArbitrator.cpp:880              1110
                                                 SharedArbitrator.cpp:910                12
jaystarshot commented 3 months ago

What if the implementation instead created a subprocess like the exposer mentioned https://github.com/jupp0r/prometheus-cpp. Resources can be controlled when starting the exposer. In uber we use the exposer as a separate process since we don't have to worry about reporting/maintainng the new http server

jaystarshot commented 3 months ago

I don't think this will help directly here^ since I could repro this internally.