rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

Add `peak_memory_usage` to all nvbench benchmarks #10528

Open vyasr opened 2 years ago

vyasr commented 2 years ago

Update (2023-12-05): Adding peak_memory_usage to nvbench benchmarks can be accomplished with this pattern:

```cpp
auto const mem_stats_logger = cudf::memory_stats_logger();

state.exec( ... );

state.add_buffer_size(
  mem_stats_logger.peak_memory_usage(), "peak_memory_usage", "peak_memory_usage");
```

Please consult the cuIO benchmarks for examples of how to add peak memory tracking. If we track peak memory usage as well as bytes per second, we can estimate the memory footprint across the libcudf API.
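
For reference, a complete nvbench benchmark wired up this way might look like the sketch below. The benchmark name and `do_work` are hypothetical placeholders; the logger and reporting calls follow the pattern above, and the include path for `memory_stats_logger` is the one used inside the cudf repo's benchmark tree.

```cpp
#include <benchmarks/fixture/benchmark_fixture.hpp>  // cudf::memory_stats_logger (cudf repo path)
#include <nvbench/nvbench.cuh>

void do_work();  // hypothetical: the libcudf operation being benchmarked

static void bench_example(nvbench::state& state)
{
  // Construct before exec so the logger observes every allocation in the run.
  auto const mem_stats_logger = cudf::memory_stats_logger();

  state.exec([&](nvbench::launch& launch) { do_work(); });

  // Reported as an extra column in nvbench's output, alongside throughput.
  state.add_buffer_size(
    mem_stats_logger.peak_memory_usage(), "peak_memory_usage", "peak_memory_usage");
}

NVBENCH_BENCH(bench_example).set_name("peak_memory_example");
```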

Original issue:

#7770 added support for peak memory usage to cuIO benchmarks using rmm's statistics_resource_adaptor. It would be nice to expand that to all of our benchmarks so that we could more easily detect regressions in memory usage. This would be particularly useful for the Dask cuDF team, which is always looking to identify bottlenecks from memory usage. There was already discussion of doing this in #7770, so we should investigate following up now.
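
For context, `memory_stats_logger` is a thin wrapper around that adaptor. A minimal sketch of the underlying mechanism, using rmm's `statistics_resource_adaptor` directly (the function names `run_workload` and `measure_peak_memory` are illustrative, not cudf's actual implementation):

```cpp
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/statistics_resource_adaptor.hpp>

#include <cstdio>

void run_workload();  // hypothetical: whatever allocates device memory

void measure_peak_memory()
{
  // Wrap the current device resource so every allocation/deallocation is tallied.
  auto* upstream = rmm::mr::get_current_device_resource();
  rmm::mr::statistics_resource_adaptor<rmm::mr::device_memory_resource> stats_mr{upstream};
  rmm::mr::set_current_device_resource(&stats_mr);

  run_workload();

  // `.peak` is the high-water mark of outstanding device bytes.
  auto const peak = stats_mr.get_bytes_counter().peak;
  std::printf("peak_memory_usage: %lld bytes\n", static_cast<long long>(peak));

  rmm::mr::set_current_device_resource(upstream);  // restore the original resource
}
```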

vyasr commented 2 years ago

CC @galipremsagar who was interested in this data.

@devavret it looks like you made this very easy with the memory_stats_logger? @harrism would you still be in favor of this?

harrism commented 2 years ago

> @harrism would you still be in favor of this?

I don't have a reason not to support this. However, it may not be useful for all benchmarks. Does it impact CI benchmark throughput?

jrhemstad commented 2 years ago

I am reluctant to add this new functionality based on Google Benchmark when we are trying to phase that out.

I would support adding this as a feature to benchmarks ported to NVBench.

vyasr commented 2 years ago

@PointKernel I notice that you're already doing this in #10248 with NVBench benchmarks. It looks like it's no more complex than incorporating the resource adapter into GBench benchmarks?

PointKernel commented 2 years ago

> It looks like it's no more complex than incorporating the resource adapter into GBench benchmarks?

Right. It's effortless with the help of memory_stats_logger. Only two lines need to be added:

```cpp
auto mem_stats_logger = cudf::memory_stats_logger();  // init stats logger
state.exec([&](nvbench::launch& launch) {
  target_kernel();
});
// report peak memory usage to nvbench
state.add_element_count(mem_stats_logger.peak_memory_usage(), "Peak Memory");
```
vyasr commented 2 years ago

So then maybe making this change is independent of whether a benchmark has been switched from GBench to NVBench? It seems like we could add this now; then, when a benchmark switches from GBench to NVBench, the only required change is to replace `state.counters["peak_memory_usage"]` with `state.add_element_count(`.
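
For concreteness, the GBench-side equivalent might look like this sketch (hypothetical benchmark body; same `memory_stats_logger` helper as above, with Google Benchmark's user counters doing the reporting):

```cpp
#include <benchmark/benchmark.h>
// assumes the cudf benchmark fixture header providing cudf::memory_stats_logger

void target_kernel();  // hypothetical, as in the nvbench snippet above

static void BM_example(benchmark::State& state)
{
  auto const mem_stats_logger = cudf::memory_stats_logger();  // init stats logger
  for (auto _ : state) {
    target_kernel();
  }
  // User counters appear as extra columns, analogous to add_element_count.
  state.counters["peak_memory_usage"] = mem_stats_logger.peak_memory_usage();
}
BENCHMARK(BM_example);
```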

jrhemstad commented 2 years ago

> So then maybe making this change is independent of whether a benchmark has been switched from GBench to NVBench?

There's also the parsing and reporting side that will be different.

I don't want to keep building things on top of GBench when I'm actively trying to get people to switch to NVBench.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

vyasr commented 5 months ago

@GregoryKimball do you think this is worth doing systematically for all benchmarks, or something we can add on a case-by-case basis as the need arises? If the latter, I would vote that we close this issue and just add this when we are benchmarking specific functionality.