Open GregoryKimball opened 1 week ago
With `--rmm_mode managed` set while running the benchmark, the benchmark uses managed memory for the entire run (both data generation and the query). We could consider using managed memory for data generation only; for that, we should add options to these query benchmarks.
With `generate_parquet_data_sources(scale_factor, table_names, dest_sources)`, individual columns in each table are not selectable now. Generating one column often depends on other columns, so we would end up creating more columns.
We could create a managed memory resource for data generation, use it, and destroy it after writing the Parquet data to host, then use that result for the queries. But remember that the host-to-device transfer is then included in the benchmark time as part of the scan (Parquet read). We would also need to update the API to accept `cuio_source_sink_pair`.
Thank you @karthikeyann for your comments. Using managed memory for data generation and the `rmm_mode` MR for the query seems like a good pattern. In the end I would like to be able to run SF100 with the CUDA async MR on A100. If data generation uses the managed MR and the timed queries use the async MR, that would work great.
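The pattern discussed here (one memory resource for data generation, a different one for the timed queries) can be sketched as a scoped swap of the default resource. This is only an illustration under stated assumptions: it uses the standard library's `std::pmr` as a host-side stand-in, and `counting_resource`, `scoped_default_resource`, `generate_table_bytes`, `run_query`, and `run_benchmark_sketch` are hypothetical names, not part of cudf or the NDS-H benchmarks. In RMM the analogous pieces would presumably be `rmm::mr::managed_memory_resource`, `rmm::mr::cuda_async_memory_resource`, and `rmm::mr::set_current_device_resource`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical stand-in for a device memory resource; it only counts bytes.
class counting_resource : public std::pmr::memory_resource {
 public:
  std::size_t bytes_allocated() const { return bytes_; }

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    bytes_ += bytes;
    return std::pmr::new_delete_resource()->allocate(bytes, align);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    std::pmr::new_delete_resource()->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
  std::size_t bytes_ = 0;
};

// RAII guard: installs `mr` as the default resource and restores the
// previous default when the scope ends.
class scoped_default_resource {
 public:
  explicit scoped_default_resource(std::pmr::memory_resource* mr)
    : previous_(std::pmr::set_default_resource(mr)) {}
  ~scoped_default_resource() { std::pmr::set_default_resource(previous_); }
  scoped_default_resource(const scoped_default_resource&) = delete;
  scoped_default_resource& operator=(const scoped_default_resource&) = delete;

 private:
  std::pmr::memory_resource* previous_;
};

// Hypothetical data generation: scratch memory comes from the current default
// resource, and the result is copied out to an ordinary host buffer.
std::vector<char> generate_table_bytes(std::size_t n) {
  std::pmr::vector<char> scratch(n, 'x');
  return std::vector<char>(scratch.begin(), scratch.end());
}

// Hypothetical query: reads the host buffer into memory from the current
// default resource (this models the scan being part of the timed region).
std::size_t run_query(const std::vector<char>& host_data) {
  std::pmr::vector<char> working(host_data.begin(), host_data.end());
  return working.size();
}

struct footprint {
  std::size_t datagen_bytes;
  std::size_t query_bytes;
};

footprint run_benchmark_sketch(std::size_t table_bytes) {
  counting_resource datagen_mr;  // plays the role of the managed MR
  counting_resource query_mr;    // plays the role of the async MR
  std::vector<char> host_parquet;
  {
    scoped_default_resource guard(&datagen_mr);
    host_parquet = generate_table_bytes(table_bytes);
  }  // data generation resource is no longer the default here
  {
    scoped_default_resource guard(&query_mr);
    std::size_t rows = run_query(host_parquet);
    assert(rows == table_bytes);
  }
  return {datagen_mr.bytes_allocated(), query_mr.bytes_allocated()};
}
```

The guard keeps the two phases from sharing a resource by accident: anything allocated during data generation is accounted to the first resource, and the query phase starts from a clean default.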
Is your feature request related to a problem? Please describe.
In the NDS-H-cpp benchmarks, the memory footprint of data generation is larger than the memory footprint of query execution. This ends up limiting us to <=SF10 on H100 GPUs, which is perhaps as much as 10x smaller than the scale factor we could reach with pre-generated files.
Describe the solution you'd like
There are a few solutions we could use:
write_to_parquet_device_buffer
Additional context
On A100, we can run query sizes up to SF100 or so, but the generator only goes to ~SF10.