rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] need multibyte_split support stream #14383

Open weedge opened 8 months ago

weedge commented 8 months ago

Is your feature request related to a problem? Please describe.

I want to read a large file using multiple CUDA streams (a stream pool).

Describe the solution you'd like

Add a public multibyte_split overload that takes an rmm::cuda_stream_view stream parameter.

header: https://github.com/rapidsai/cudf/blob/branch-23.12/cpp/include/cudf/io/text/multibyte_split.hpp

std::unique_ptr<cudf::column> multibyte_split(
  data_chunk_source const& source,
  std::string const& delimiter,
  parse_options options               = {},
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

src: https://github.com/rapidsai/cudf/blob/branch-23.12/cpp/src/io/text/multibyte_split.cu

std::unique_ptr<cudf::column> multibyte_split(cudf::io::text::data_chunk_source const& source,
                                              std::string const& delimiter,
                                              parse_options options,
                                              rmm::cuda_stream_view stream,
                                              rmm::mr::device_memory_resource* mr)
{
  auto result = detail::multibyte_split(
    source, delimiter, options.byte_range, options.strip_delimiters, stream, mr);

  return result;
}
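
To illustrate the intended use, here is a minimal host-side sketch of how a large file could be divided into byte ranges, with each range assigned to one stream of a pool. The names chunk_plan and plan_chunks are hypothetical helpers written for this example and are not part of cudf. The sketch assumes each slice would be passed as options.byte_range to one call of the proposed overload, on a stream taken from a pool such as rmm::cuda_stream_pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of cudf): one slice of the input file
// plus the index of the pool stream that would process it.
struct chunk_plan {
  std::int64_t offset;     // first byte of the slice
  std::int64_t size;       // number of bytes in the slice
  std::size_t stream_idx;  // index of the stream in the pool for this slice
};

// Split [0, file_size) into at most num_chunks contiguous, non-overlapping
// slices and assign them to num_streams streams in round-robin order.
inline std::vector<chunk_plan> plan_chunks(std::int64_t file_size,
                                           std::size_t num_chunks,
                                           std::size_t num_streams)
{
  assert(file_size >= 0 && num_chunks > 0 && num_streams > 0);
  std::vector<chunk_plan> plan;
  auto const n          = static_cast<std::int64_t>(num_chunks);
  auto const chunk_size = (file_size + n - 1) / n;  // ceiling division
  for (std::int64_t offset = 0; offset < file_size; offset += chunk_size) {
    auto const size = std::min(chunk_size, file_size - offset);
    // Assumption: each entry becomes options.byte_range for one
    // multibyte_split call issued on pool stream number stream_idx.
    plan.push_back({offset, size, plan.size() % num_streams});
  }
  return plan;
}
```

For example, plan_chunks(10, 3, 2) yields three slices of 4, 4, and 2 bytes, assigned to streams 0, 1, and 0. The per-slice result columns would still need to be synchronized and concatenated by the caller.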

Describe alternatives you've considered

Additional context

bdice commented 8 months ago

Related: #13744. cc: @vuule @shrshi