rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Separate datasource from reader parameters in cuIO #17159

Open ttnghia opened 1 week ago

ttnghia commented 1 week ago

Currently, for reading various file formats, we do this:

auto opts = cudf::io::xxx_reader_options::builder(cudf::io::source_info{input_data_source_info})
    .some_options(...)
    ...
    .build();

auto output = read_xxx(opts);

When there are many options to specify and several datasources to read, rebuilding the same set of parameters for every datasource is burdensome.

We can avoid this by separating the datasource from the reading options. The reading parameters are then set once, and the same options instance is reused for every datasource:

auto opts = cudf::io::xxx_reader_options::builder()
    .some_options(...)
    ...
    .build();

auto output1 = read_xxx(source1, opts);
auto output2 = read_xxx(source2, opts);
....
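
To make the proposed shape concrete, here is a minimal, self-contained sketch of the pattern. Everything in it is a hypothetical stand-in and not the libcudf API: `source_info`, `reader_options`, `reader_options_builder`, `read_result`, and the `skip_rows` / `columns` options are illustrative names only, and `read_xxx` just echoes its inputs instead of reading anything.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a datasource description (not the cuDF type).
struct source_info {
  std::string path;
};

class reader_options_builder;

// Hypothetical options object that holds reading parameters only, no datasource.
class reader_options {
 public:
  static reader_options_builder builder();

  std::size_t skip_rows() const { return _skip_rows; }
  const std::vector<std::string>& columns() const { return _columns; }

 private:
  friend class reader_options_builder;
  std::size_t _skip_rows = 0;
  std::vector<std::string> _columns;
};

// Builder that takes no source argument, mirroring the proposal.
class reader_options_builder {
 public:
  reader_options_builder& skip_rows(std::size_t n) {
    _opts._skip_rows = n;
    return *this;
  }
  reader_options_builder& columns(std::vector<std::string> names) {
    _opts._columns = std::move(names);
    return *this;
  }
  reader_options build() const { return _opts; }

 private:
  reader_options _opts;
};

inline reader_options_builder reader_options::builder() {
  return reader_options_builder{};
}

// Placeholder result: records what the reader was asked to do.
struct read_result {
  std::string source_path;
  std::size_t skipped_rows;
  std::size_t num_columns;
};

// The datasource is a separate argument; opts is taken by const reference,
// so one options instance can serve any number of reads.
read_result read_xxx(const source_info& source, const reader_options& opts) {
  return {source.path, opts.skip_rows(), opts.columns().size()};
}
```

With this layout, a caller would build `opts` once via `reader_options::builder().skip_rows(2).columns({"a", "b"}).build()` and then call `read_xxx(source_info{"first.bin"}, opts)` and `read_xxx(source_info{"second.bin"}, opts)` without touching the options again. Taking the options by const reference is one possible choice here; it keeps the reader from mutating shared state between calls.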

ttnghia commented 1 week ago

CC @vuule @karthikeyann.