rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Separate datasource from reader parameters in cuIO #17159

Open ttnghia opened 1 month ago

ttnghia commented 1 month ago

Currently, for reading various file formats, we do this:

auto opts = cudf::io::xxx_reader_options::builder(cudf::io::source_info{input_data_source_info})
    .some_options(...)
    ...
    .build();

auto output = read_xxx(opts);

When there are many options to set and several different datasources to read, repeating the same parameters for every datasource is burdensome.

We can avoid this by separating the datasource from the reader options. That way we set the reading parameters once, then reuse the same options instance for every source:

auto opts = cudf::io::xxx_reader_options::builder()
    .some_options(...)
    ...
    .build();

auto output1 = read_xxx(source1, opts);
auto output2 = read_xxx(source2, opts);
...

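
To make the proposed shape concrete, here is a minimal, self-contained sketch of the pattern using mock types only. Nothing in it is cuDF API: `source_info`, `reader_options`, `reader_options_builder`, `read_result`, `read_mock`, and `read_all` are all hypothetical stand-ins, and the "reader" just records which options it was called with. It only illustrates how the options could be built once, without a source, and then passed alongside each source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a datasource description (not cudf::io::source_info).
struct source_info {
  std::string path;
};

// Hypothetical reader options that carry NO datasource, only parameters.
struct reader_options {
  std::size_t skip_rows = 0;
  bool has_header       = true;
};

// Hypothetical builder: constructed without a source, unlike the current API.
class reader_options_builder {
 public:
  reader_options_builder& skip_rows(std::size_t n)
  {
    opts_.skip_rows = n;
    return *this;
  }
  reader_options_builder& has_header(bool v)
  {
    opts_.has_header = v;
    return *this;
  }
  reader_options build() const { return opts_; }

 private:
  reader_options opts_;
};

// What the mock reader reports back: the source it saw and the options applied.
struct read_result {
  std::string source;
  std::size_t rows_skipped;
  bool had_header;
};

// Mock reader: the source and the options arrive as separate arguments.
read_result read_mock(source_info const& src, reader_options const& opts)
{
  return {src.path, opts.skip_rows, opts.has_header};
}

// Reuse one options instance across any number of sources.
std::vector<read_result> read_all(std::vector<source_info> const& sources,
                                  reader_options const& opts)
{
  std::vector<read_result> out;
  out.reserve(sources.size());
  for (auto const& s : sources) {
    out.push_back(read_mock(s, opts));
  }
  return out;
}
```

With this shape, a caller would build `opts` once via `reader_options_builder{}.skip_rows(2).has_header(false).build()` and hand it to `read_mock` (or `read_all`) for each source; the options object is taken by const reference, so it is never modified between reads.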
ttnghia commented 1 month ago

CC @vuule @karthikeyann.