spullara opened this issue 2 years ago
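Hi @spullara. You should definitely be able to configure the maximum number of unique values so that your column with 117 unique values is inferred as an enum column. Currently, the only way to do that is to pass a config file with the column name, the column type, and a list of all of the variants. There are two potential implementations that would achieve what you want:

1. Make the maximum number of unique values configurable by adding an `InferOptions` struct (with `enum_max_unique_values`, defaulting to 100) to `FromCsvOptions`: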
```rust
use std::collections::BTreeMap;
// `TableColumnType` and `DEFAULT_INVALID_VALUES` are assumed to be defined elsewhere in the crate.
#[derive(Clone)]
pub struct FromCsvOptions<'a> {
    pub column_types: Option<BTreeMap<String, TableColumnType>>,
    pub infer_options: InferOptions,
    pub invalid_values: &'a [&'a str],
}
impl<'a> Default for FromCsvOptions<'a> { fn default() -> Self { FromCsvOptions { column_types: None, infer_options: InferOptions::default(), invalid_values: DEFAULT_INVALID_VALUES } } }
// Max distinct values for a column to be inferred as an enum. Must be `Clone` because `FromCsvOptions` derives `Clone`.
#[derive(Clone)]
pub struct InferOptions { pub enum_max_unique_values: usize }
impl Default for InferOptions { fn default() -> Self { InferOptions { enum_max_unique_values: 100 } } }
```
2. Allow passing the column name and type without forcing the user to list all of the unique variants.
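To illustrate why the 117-value column comes out as text under option 1's default, here is a minimal sketch assuming a simplified rule: a column is inferred as an enum when its number of distinct values (excluding invalid values) is at most `enum_max_unique_values`, and as text otherwise. `InferredType` and `infer_column_type` are hypothetical names, and real inference also has to consider numeric columns, so treat this as an illustration rather than the library's actual logic.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for an inferred column type.
#[derive(Debug, PartialEq)]
pub enum InferredType {
    Enum { variants: Vec<String> },
    Text,
}

#[derive(Clone)]
pub struct InferOptions {
    pub enum_max_unique_values: usize,
}

impl Default for InferOptions {
    fn default() -> Self {
        InferOptions { enum_max_unique_values: 100 }
    }
}

/// Hypothetical helper: classify a string column by counting its distinct values,
/// skipping anything listed in `invalid_values`.
pub fn infer_column_type(values: &[&str], invalid_values: &[&str], options: &InferOptions) -> InferredType {
    let unique: BTreeSet<&str> = values
        .iter()
        .copied()
        .filter(|v| !invalid_values.contains(v))
        .collect();
    if unique.len() <= options.enum_max_unique_values {
        InferredType::Enum {
            variants: unique.into_iter().map(String::from).collect(),
        }
    } else {
        InferredType::Text
    }
}

fn main() {
    // 117 distinct labels, repeated to mimic a sample of about 18k rows.
    let labels: Vec<String> = (0..117).map(|i| format!("label_{i}")).collect();
    let values: Vec<&str> = labels.iter().cycle().take(18_000).map(|s| s.as_str()).collect();
    let invalid = ["", "NA"];

    // With the default limit of 100, the column falls through to text.
    assert_eq!(infer_column_type(&values, &invalid, &InferOptions::default()), InferredType::Text);

    // Raising the limit makes the same column an enum.
    let raised = InferOptions { enum_max_unique_values: 150 };
    match infer_column_type(&values, &invalid, &raised) {
        InferredType::Enum { variants } => println!("enum with {} variants", variants.len()), // prints "enum with 117 variants"
        InferredType::Text => println!("text"),
    }
}
```

The threshold is a single global knob, so it affects every column in the file, which is one reason per-column control may be preferable.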
I think option 2 is probably closer to the interface you might be looking for? This way you can configure the type per column without having to pass all of the variants (which is cumbersome for enums with many options).
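Option 2 could look roughly like the following sketch. It assumes a hypothetical `ColumnTypeOverride` type (not part of the current API) whose `Enum` variant carries an `Option<Vec<String>>`: when the user names the column but omits the variants, they are collected from the data and the unique-value limit is bypassed for that column. `resolve_column_type` and its `fallback` parameter are likewise hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-column override; `variants: None` means "infer them from the data".
#[derive(Clone, Debug)]
pub enum ColumnTypeOverride {
    Enum { variants: Option<Vec<String>> },
    Text,
}

#[derive(Debug, PartialEq)]
pub enum ResolvedType {
    Enum { variants: Vec<String> },
    Text,
}

/// Hypothetical helper: resolve a column's final type from an optional override.
/// `fallback` stands in for whatever automatic inference would have produced.
pub fn resolve_column_type(
    column_name: &str,
    values: &[&str],
    overrides: &BTreeMap<String, ColumnTypeOverride>,
    fallback: ResolvedType,
) -> ResolvedType {
    match overrides.get(column_name) {
        // Variants listed explicitly: use them as-is (the current config-file behavior).
        Some(ColumnTypeOverride::Enum { variants: Some(variants) }) => ResolvedType::Enum {
            variants: variants.clone(),
        },
        // Enum requested without variants: collect the distinct values, ignoring the limit.
        Some(ColumnTypeOverride::Enum { variants: None }) => {
            let unique: BTreeSet<&str> = values.iter().copied().collect();
            ResolvedType::Enum {
                variants: unique.into_iter().map(String::from).collect(),
            }
        }
        Some(ColumnTypeOverride::Text) => ResolvedType::Text,
        // No override for this column: keep the automatically inferred type.
        None => fallback,
    }
}

fn main() {
    let mut overrides = BTreeMap::new();
    overrides.insert("label".to_string(), ColumnTypeOverride::Enum { variants: None });

    let values = ["b", "a", "b", "c"];
    let resolved = resolve_column_type("label", &values, &overrides, ResolvedType::Text);
    println!("{:?}", resolved); // prints Enum { variants: ["a", "b", "c"] }
}
```

Making the variant list optional keeps the existing "column name, type, and variants" config working unchanged while letting users simply label a column as an enum.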
I think just labelling the column an enum without having to list the values would be great.
I am getting text search instead of an enum by default for a column that has 117 unique values (out of the 18k or so samples provided).