modelfoxdotdev / modelfox

ModelFox makes it easy to train, deploy, and monitor machine learning models.

Data representation, preprocessing and interpretation. #75

Closed: m-kru closed this issue 2 years ago

m-kru commented 2 years ago

I have some data in which one of the columns represents the time of day in the format hours:minutes, for example 22:50. I was wondering whether I should transform it into minutes since midnight, for example 1370.

This question can be generalized further. How does Tangram treat input data values? Does Tangram automatically convert data into integers or real numbers whenever possible, or is everything treated as a string?

isabella commented 2 years ago

Hi @m-kru, right now Tangram does not have native support for dates. We require the user to transform dates into reasonable features. Transforming your data into minutes since midnight sounds like a great idea.
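
For illustration, here is a minimal sketch of that preprocessing step, assuming the column holds 24-hour `HH:MM` strings. The function name `minutes_since_midnight` is hypothetical and not part of Tangram; it uses only the Rust standard library.

```rust
/// Convert a time of day in "HH:MM" form (24-hour clock) into minutes
/// since midnight. Hypothetical helper, not part of Tangram.
/// Returns `None` for malformed input or out-of-range values.
fn minutes_since_midnight(time: &str) -> Option<u32> {
    // Split on the first ':' into the hour and minute parts.
    let (hours, minutes) = time.trim().split_once(':')?;
    let hours: u32 = hours.parse().ok()?;
    let minutes: u32 = minutes.parse().ok()?;
    // Reject values that are not a valid time of day.
    if hours < 24 && minutes < 60 {
        Some(hours * 60 + minutes)
    } else {
        None
    }
}
```

With this sketch, `minutes_since_midnight("22:50")` returns `Some(1370)`, matching the example in the question, while inputs such as `"24:00"` or `"noon"` return `None` so they can be handled explicitly before training.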

Tangram infers the column types if you do not specify them in the config file. If you're curious, this is the code that pertains to inferring the column type:

match self.column_type {
    InferColumnType::Unknown | InferColumnType::Number => {
        if fast_float::parse::<f32, &str>(value)
            .map(|v| v.is_finite())
            .unwrap_or(false)
        {
            self.column_type = InferColumnType::Number;
        } else if self.unique_values.is_some() {
            self.column_type = InferColumnType::Enum;
        } else {
            self.column_type = InferColumnType::Text;
        }
    }
    InferColumnType::Enum => {
        if self.unique_values.is_none() {
            self.column_type = InferColumnType::Text;
        }
    }
    _ => {}
}

https://github.com/tangramdotdev/tangram/blob/ced26a87425c3f263363a1c772953358385ba61c/crates/table/load.rs
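
To see how the snippet above behaves on a whole column, here is a self-contained sketch that wraps the same `match` in a small state machine. Several parts are assumptions rather than Tangram's actual code: the `ColumnInference` struct, the `observe` and `infer_column_type` functions, and the `MAX_UNIQUE_VALUES` cap are hypothetical, and the standard library's `str::parse::<f32>` stands in for `fast_float::parse` so that the example needs no external crate.

```rust
use std::collections::BTreeSet;

/// Column types mirroring the variants in the quoted snippet.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InferColumnType {
    Unknown,
    Number,
    Enum,
    Text,
}

/// Assumed cap on distinct values; the real threshold may differ.
const MAX_UNIQUE_VALUES: usize = 100;

/// Hypothetical per-column inference state.
struct ColumnInference {
    column_type: InferColumnType,
    // `None` once the column has exceeded MAX_UNIQUE_VALUES distinct values.
    unique_values: Option<BTreeSet<String>>,
}

impl ColumnInference {
    fn new() -> Self {
        ColumnInference {
            column_type: InferColumnType::Unknown,
            unique_values: Some(BTreeSet::new()),
        }
    }

    /// Update the inferred type with one more raw value from the column.
    fn observe(&mut self, value: &str) {
        // Track distinct values and stop tracking once there are too many.
        let too_many = match self.unique_values.as_mut() {
            Some(set) => {
                set.insert(value.to_owned());
                set.len() > MAX_UNIQUE_VALUES
            }
            None => false,
        };
        if too_many {
            self.unique_values = None;
        }
        // Same decision logic as the quoted snippet, with std parsing
        // used in place of fast_float.
        match self.column_type {
            InferColumnType::Unknown | InferColumnType::Number => {
                if value.parse::<f32>().map(|v| v.is_finite()).unwrap_or(false) {
                    self.column_type = InferColumnType::Number;
                } else if self.unique_values.is_some() {
                    self.column_type = InferColumnType::Enum;
                } else {
                    self.column_type = InferColumnType::Text;
                }
            }
            InferColumnType::Enum => {
                if self.unique_values.is_none() {
                    self.column_type = InferColumnType::Text;
                }
            }
            _ => {}
        }
    }
}

/// Infer a column type from all of the column's raw string values.
fn infer_column_type(values: &[&str]) -> InferColumnType {
    let mut state = ColumnInference::new();
    for value in values {
        state.observe(value);
    }
    state.column_type
}
```

Under these assumptions, a column of values such as `"1370"` is inferred as `Number`, whereas a column of raw `"22:50"` strings fails the float parse and becomes `Enum` (or `Text` once there are too many distinct values). That is why converting times to minutes since midnight before training gives the model a numeric feature.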