metatron-app / metatron-discovery

Powerful & Easy way for big data discovery
https://metatron.app
Apache License 2.0

Apply type auto guessing feature to datasource ingestion #2157

Open joohokim1 opened 5 years ago

joohokim1 commented 5 years ago

Is your feature request related to a problem? Please describe. When I import a CSV file in dataprep, the column types are set automatically by the appropriate rules. I would like datasource ingestion to behave the same way.

Describe the solution you'd like Add the same feature to datasource ingestion, ideally by reusing the data preparation code. In addition, the role of each column (dimension or measure) could be set automatically too. I suggest that numeric columns be measures by default.

IMPORTANT: A wrong guess is not a big problem in dataprep, because you can change the rule or add another rule to fix it. For datasource ingestion, however, it could be troublesome, for example when the data contains many number-like codes. So I suggest adding a checkbox that turns the automatic type conversion on or off.
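To make the proposal concrete, here is a minimal sketch of the idea, not the existing dataprep logic. The names `ColumnTypeGuesser`, `ColType`, `Role`, `guessType`, `defaultRole`, and the `autoGuess` flag are all hypothetical and only stand in for the checkbox and the guessing rules. The sketch assumes a column gets the narrowest type that fits every non-empty sampled value, and that numeric columns become measures by default.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnTypeGuesser {
    // Hypothetical type and role names, used for illustration only.
    enum ColType { LONG, DOUBLE, STRING }
    enum Role { DIMENSION, MEASURE }

    // Guess a column type from sampled values.
    // autoGuess models the proposed checkbox: when false, every column stays STRING.
    static ColType guessType(List<String> samples, boolean autoGuess) {
        if (!autoGuess) {
            return ColType.STRING;
        }
        boolean allLong = true;
        boolean allDouble = true;
        boolean sawValue = false;
        for (String s : samples) {
            if (s == null || s.trim().isEmpty()) {
                continue; // empty cells do not influence the guess
            }
            sawValue = true;
            String v = s.trim();
            if (!v.matches("[+-]?\\d+")) {
                allLong = false;
            }
            if (!v.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                allDouble = false;
            }
        }
        if (!sawValue) {
            return ColType.STRING;
        }
        if (allLong) {
            return ColType.LONG;
        }
        if (allDouble) {
            return ColType.DOUBLE;
        }
        return ColType.STRING;
    }

    // Numeric columns default to measures, everything else to dimensions.
    static Role defaultRole(ColType type) {
        return type == ColType.STRING ? Role.DIMENSION : Role.MEASURE;
    }

    public static void main(String[] args) {
        // Number-like codes: ZIP codes look numeric but are really labels.
        List<String> zipCodes = Arrays.asList("00501", "10001", "94105");
        System.out.println(guessType(zipCodes, true));   // LONG
        System.out.println(guessType(zipCodes, false));  // STRING
    }
}
```

The ZIP code column shows why the checkbox matters: with guessing enabled it is treated as `LONG` and would default to a measure, which loses the leading zeros and makes little sense to aggregate. With guessing disabled it stays a `STRING` dimension.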

Describe alternatives you've considered Once the metadata management feature (TBD) is implemented, the types declared in the metadata should override the guessed types.
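The intended precedence could look roughly like the sketch below. Since the metadata management feature does not exist yet, everything here is an assumption: the class `TypeResolution`, the method `resolve`, and the representation of types as plain strings keyed by column name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeResolution {
    // Merge guessed column types with types declared in metadata.
    // A declared type wins over the guess; declared columns that are
    // not present in the ingested file are ignored.
    static Map<String, String> resolve(Map<String, String> guessed,
                                       Map<String, String> declared) {
        Map<String, String> result = new LinkedHashMap<>(guessed);
        for (Map.Entry<String, String> e : declared.entrySet()) {
            if (result.containsKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> guessed = new LinkedHashMap<>();
        guessed.put("zip_code", "LONG");
        guessed.put("amount", "DOUBLE");

        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("zip_code", "STRING"); // metadata corrects the wrong guess

        System.out.println(resolve(guessed, declared)); // {zip_code=STRING, amount=DOUBLE}
    }
}
```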

Additional context Related code is in PrepDatasetFileService.getResponseMapFromCsv().

kyungtaak commented 5 years ago

@joohokim1 I would like to move the shared guessing logic in PrepDatasetFileService.getResponseMapFromCsv() into a common package.

joohokim1 commented 5 years ago

@kyungtaak I'll figure out how to do that when we pick up this issue later.