Closed by legaultmarc 8 years ago
I started refactoring to use the cohort_manager.inference module for such things (will be pushed soonish). The interface for the REPL could look like this:
> import csv my_file.csv delim=',' header=0
# Found 5 columns, verify the following information, then press enter:
[
{"name": "Name", "variable_type": None},
{"name": "Age", "variable_type": "continuous"},
{"name": "Height", "variable_type": "continuous"},
{"name": "Tall", "variable_type": "discrete", "parent": "Height"},
{"name": "FavoriteWeather", "variable_type": "factor"},
]
Users could also add the other meta fields (e.g. {"icd10": ...}). In this example, the parent relationship between "Tall" and "Height" is also inferred correctly. This will be a lot harder in practice.
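To make the type inference step concrete, here is a rough sketch of how the `variable_type` guesses above could be produced from the raw CSV columns. It is not the `cohort_manager.inference` API: `infer_variable_type`, `infer_columns`, the `max_factor_levels` threshold and the sample data are all hypothetical, and the heuristics are deliberately naive. It does not attempt the parent inference, which is the hard part.

```python
import csv
import io


def infer_variable_type(values, max_factor_levels=10):
    """Guess a variable type from raw string values (hypothetical heuristic)."""
    observed = [v.strip() for v in values if v.strip() != ""]
    if not observed:
        return None
    try:
        numbers = [float(v) for v in observed]
    except ValueError:
        # Non-numeric column: call it a factor only if a small set of
        # levels repeats; otherwise leave it untyped (e.g. free-text names).
        levels = set(observed)
        if len(levels) <= max_factor_levels and len(levels) < len(observed):
            return "factor"
        return None
    # Numeric column containing only 0/1 is assumed to be discrete.
    if set(numbers) <= {0.0, 1.0}:
        return "discrete"
    return "continuous"


def infer_columns(handle, delimiter=","):
    """Read a CSV with a header row and return one metadata dict per column."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns; fall back to empty columns if no data rows.
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return [
        {"name": name, "variable_type": infer_variable_type(column)}
        for name, column in zip(header, columns)
    ]


# Made-up sample data mirroring the columns in the example above.
SAMPLE = """Name,Age,Height,Tall,FavoriteWeather
Alice,34,1.82,1,sunny
Bob,51,1.65,0,rainy
Carol,29,1.90,1,sunny
Dan,42,1.71,0,snowy
"""

columns = infer_columns(io.StringIO(SAMPLE))
for column in columns:
    print(column)
```

The user would then review this list and correct it before anything is written to the database, which is why a wrong guess here is cheap.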
The current solution of having a YAML configuration file that gets parsed to build the database really isn't scalable and would restrict the use of the tool.
It would be best to define a set of commands for the REPL (and/or future GUI implementations) that would facilitate importing data from common formats (e.g. CSV).
These commands should then allow semi-automatic import, inferring data types as well as phenotype structure automatically and leaving the user to confirm or correct the result.
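As a starting point for the command set, a command of the form `import <format> <path> key=value ...` could be parsed roughly as follows. This is only a sketch: `parse_import_command` and the shape of the returned dict are assumptions for illustration, not existing cohort_manager code, and option values are left as strings for the importer to interpret.

```python
import shlex


def parse_import_command(line):
    """Split an 'import' REPL command into format, path and options."""
    # shlex handles shell-style quoting, so delim=',' yields the value ','.
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "import":
        raise ValueError("expected: import <format> <path> [key=value ...]")
    _, file_format, path, *pairs = tokens

    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed option: {pair!r}")
        options[key] = value
    return {"format": file_format, "path": path, "options": options}


command = parse_import_command("import csv my_file.csv delim=',' header=0")
print(command)
```

Keeping the parser independent of the REPL means a future GUI could build the same dict directly and share the import code.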