mbernico / snape

Snape is a convenient artificial dataset generator that wraps sklearn's make_classification and make_regression and then adds in 'realism' features such as complex formatting, varying scales, categorical variables, and missing values.
Apache License 2.0
165 stars 21 forks

Added functions to the make_dataset.py file to generate a simple star… #5

Closed scollins83 closed 7 years ago

scollins83 commented 7 years ago

Added functions to the make_dataset.py file to generate a simple star schema as a set of CSV files, with the primary dataset table serving as the fact table and each categorical variable serving as a dimension with a corresponding lookup table. All other files modified in this commit were changed to support the new star_schema configuration option.

NOTE: The test suite has not yet been updated to cover the star schema function.
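
For readers unfamiliar with the layout, the fact/dimension split described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code in this PR: the helper names `to_star_schema` and `dimension_to_csv`, the `_id` column suffix, and the sample rows are all hypothetical, and it assumes the categorical columns contain no missing values.

```python
import csv
import io


def to_star_schema(rows, categorical_columns):
    """Split flat rows into a fact table plus one lookup table per categorical column.

    Hypothetical helper for illustration only; not snape's actual implementation.
    """
    dimensions = {}
    for column in categorical_columns:
        # Give each distinct value an integer key, sorted so the keys are stable.
        values = sorted({row[column] for row in rows})
        dimensions[column] = {value: key for key, value in enumerate(values, start=1)}

    fact_rows = []
    for row in rows:
        fact_row = {}
        for column, value in row.items():
            if column in dimensions:
                # Replace the categorical value with a foreign key into its dimension.
                fact_row[column + "_id"] = dimensions[column][value]
            else:
                fact_row[column] = value
        fact_rows.append(fact_row)
    return fact_rows, dimensions


def dimension_to_csv(column, lookup):
    """Render one dimension lookup table as CSV text (key column first)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column + "_id", column])
    for value, key in sorted(lookup.items(), key=lambda item: item[1]):
        writer.writerow([key, value])
    return buffer.getvalue()


# Made-up sample data: two numeric columns and one categorical column.
rows = [
    {"x1": 0.5, "color": "red", "y": 1},
    {"x1": 1.2, "color": "blue", "y": 0},
    {"x1": 0.9, "color": "red", "y": 1},
]
fact_rows, dimensions = to_star_schema(rows, ["color"])
color_csv = dimension_to_csv("color", dimensions["color"])
```

Here the fact table keeps `x1` and `y` and carries a `color_id` key, while the `color` lookup table maps each key back to its label. Joining the two on `color_id` recovers the original flat dataset.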

tgsmith61591 commented 7 years ago

I saw there were some conflicts in Sara's PR and the CI wouldn't run until resolved. I merged master into Sara's branch, resolved the conflict in setup.py (modules available in the default python dist don't need to be in requirements, btw) and now the CI should be chugging along.

scollins83 commented 7 years ago

Thanks, Taylor. Finishing up the tests now.

coveralls commented 7 years ago

Coverage increased (+5.9%) to 77.097% when pulling 2018f3bf2959bcf283cf6440e581a652655860e9 on scollins83:star_schema_csv into dbd2fde0a048b5fd46a07cf0fe24b0b4c0f64632 on mbernico:master.

mbernico commented 7 years ago

Thank you, @scollins83. This is a great patch that future students will most certainly hate.