Closed jcharlet closed 5 years ago
Hi mani, sorry I didn't put comments, but I made a long comment on Slack. My answers are below:
On Tue, 23 Apr 2019, 22:14 mani, notifications@github.com wrote:
A few comments on this PR:
- where is the Data Preparatory notebook for Ames Housing? Was it meant to be part of this PR?
Not included at this stage, I didn't have time to do it yet. I wanted to see if feature selection would be more obvious with this dataset, and it is. But I guarantee we'll have way more opportunities to do data prep. The notebook on feature engineering we have is purely dedicated to text and categorical columns, which this dataset has.
- Good work preserving the work for Boston Housing and adding a new Dataset
https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/README-details.md#notebooks will need amending with the new changes (if we wish to mention links to both the Boston & Ames housing notebooks here)
Right, will do
- I wouldn't add any data files (.csv, .txt, etc.) to the git repo; it's best to download them from somewhere else or from their original source
Yeah, I know and agree. I didn't refactor my work, I tried to move on as quickly as possible. Regarding the .txt files, one of them is my own analysis of the labels description. I need to move it back into the notebook, but it's verbose.
- purpose of train.csv and test.csv - again, best to download them from elsewhere or prepare them on the fly using Python commands, etc.
Agreed, I'll correct that. They might be directly available in the Keras or scikit-learn built-in datasets. I'll check.
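For what it's worth, fetching the files on the fly could look roughly like the sketch below. It's only an illustration: `fetch_if_missing` is a hypothetical helper name, and the URL in the usage comment is a placeholder, not the dataset's real location. It downloads a file once and reuses the local copy afterwards, so the CSVs never need to be committed.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_if_missing(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless a local copy already exists."""
    if not dest.exists():
        # Create the target folder (e.g. data/) on first use.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
    return dest


# Usage (placeholder URL, swap in the dataset's original source):
# train_path = fetch_if_missing(
#     "https://example.com/ames-housing/train.csv",
#     Path("data") / "train.csv",
# )
```

With something like this in the notebook, the `data/` folder can go into `.gitignore`. If the dataset turns out to be published on OpenML, scikit-learn's `fetch_openml` may be an alternative, but I haven't checked that yet.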
- how big are these data files in total?
Not big, a couple of hundred KB from what I remember.