neomatrix369 / awesome-ai-ml-dl

Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.

add data analysis + features engineering on Ames dataset #3

Closed jcharlet closed 5 years ago

neomatrix369 commented 5 years ago

A few comments on this PR:

jcharlet commented 5 years ago

Hi Mani, sorry, I didn't add comments here, but I posted a long comment on Slack. My answers are below:

On Tue, 23 Apr 2019, 22:14 mani, notifications@github.com wrote:

A few comments on this PR:

  • where is the Data Preparatory notebook for Ames Housing? Was it meant to be part of this PR?

Not included at this stage; I haven't had time to do it yet. I wanted to see whether feature selection would be more obvious with this dataset, and it is. But I guarantee we'll have plenty more opportunities to do data prep. The feature engineering notebook we have is dedicated purely to text and categorical columns, which this dataset has.

Right, will do

  • I wouldn't add any data files (.csv, .txt, etc.) to the Git repo; it's best to download them from somewhere else or from their original source

Yeah, I know and agree. I didn't refactor my work; I tried to move on as quickly as possible. Regarding the .txt files, one of them is my own analysis of the label descriptions. I need to move it back into the notebook, but it's verbose.

  • purpose of train.csv and test.csv - again best to download them from elsewhere or prepare them on the fly using python commands, etc...

Agreed, I'll fix this. The data might be directly available in the Keras or scikit-learn built-in datasets. I'll check.
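
A minimal sketch of the "download on the fly" approach suggested above, assuming the CSV files are hosted at some stable URL. The helper name `ensure_dataset`, the example URL, and the `data/` directory are hypothetical and not part of this PR.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_dataset(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless a local copy already exists."""
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the cached copy; no download is attempted

    dest.parent.mkdir(parents=True, exist_ok=True)

    # Download to a temporary name first so an interrupted transfer
    # never leaves a half-written file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    urlretrieve(url, str(tmp))
    tmp.replace(dest)
    return dest


# Hypothetical usage inside a notebook (the URL is a placeholder):
# train_path = ensure_dataset("https://example.com/ames/train.csv",
#                             Path("data/train.csv"))
```

With something like this in the first notebook cell, `data/` could be listed in `.gitignore` so the CSV files stay out of the repository.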

  • how big are these data files in total?

Not big, a couple of hundred KB from what I remember.
