vanderbilt-ml / 51-callahan-mlproj-realestate

Basic Exploratory Analysis (EDA) #7

Closed peter-callahan closed 2 years ago

peter-callahan commented 2 years ago

Questions that should be addressed during EDA:

  1. Are there enough data points to perform a prediction in the geographical area of interest?
  2. What data cleaning tasks are necessary?
  3. Is missing data a problem for any particular columns?
  4. Do any basic patterns emerge that increase/decrease trust in the dataset?

peter-callahan commented 2 years ago

Are there enough data points to perform a prediction in the geographical area of interest?

Yes, we have enough data to begin, and can collect more as needed.
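
One way to check coverage is to count rows per area and flag any area below a minimum. This is only a rough sketch: the `zip` field, the sample rows, and the `MIN_ROWS` threshold are all hypothetical and not taken from the project's dataset.

```python
from collections import Counter

# Hypothetical threshold: the minimum number of listings wanted per area.
MIN_ROWS = 3

# Hypothetical sample rows; the real dataset may key areas differently.
listings = [
    {"zip": "37203"}, {"zip": "37203"}, {"zip": "37203"},
    {"zip": "37212"}, {"zip": "37212"},
    {"zip": "37215"},
]

# Count listings per area, then list the areas that fall below the threshold.
counts = Counter(row["zip"] for row in listings)
sparse_areas = sorted(area for area, n in counts.items() if n < MIN_ROWS)

print(counts)
print(sparse_areas)
```

Areas that show up in `sparse_areas` would be the candidates for collecting more data.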

What data cleaning tasks are necessary?

Data type conversions, replacing NaNs and strings, and strategically dropping some categories/columns. Dropping forward-looking values is important to avoid leakage.
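
A minimal sketch of those cleaning steps in plain Python follows. The placeholder tokens, the column names, and the choice of which columns count as forward-looking are assumptions made for illustration, not the project's actual schema.

```python
# Hypothetical placeholder strings that should be treated as missing.
MISSING_TOKENS = {"", "nan", "n/a", "none", "--"}

# Hypothetical columns assumed to be known only after prediction time.
FORWARD_LOOKING = {"sold_date", "days_on_market"}


def to_number(raw):
    """Convert a raw cell to float, or None if it is missing or unparseable."""
    if raw is None:
        return None
    text = str(raw).strip().replace(",", "").replace("$", "")
    if text.lower() in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_row(row, numeric_columns):
    """Drop forward-looking columns, then convert the numeric columns."""
    cleaned = {k: v for k, v in row.items() if k not in FORWARD_LOOKING}
    for col in numeric_columns:
        if col in cleaned:
            cleaned[col] = to_number(cleaned[col])
    return cleaned


raw = {
    "sqft": "1,850",
    "beds": "N/A",
    "list_price": "$425,000",
    "days_on_market": "12",
}
cleaned = clean_row(raw, ["sqft", "beds", "list_price"])
print(cleaned)
```

Dropping the forward-looking columns before any other step keeps them from leaking into derived features by accident.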

Is missing data a problem for any particular columns?

Yes, missing values in key columns (sqft, beds, bathrooms) pose a risk. We will need to be cautious when filling those.
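
To put a number on that risk, the share of missing values can be computed per column. The sketch below uses made-up rows; only the column names `sqft`, `beds`, and `bathrooms` come from the discussion above, and their exact spelling in the dataset is an assumption.

```python
import math

# Hypothetical sample rows; real values and missing-value encodings may differ.
listings = [
    {"sqft": 1500.0, "beds": 3, "bathrooms": 2.0},
    {"sqft": None, "beds": 2, "bathrooms": None},
    {"sqft": 2100.0, "beds": None, "bathrooms": 2.5},
    {"sqft": float("nan"), "beds": 4, "bathrooms": 3.0},
]


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(rows, columns):
    """Return the share of rows with a missing value, per column."""
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }


report = missing_fraction(listings, ["sqft", "beds", "bathrooms"])
print(report)
```

A column with a high missing share may be better handled by dropping rows or adding an explicit "missing" indicator than by filling with a mean or median.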

Do any basic patterns emerge that increase/decrease trust in the dataset?

None that cannot be handled with data cleaning or by expanding the dataset.