udacity / build-ml-pipeline-for-short-term-rental-prices

Project code for cd0581 refresh taught by Giacomo Vianello

Initial setup and EDA project steps #41

Closed by dhedderich 1 year ago

dhedderich commented 1 year ago

The following changes have been made:

  1. Added markdown cells that explain the purpose of the notebook and make it easier to follow.
  2. Updated the main.py script to obtain a sample of the data and upload it to Weights & Biases.
  3. Ran the EDA step with the command mlflow run src/eda, which installs Jupyter and the dependencies for pandas-profiling (since renamed ydata-profiling), then opens a Jupyter notebook instance.
  4. In the Jupyter notebook, added code to fetch the artifact (sample.csv) from Weights & Biases and read it using pandas.
  5. Used ydata-profiling to profile the dataset and display the report as interactive widgets.
  6. Added guidance on what to look for during EDA, such as missing values, data format issues, and outliers in the price column.
  7. Added code that drops rows whose price falls outside a specified range and converts the last_review column to datetime format.
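
For reviewers, steps 4 to 7 can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the function names, the W&B project name nyc_airbnb, the job type, the :latest alias, and the price bounds 10 and 350 are assumptions for illustration. Only the artifact name sample.csv and the price and last_review columns come from the description above.

```python
import pandas as pd


def load_sample_from_wandb(artifact_name="sample.csv:latest"):
    """Fetch the sample artifact from Weights & Biases and read it with pandas."""
    import wandb  # imported lazily so the rest of the sketch runs without W&B

    # Project name and job type are assumed values, not taken from this PR.
    run = wandb.init(project="nyc_airbnb", job_type="eda")
    local_path = run.use_artifact(artifact_name).file()
    return pd.read_csv(local_path)


def show_profile(df):
    """Build a ydata-profiling report and render it as notebook widgets."""
    from ydata_profiling import ProfileReport  # lazy import, notebook-only step

    profile = ProfileReport(df)
    profile.to_widgets()


def clean_sample(df, min_price, max_price):
    """Drop rows outside [min_price, max_price] and parse last_review as dates."""
    # between() is inclusive on both ends by default.
    in_range = df["price"].between(min_price, max_price)
    cleaned = df[in_range].copy()
    # Missing review dates become NaT rather than raising.
    cleaned["last_review"] = pd.to_datetime(cleaned["last_review"])
    return cleaned


# Small made-up frame standing in for sample.csv.
demo = pd.DataFrame(
    {
        "price": [5, 50, 400, 100],
        "last_review": ["2019-01-01", None, "2019-05-05", "2018-07-01"],
    }
)
# The bounds 10 and 350 are placeholder values for the "specified price range".
cleaned = clean_sample(demo, min_price=10, max_price=350)
```

In the notebook, the same cleaning function would be applied to the frame returned by load_sample_from_wandb() after inspecting the profile. Copying the filtered frame before assigning the parsed dates avoids modifying a view of the original data.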