jkmackie / car_price_prediction

CraigsList vehicle price prediction with scikit-learn and pandas

Complete Project #1

Open erolrecep opened 5 years ago

erolrecep commented 5 years ago

Why don't we have a project that exercises every feature of TensorFlow?

We can do all these steps for the simplest CNN, LeNet-5, on the MNIST dataset.

jkmackie commented 5 years ago

Hi Recep,

Sounds good! Are you okay with me doing these steps in TensorFlow 2.0? If I run into a problem with the 2.0 API, it would be on me to figure it out.

The 2.0 book is now expected at the end of October.

Thanks! Justin

On Mon, Oct 21, 2019 at 5:00 PM Recep notifications@github.com wrote:

Why don't we have a project that exercises every feature of TensorFlow?

  • TFRecord
  • data augmentation
  • training (later on, hyperparameter optimization)
  • monitoring training with TensorBoard
  • inference/validation while one set of data is being trained
  • visualizing the validation-set results in TensorBoard
  • saving the best model weights and putting them into a web application

We can do all these steps for the simplest CNN, LeNet-5, on the MNIST dataset.


jkmackie commented 5 years ago

car_pricing_v2 is uploaded to GitHub. It replaces both v1 and explore_json.

Project scope: I fit regression models on Honda listings for sale by owner in Houston. There are 315 samples, split 70% train and 30% test, stratified by model (e.g., Accord, CR-V, Civic).
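
A minimal sketch of what a 70/30 split stratified by model looks like, using only the standard library. The `stratified_split` helper, the row layout, and the listings are assumptions made up for illustration and are not the project's code or data. In a scikit-learn pipeline the usual route would be `train_test_split` with its `stratify` argument.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_frac=0.3, seed=0):
    """Split rows into (train, test), taking test_frac of each group.

    Hypothetical helper: groups rows by rows[i][key] so that every
    vehicle model keeps roughly the same share in train and test.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Made-up listings for illustration only (not the 315 real samples).
listings = (
    [{"model": "Accord", "price": 9000 + 100 * i} for i in range(20)]
    + [{"model": "Civic", "price": 7000 + 100 * i} for i in range(10)]
    + [{"model": "CR-V", "price": 11000 + 100 * i} for i in range(10)]
)

train, test = stratified_split(listings, key="model")
```

With only 315 samples, stratifying keeps rarer models from ending up entirely in one of the two sets.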

The main modeling issue is that we need more data. Here are some proposed ways to get it:

  1. Include all listings (owner or dealer) rather than just owner. Dealer listings are the most common.
  2. Combine regions. For example, combine Houston with College Station and Galveston?
  3. Pick a more common manufacturer, like Ford. But will each individual Ford model be more common?

If we do (1), owner vs. dealer will need to be a feature. It can be parsed from the vehicle URL (e.g., houston.craigslist.org/cto vs. houston.craigslist.org/ctd). Alternatively, we could use dealer listings only, which are more common than owner listings.
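
A rough sketch of parsing that feature out of the URL. The `cto`/`ctd` codes come from the example URLs above. The function name `seller_type` and the assumption that the code appears as its own path segment are mine, so this would need checking against the scraped URLs.

```python
from urllib.parse import urlparse

# Path codes taken from the example URLs: cto = owner, ctd = dealer.
SELLER_CODES = {"cto": "owner", "ctd": "dealer"}


def seller_type(url):
    """Return 'owner', 'dealer', or None if no known code is in the path."""
    # urlparse only separates host and path reliably when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    for segment in urlparse(url).path.split("/"):
        if segment in SELLER_CODES:
            return SELLER_CODES[segment]
    return None
```

Returning `None` for unrecognized URLs makes it easy to count how many listings the rule fails on before trusting the feature.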

If we do (2), any regional pricing differences will be commingled.
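
One way to keep the regions from being fully commingled is to tag each listing with its region before combining. A sketch, assuming the region is the Craigslist subdomain as in houston.craigslist.org. The `region_of` helper, the other subdomain, and the prices are made up for illustration.

```python
from statistics import median
from urllib.parse import urlparse


def region_of(url):
    """Return the Craigslist subdomain (e.g. 'houston'), or None."""
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    suffix = ".craigslist.org"
    return host[: -len(suffix)] if host.endswith(suffix) else None


# Made-up (url, price) pairs for illustration only.
listings = [
    ("houston.craigslist.org/cto", 8000),
    ("houston.craigslist.org/cto", 9000),
    ("galveston.craigslist.org/cto", 7000),
    ("galveston.craigslist.org/cto", 7500),
]

# Group prices by region so regional differences stay visible.
prices_by_region = {}
for url, price in listings:
    prices_by_region.setdefault(region_of(url), []).append(price)

median_by_region = {r: median(p) for r, p in prices_by_region.items()}
```

If the per-region medians differ noticeably, region can go into the model as a categorical feature.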

If we do (3), I'll need to update the data scrubbing pipeline to see if Ford models are more common than Honda models.
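
The per-model check could be as simple as counting listings per model for each manufacturer. A sketch with made-up rows; the `(manufacturer, model)` layout and both helper names are assumptions, not the pipeline's actual schema.

```python
from collections import Counter

# Made-up (manufacturer, model) pairs for illustration only.
listings = [
    ("ford", "f-150"), ("ford", "f-150"), ("ford", "mustang"),
    ("ford", "focus"), ("honda", "accord"), ("honda", "civic"),
    ("honda", "accord"),
]


def model_counts(rows, manufacturer):
    """Count listings per model for one manufacturer."""
    return Counter(model for make, model in rows if make == manufacturer)


def models_with_at_least(rows, manufacturer, min_count):
    """List models with enough listings to be worth modeling."""
    counts = model_counts(rows, manufacturer)
    return sorted(m for m, n in counts.items() if n >= min_count)
```

Comparing `model_counts(listings, "ford")` with `model_counts(listings, "honda")` shows whether the larger manufacturer total actually translates into more samples per model.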

There are 3,000 Fords and 1,307 Hondas for sale by owner/dealer in Houston. This is the route I'm inclined to try. We can always drop the owner or dealer listings later.