sacdallago / dataminer

Apache License 2.0

Project selection #1

Closed sacdallago closed 8 years ago

sacdallago commented 8 years ago

Please list three projects that you would pick for the course project. You can use minus+space (- ) to form lists :tada:

ripitrust commented 8 years ago
  1. Airbnb New Customer
  2. Yelp Restaurant Classification
  3. Winton Stock Market Challenge
mbarbera commented 8 years ago
JanEr93 commented 8 years ago
sacdallago commented 8 years ago
sacdallago commented 8 years ago

@ripitrust @mbarbera @JanEr93 Seems like we have a tie:

I'll stay out of the voting so it's 3 people and there is definitely going to be a winner! :) You can decide.

sacdallago commented 8 years ago

@ripitrust @mbarbera @JanEr93 :sleeping: :joy:

ripitrust commented 8 years ago

@sacdallago Sorry for the late response, I have had a bad fever these past few days.

My preferences between these two:

  1. Yelp restaurant
  2. Restaurant revenue

But I guess Restaurant revenue would be easier to implement.

mbarbera commented 8 years ago
JanEr93 commented 8 years ago

I definitely prefer Restaurant revenue. I think it would be easier to implement (at least for me, since I have never worked with picture data before), and the task seems doable with some regression, data cleaning, outlier analysis, etc.

sacdallago commented 8 years ago

@janer93 unfortunately, judging from @ripitrust's and @mbarbera's comments, it seems like the Yelp project won.

I stayed out of the vote so that we have a clear winner.

Image data is not very different from any other type of data. You just have to upsample or downsample the images so that they all yield equally long vectors holding the color values at each pixel (after all, images are just matrices and a computer doesn't really care about dimensions, so feeding a flattened, vectorized matrix instead of a 1024x1024 matrix will be just fine :) )
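To make the resize-and-flatten idea concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the helper names `resize_nearest` and `to_feature_vector`, the 32x32 target size, nearest-neighbour resampling, and the random arrays standing in for real photos. It is not taken from the project's code.

```python
import numpy as np


def resize_nearest(image, height, width):
    """Resample an (H, W, C) image to (height, width, C) by nearest-neighbour lookup."""
    src_h, src_w = image.shape[:2]
    # For every target row/column, pick the index of the source row/column it maps to.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return image[rows[:, None], cols]


def to_feature_vector(image, size=(32, 32)):
    """Bring an image to a common size, flatten it, and scale values to [0, 1]."""
    resized = resize_nearest(image, *size)
    return resized.astype(np.float32).ravel() / 255.0


# Hypothetical stand-ins for two photos of different sizes (random pixels, not real data).
rng = np.random.default_rng(0)
small = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)   # gets upsampled
large = rng.integers(0, 256, size=(100, 80, 3), dtype=np.uint8)  # gets downsampled

# Each image becomes one row of the feature matrix, whatever its original size.
X = np.stack([to_feature_vector(img) for img in (small, large)])
print(X.shape)
```

Both images end up as rows of the same length (32 * 32 * 3 values), so `X` can be fed to any model that expects a regular feature matrix. In practice an image library with proper interpolation would probably give better results than nearest-neighbour lookup.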

Also: I have not had a look at the data here, but I guess a big improvement for the model would come from adding new data items (aka: take pictures from Flickr and label them into a new training dataset), adding geolocation, and looking for some prior knowledge (people in Germany like Italian food, so the likelihood that the food is Italian is high, etc.).
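The prior-knowledge idea can be sketched as a Bayes-style reweighting: multiply what the image model says by a location-dependent prior, then renormalise. The function name `apply_prior`, the class labels, and every number below are made up for illustration. They are assumptions, not values from the Yelp data.

```python
def apply_prior(likelihoods, prior):
    """Combine per-class likelihoods with a prior (posterior ~ likelihood * prior)."""
    unnormalised = {
        label: likelihoods[label] * prior.get(label, 0.0) for label in likelihoods
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("prior and likelihoods share no class with non-zero mass")
    return {label: value / total for label, value in unnormalised.items()}


# Hypothetical scores from an image-only model for one photo.
image_likelihoods = {"italian": 0.40, "mexican": 0.45, "japanese": 0.15}

# Hypothetical prior for photos geotagged in Germany.
prior_germany = {"italian": 0.50, "mexican": 0.20, "japanese": 0.30}

posterior = apply_prior(image_likelihoods, prior_germany)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With these invented numbers the image alone slightly favours "mexican", while the location prior tips the decision to "italian". How much the prior helps on real data depends on how reliable the geolocation and the prior estimates are.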