Project selection

sacdallago commented 8 years ago

Please list three projects that you would pick for the course project. You can use minus + space (`- `) to form lists :tada:

ripitrust commented 8 years ago

Airbnb New Customer
Yelp Restaurant Classification
Winton Stock Market Challenge

mbarbera commented 8 years ago

Restaurant Revenue Prediction
San Francisco Crime Classification
Yelp Restaurant Photo Classification

JanEr93 commented 8 years ago

Restaurant Revenue Prediction
BNP Paribas Cardif Claims Management
San Francisco Crime Classification

sacdallago commented 8 years ago

West Nile Virus Prediction
Yelp Restaurant Photo Classification
Restaurant Revenue Prediction

sacdallago commented 8 years ago

@ripitrust @mbarbera @JanEr93 Seems like we have a tie:

Yelp Restaurant Photo Classification
Restaurant Revenue Prediction

I'll stay out of the voting so it's 3 people and there is definitely gonna be a winner! :) You can decide.

sacdallago commented 8 years ago

@ripitrust @mbarbera @JanEr93 :sleeping: :joy:

ripitrust commented 8 years ago

@sacdallago Sorry for the late response, I have had a bad fever these past few days.

My preference between these two, in order:

Yelp restaurant
Restaurant revenue

But I guess the Restaurant Revenue one will be easy to implement.

mbarbera commented 8 years ago

Yelp Restaurant Photo Classification

JanEr93 commented 8 years ago

I definitely prefer Restaurant Revenue. I think it would be easier to implement (at least for me, since I have never worked with image data before), and the task seems doable with some regression, data cleaning, outlier analysis, ...

sacdallago commented 8 years ago

@janer93 unfortunately it seems like the Yelp project won, going by @ripitrust's and @mbarbera's comments.

I stayed out of the vote so we would have a clear winner.

Image data is not very different from any other type of data. You just have to upsample or downsample the images so that they all give equally long vectors containing the color values at each pixel (after all, images are just matrices and a computer doesn't really care about dimensions, so feeding a flattened, vectorized matrix instead of a 1024x1024 matrix will be just fine :) )
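
To make the resize-and-flatten idea concrete, here is a minimal sketch in plain Python. It assumes grayscale images stored as nested lists and uses nearest-neighbour sampling; the names `resize_nearest` and `to_feature_vector` and the 4x4 target size are made up for illustration and are not taken from the project. In practice an image library would probably do the resizing.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Up- or downsample an image to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_feature_vector(image: Image, size: int = 4) -> List[int]:
    """Resize to size x size (hypothetical target), then flatten row by row."""
    resized = resize_nearest(image, size, size)
    return [pixel for row in resized for pixel in row]


# Two toy images of different sizes (illustrative values only).
small = [[0, 255], [255, 0]]                                   # 2x2, gets upsampled
large = [[(r + c) % 256 for c in range(8)] for r in range(8)]  # 8x8, gets downsampled

vec_small = to_feature_vector(small)
vec_large = to_feature_vector(large)
print(len(vec_small), len(vec_large))  # both vectors have the same length
```

Once every image maps to a vector of the same length, the vectors can be stacked into a regular feature matrix like any other tabular data.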

Also: I have not had a look at the data yet, but I guess a big improvement for the model would be adding new data items (i.e. take pictures from Flickr and label them to form a new training dataset), adding geolocation, and looking for some prior knowledge (people in Germany like Italian food, so the likelihood that the food is Italian is high, etc.)
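
One possible way to use such prior knowledge, sketched under heavy assumptions: treat the image model's per-class scores as likelihoods and multiply them by a location-based prior (Bayes' rule), then renormalize. The function name `apply_prior`, the class labels, and all the numbers below are invented for illustration; none of them come from the actual dataset.

```python
from typing import Dict


def apply_prior(likelihoods: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Combine per-class model scores with a prior and renormalize to sum to 1."""
    unnormalized = {label: likelihoods[label] * prior[label] for label in likelihoods}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


# Made-up numbers, purely for illustration.
image_scores = {"italian": 0.4, "japanese": 0.6}     # what the image model alone suggests
germany_prior = {"italian": 0.75, "japanese": 0.25}  # assumed regional preference

posterior = apply_prior(image_scores, germany_prior)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

With these invented values the prior is strong enough to flip the decision from the class the image model preferred to the one the location makes more plausible, which is the kind of effect geolocation could have.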

sacdallago / dataminer

Project selection #1