sebastianbarfort / sds

Social Data Science, course at University of Copenhagen
http://sebastianbarfort.github.io/sds/

Group 24 Final Project Proposal #62

Closed qdz232 closed 8 years ago

qdz232 commented 8 years ago

title: "Final Project Proposal"
author: (Group 24) Sophie Burgard, Linxin Chen, Guillermo Edgardo Sepúlveda Witt, Emily Mae Svensson
date: "November 14, 2015"
output: html_document

For the final project we have decided to analyse "Yelp", the well-known website for reviews of businesses, with a focus on restaurants and related businesses. We will first collect the data from the "Yelp" website using their open API and then look at the following questions in more detail. "Yelp" is used around the world, from North America to Europe to Asia, and collecting the data from all of these places would take a very long time, so we will restrict our sample to Copenhagen, Denmark.

  1. Are the reviewers' ratings normally distributed or not?
    • take a closer look at the frequency of the reviewers and their ratings
      • normally distributed
      • left or right skewed
  2. Can the rating of a particular restaurant affect a reviewer's review?
    • run a regression on the ratings of restaurants and the reviewers' reviews
      • see if there is a relationship between the two
      • and whether it is positive or negative
  3. Number of reviews in Copenhagen --> show it on a map
    • analyse whether the reviewers were just passing by or went to the restaurant on purpose
  4. Number of reviews by reviewers, distributed over time
  5. See if there is a relationship between the helpfulness of a review and the rating of the review
    • run a regression on how helpful people find the review and the rating the reviewer gave
      • to see if there is a relationship between these two variables
  6. Do people write longer reviews when they are not satisfied with the service than when they are satisfied?
    • run a regression to see if there is a relationship between word count and the rating of the review (depends heavily on whether we are able to get "word count" data)
  7. Are there more positive reviews for higher-priced restaurants?
    • people tend to think that the higher the price of a restaurant, the better the service
      • run a regression with price data and reviewers' ratings to see if there is a relationship between them
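
Most of the questions above come down to two calculations: checking whether a distribution of ratings is skewed (question 1) and fitting a simple linear regression between two review variables (questions 2, 5, 6 and 7). The following is only a minimal sketch of both, written in Python with the standard library. The data is made up for illustration, and the names `reviews`, `skewness` and `ols` are hypothetical, not part of the project or of the Yelp API; the real analysis would use the data collected for Copenhagen.

```python
from statistics import mean, pstdev

# Hypothetical toy data, NOT real Yelp data:
# each pair is (star rating, word count of the review).
reviews = [
    (5, 40), (4, 55), (4, 60), (3, 80), (5, 35),
    (2, 120), (1, 150), (4, 50), (3, 90), (5, 30),
]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores.

    Roughly 0 for a symmetric distribution, negative when the left
    tail is longer, positive when the right tail is longer.
    """
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


ratings = [rating for rating, _ in reviews]
word_counts = [words for _, words in reviews]

# Question 1: is the distribution of ratings left or right skewed?
rating_skew = skewness(ratings)

# Question 6: does the word count change with the rating?
slope, intercept = ols(ratings, word_counts)

print(f"skewness of ratings: {rating_skew:.2f}")
print(f"word_count = {intercept:.1f} + {slope:.1f} * rating")
```

The sign of `slope` gives the direction of the relationship (positive or negative), and the same `ols` helper could be reused for the other regressions by swapping in price or helpfulness data. A sign alone says nothing about statistical significance, so the actual project would also need standard errors or a test.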