jaswellnitz / tourist-chatbot

TFG - Development of a chatbot for tourist recommendations

Utility Matrix for collaborative filtering #37

Closed jaswellnitz closed 7 years ago

jaswellnitz commented 7 years ago

During research, it became clear that datasets of tourist ratings are very rare. However, two useful sources of tourist datasets were found.

The first source is Tourpedia (http://tour-pedia.org/about/datasets.html), which provides datasets of attractions, restaurants and hotels in eight big cities. This dataset contains a lot of useful information about the points of interest, including coordinates, which are useful for matching the points with the OSM data. Unfortunately, the user ratings are very sparse, so creating a utility matrix based on this data would not be very fruitful.

The second source is a dataset built from TripAdvisor data (https://www.researchgate.net/publication/290294341_TripAdvisor_dataset_2015). Its main focus is on the users, their interests and ratings by other users. It nevertheless includes a set of user reviews with 32,000 ratings of places. These ratings are concentrated mainly in big cities, since those attract the most tourists.

To find out which area is best suited for the chatbot prototype, the data was analyzed in a Python notebook using the data analysis library Pandas. It became clear that Barcelona, a region of great touristic importance, has the most reviews in Spain. On this basis, we will build our utility matrix from the Barcelona reviews, which contain ratings from 40 different users on nearly 70 points of interest. In order to use this data in our recommender, the point of interest names in the TripAdvisor dataset have to be matched with the ids of OpenStreetMap.
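As a rough illustration of the analysis described above, the following sketch counts reviews per city and pivots the reviews of the most-reviewed city into a user-by-place utility matrix. The column names (`user_id`, `place_name`, `city`, `rating`) and the sample rows are assumptions made up for this example; they are not taken from the TripAdvisor dataset, whose actual schema may differ.

```python
import pandas as pd

# Hypothetical review records; column names and values are illustrative only.
reviews = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u2", "u3", "u3"],
        "place_name": [
            "Sagrada Familia",
            "La Rambla",
            "Sagrada Familia",
            "La Rambla",
            "Museo del Prado",
        ],
        "city": ["Barcelona", "Barcelona", "Barcelona", "Barcelona", "Madrid"],
        "rating": [5, 4, 3, 5, 4],
    }
)

# Count reviews per city and pick the city with the most reviews.
reviews_per_city = reviews.groupby("city").size().sort_values(ascending=False)
top_city = reviews_per_city.index[0]

# Keep only that city's reviews and pivot them into a utility matrix:
# one row per user, one column per place, NaN where a user gave no rating.
city_reviews = reviews[reviews["city"] == top_city]
utility_matrix = city_reviews.pivot_table(
    index="user_id", columns="place_name", values="rating", aggfunc="mean"
)

print(reviews_per_city)
print(utility_matrix)
```

Missing ratings stay as `NaN` here rather than being filled with zeros, so that an unrated place is not confused with a low rating; how to treat them is left to the recommender.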