Extract linguistic features from hotel reviews dataset.

You are free to use any of the tools and techniques you have learnt in the classroom to transform the unstructured reviews into structured data. Just make sure that the new features are aligned with current domain knowledge and intuition. For example, you can perform sentiment analysis on English customer reviews, on the intuition that crime has a negative impact on overall sentiment: hotels located near crime hotspots are more likely than chance to receive negative reviews. Try to extract at least three new features.

Features:

[x] Sentiment of the Review Title, from stanfordNLP (title_sentiment.xlsx on Gdrive, indexed by the concatenation of those 3 files). In my pilot test, extracting sentiment from the Review Content was more troublesome and did not give accurate results.
[x] Effective Star Rating: adjust the current Star Rating with the score from Sentiment Analysis, because people weight their ratings differently. A 3 might mean neutral for some people and bad for others.
[x] Distance from crime scene (the closer, the more crime).
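
The last two features could be computed roughly as follows. This is only a sketch under stated assumptions, not the method actually used in this issue: the sentiment score is assumed to be an integer class from 0 (very negative) to 4 (very positive), the blend is assumed to be a simple weighted average, and the distance is assumed to be the great-circle (haversine) distance to the nearest crime location. The names `effective_star_rating`, `nearest_crime_km`, `haversine_km`, and the `weight` parameter are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def effective_star_rating(stars, sentiment_class, weight=0.5):
    """Blend a 1-5 star rating with a sentiment class.

    Assumption: sentiment_class is an integer from 0 (very negative)
    to 4 (very positive). The weighted average is an illustrative
    choice, not a formula taken from the issue.
    """
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    if not 0 <= sentiment_class <= 4:
        raise ValueError("sentiment_class must be between 0 and 4")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    sentiment_on_star_scale = sentiment_class + 1  # map 0-4 onto 1-5
    return (1 - weight) * stars + weight * sentiment_on_star_scale


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_crime_km(hotel, crimes):
    """Distance from a hotel (lat, lon) to the closest crime (lat, lon)."""
    if not crimes:
        raise ValueError("crimes must contain at least one location")
    return min(haversine_km(hotel[0], hotel[1], c[0], c[1]) for c in crimes)
```

With `weight=0.5`, a 3-star review with a very negative title (class 0) becomes 2.0, while a 3-star review with a very positive title (class 4) becomes 4.0, which separates the "neutral 3" from the "bad 3". The weight would need tuning against the data.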
