Presentation

Developing the presentation structure in more detail: [@giuliocorradini]

[x] Airbnb flaws / our needs: Airbnb doesn't provide textual search or filtering by reviews.
[x] Dataset: Inside Airbnb; number of listings, number of reviews, ...
[ ] Benchmarking: tailor-made benchmark dataset with fewer listings and more control; why precision and recall rather than DCG; benchmark dataset reusable across different search engines
[ ] EDA forward pointer: why word2vec (w2v) is less suitable than sentiment analysis (S.A.)
[ ] Assumptions + TUI/user interaction: queries are short and not very expressive; the user has to use an additional field as a proxy for expressing emotions
[ ] Schema: Whoosh inverted index and its schema (stored, scored, keyword/textual, ...); sentiment forward index
[ ] Query language: natural language + keyword-based filtering (room type) + sentiment keywords
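
For the benchmarking point above, a minimal sketch of set-based precision and recall for a single query. Both metrics only need a binary relevant/not-relevant judgment per listing, whereas DCG also takes rank position (and usually graded relevance) into account; whether that is the reason for the choice is not stated here. The listing IDs and the function name are hypothetical, not taken from the project.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.

    retrieved: listing IDs returned by the search engine (hypothetical IDs)
    relevant:  listing IDs judged relevant in the benchmark dataset
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty result sets or empty relevance judgments.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical benchmark entry: 4 results returned, 3 listings judged relevant.
retrieved = ["l1", "l2", "l3", "l4"]
relevant = ["l2", "l4", "l7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Because the inputs are plain ID collections, the same benchmark entries can be replayed against any search engine that returns listing IDs.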

[@mc-cat-tty]

System architecture: Whoosh, customization, forward index, packages, ...
Models overview: monotonic improvement curve
Basic model: Whoosh sucks; UnionIR workaround
Increasing recall with global query expansion: BERT-based vs WordNet-based; benchmark feedback confirms it
Sentiment ranking: host redemption model (exponential decay), but few reviews
Sentiment ranking weighted by number of reviews
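
One possible reading of the last two points, as a rough sketch only: recent reviews count more than old ones through an exponential decay (so a host can "redeem" past bad reviews), and the result is then damped for listings with few reviews. The half-life, the `prior_strength` parameter, the sentiment range [-1, 1], and both function names are assumptions made for illustration, not the project's actual model.

```python
import math


def decayed_sentiment(reviews, half_life_days=365.0):
    """Exponentially decayed mean sentiment (assumed formulation).

    reviews: list of (age_in_days, sentiment) pairs, sentiment in [-1, 1].
    A review that is half_life_days old weighs half as much as a new one.
    """
    decay = math.log(2) / half_life_days
    weights = [math.exp(-decay * age) for age, _ in reviews]
    total = sum(weights)
    if total == 0:
        return 0.0  # no reviews: fall back to a neutral score
    return sum(w * s for w, (_, s) in zip(weights, reviews)) / total


def review_count_weighted(score, n_reviews, prior_strength=5.0):
    """Shrink the score towards neutral (0) when there are few reviews.

    prior_strength is a hypothetical smoothing constant: with exactly
    prior_strength reviews the score is halved.
    """
    return score * n_reviews / (n_reviews + prior_strength)


# Hypothetical listing: a fresh positive review and a year-old negative one.
reviews = [(0, 1.0), (365, -1.0)]
score = decayed_sentiment(reviews)
final = review_count_weighted(score, len(reviews))
print(f"decayed={score:.3f} weighted={final:.3f}")
```

The damping step is one way to address the "few reviews" problem: a listing with two glowing reviews no longer outranks one with a hundred mostly positive ones on sentiment alone.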

[@ent0n29]

Exploratory Data Analysis: common context (no w2v) and flat sentiment in listings, biased towards the host's profit. Varied sentiment in reviews (enabling effective S.A.), but skewed towards positive reviews (hence the decision to use fine-grained classes)
Sentiment analysis internals: previous research on binary classification, model (BERT multi-class classifier), training dataset forward pointer, examples
GoEmotions - sentiment dataset: 27 classes, fine-grained emotions, etc. (see GoEmotions paper)
Benchmark results and considerations
Future improvements: w2v clustering tree, ...
Demo

mc-cat-tty / PlaceRank