shreyashankar / streams

STREAMS: A Benchmark of Naturalistic Streaming Data for Online Continual Learning

Jeopardy #5

Closed shreyashankar closed 2 years ago

shreyashankar commented 2 years ago

Jeopardy

https://www.kaggle.com/tunguz/200000-jeopardy-questions

Interesting dataset because real Jeopardy contestants have used domain-switching as a strategy to outmaneuver other players. It represents something of a natural (although admittedly niche) distribution of trivia questions, since it was not specifically created for an NLP task. The data needs a little cleanup, though (remove the HTML tags around some questions).

X: question (string)
Y: answer (string)

Domains:
- Category
- Value ($200, $400, …, $2000)

Domain Shifts:
- Covariate Shift: P(answer|question) doesn't change but P(question) does
  - Higher-value questions tend to be more difficult
  - Different categories have different questions
- Label Shift: P(question|answer) doesn't change but P(answer) does
  - N/A here (since P(question|answer) changes as well)
- Concept Shift: P(answer|question) changes. Many of the questions have factual answers that change over time (e.g., who is the US president)
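The cleanup step and the two domain splits mentioned in this comment could look roughly like the sketch below. It is only an illustration under assumptions: the record keys (`category`, `value`, `question`, `answer`) and the helper names are hypothetical and are not taken from the Kaggle file or from this repo, and the sample rows are made up. Tags are stripped with a simple regex, which is enough for stray anchor or italic tags but is not a full HTML parser.

```python
import html
import re
from collections import defaultdict

# Matches anything that looks like an HTML tag, e.g. <a href="..."> or </i>.
TAG_RE = re.compile(r"<[^>]+>")


def clean_question(text):
    """Drop HTML tags, unescape entities, and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", text)
    return " ".join(html.unescape(no_tags).split())


def parse_value(value):
    """Turn a string such as '$2,000' into 2000; return None if it has no digits."""
    digits = re.sub(r"[^0-9]", "", value or "")
    return int(digits) if digits else None


def group_by_domain(records, key):
    """Group cleaned (question, answer) pairs by a domain key (hypothetical helper)."""
    groups = defaultdict(list)
    for rec in records:
        domain = parse_value(rec[key]) if key == "value" else rec[key]
        groups[domain].append((clean_question(rec["question"]), rec["answer"]))
    return dict(groups)


# Made-up rows for illustration only; the key names are assumptions.
sample = [
    {"category": "HISTORY", "value": "$200",
     "question": '<a href="http://example.com/clue.jpg">This</a> man was the first U.S. president',
     "answer": "George Washington"},
    {"category": "SCIENCE", "value": "$400",
     "question": "H2O is the formula for this",
     "answer": "water"},
    {"category": "HISTORY", "value": "$400",
     "question": "Year the U.S. declared independence",
     "answer": "1776"},
]

by_category = group_by_domain(sample, "category")
by_value = group_by_domain(sample, "value")
```

Iterating over `by_category` or `by_value` one domain at a time would give a crude stream with covariate shift between domains; ordering by air date instead would be the natural way to surface the concept shift, assuming the date field is kept.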