shreyashankar / streams

STREAMS: A Benchmark of Naturalistic Streaming Data for Online Continual Learning

CoAuthor #1

Closed shreyashankar closed 2 years ago

shreyashankar commented 2 years ago

CoAuthor
https://coauthor.stanford.edu/

This dataset contains cursor-level interactions between human writers and GPT3 during agent-assisted writing sessions. The dataset does not come with a task, so we propose a new one: given the text written thus far and the start of a text insertion or deletion, predict the rest of the inserted or deleted block. This has pragmatic value: if GPT3 can better predict how humans will fill in the gaps, the user will use its suggestions more often, increasing productivity.

X: (text written thus far, first inserted/deleted character, whether the character was inserted or deleted)
Y: rest of the inserted/deleted text

Domains:
- Author ID (each author has multiple sessions, and each session has multiple text insertions/deletions)
- Prompt ID

Domain Shifts:
- Covariate Shift: p(edits|author) doesn't change, but p(author) does
- Concept Shift: as an author becomes more familiar with interacting with GPT3, p(edits|author) may change as they anticipate agent behavior
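To make the proposed (X, Y) construction concrete, here is a minimal Python sketch. It assumes the raw cursor-level events have already been merged into contiguous insert/delete blocks; the `EditBlock` record, its field names, and both helper functions are hypothetical and do not reflect the actual CoAuthor schema or this repo's code. It also assumes the block text is stored in document order, which may not match the order in which characters were deleted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class EditBlock:
    """Hypothetical record for one contiguous insert/delete block."""
    author_id: str
    prompt_id: str
    text_so_far: str   # document text before the block starts
    op: str            # "insert" or "delete"
    block_text: str    # full inserted/deleted text (assumed document order)


@dataclass(frozen=True)
class Example:
    """One (X, Y) pair plus the two domain labels from the issue."""
    text_so_far: str   # X, part 1
    first_char: str    # X, part 2
    is_insert: bool    # X, part 3
    target: str        # Y: rest of the inserted/deleted text
    author_id: str
    prompt_id: str


def make_example(block: EditBlock) -> Example:
    """Split a block into its first character (input) and the rest (target)."""
    if block.op not in ("insert", "delete"):
        raise ValueError(f"unknown op: {block.op!r}")
    if not block.block_text:
        raise ValueError("an edit block must contain at least one character")
    return Example(
        text_so_far=block.text_so_far,
        first_char=block.block_text[0],
        is_insert=(block.op == "insert"),
        target=block.block_text[1:],
        author_id=block.author_id,
        prompt_id=block.prompt_id,
    )


def group_by_author(examples: Iterable[Example]) -> Dict[str, List[Example]]:
    """Group examples by the Author ID domain, preserving input order."""
    streams: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        streams[ex.author_id].append(ex)
    return dict(streams)


blocks = [
    EditBlock("author_1", "prompt_a", "Once upon a ", "insert", "time"),
    EditBlock("author_2", "prompt_a", "The cat sat", "delete", "sat"),
]
examples = [make_example(b) for b in blocks]
streams = group_by_author(examples)
```

Grouping by `prompt_id` instead of `author_id` would give the second domain; how the per-domain groups are ordered into a single stream is left open here.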