langcog / childes-db

A SQL interface for the CHILDES child language corpora

Weird glosses that aren't in the token list are included by get_tokens() #53

Open: ebergelson opened this issue 4 years ago

ebergelson commented 4 years ago

Simplest example:

dogtest <- get_tokens(token = "dog")
unique(dogtest$gloss)

output: [1] "Dog" "dog" "laughing" "dog's"

ebergelson commented 4 years ago

This appears to be caused by the default replace = T. It still feels like a bug, but maybe it's an error in the database? With replace = F, only the expected glosses come back:

dogtest <- get_tokens(token = "dog", replace = F)
unique(dogtest$gloss)

output: [1] "Dog" "dog"

smeylan commented 4 years ago

This is almost certainly a problem with the way we parse replacements, which are the most complicated part of parsing CHILDES (especially the interaction of replacement tokens with the annotations for disfluencies and reformulations). We'll look into it.