langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

Missing "Spanish (Mexican)" WS administrations? #289

kachergis closed this issue 1 year ago

kachergis commented 1 year ago

I get 2045 unique data_ids from get_instrument_data(), but only 1682 administrations from get_administration_data(). Code to reproduce:

library(tidyverse)
library(wordbankr)

language <- "Spanish (Mexican)"
form <- "WS"

# Administration-level (demographic) data
d_demo <- get_administration_data(language = language, form = form)

# Item metadata, restricted to vocabulary words
items <- get_item_data(language = language, form = form) %>%
  filter(item_kind == "word")

# Item-level responses, joined to item metadata and kept only for word items
d_long <- get_instrument_data(language = language, form = form) %>%
  left_join(items %>% select(-complexity_category), by = "item_id") %>%
  filter(item_kind == "word")

length(unique(d_long$data_id))
# 2045
length(unique(d_demo$data_id))
# 1682
alvinwmtan commented 1 year ago

This behaviour is due to get_administration_data() applying filter_age = TRUE by default, which keeps only administrations within the instrument's valid age range, whereas get_instrument_data() applies no such filter. We can retrieve all 2045 administrations using:

d_demo <- get_administration_data(language = language, form = form, filter_age = FALSE)
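To see which administrations the age filter removes, one can compare the IDs returned with and without it. The sketch below is a minimal illustration using a small made-up data frame, so it runs without a database connection; the ages and the 16 to 30 month range are hypothetical and are not the real Spanish (Mexican) WS bounds. With real data, `all_admins` would instead come from `get_administration_data(language = language, form = form, filter_age = FALSE)`.

```r
# Hypothetical stand-in for the unfiltered administration data
# (in practice: get_administration_data(..., filter_age = FALSE))
all_admins <- data.frame(
  data_id = 1:6,
  age = c(14, 16, 20, 25, 30, 34)  # ages in months (made-up values)
)

# Hypothetical instrument age range; the real bounds come from Wordbank
age_min <- 16
age_max <- 30

# Mimic an age filter: keep only administrations inside the range
filtered_admins <- subset(all_admins, age >= age_min & age <= age_max)

# IDs present in the unfiltered data but dropped by the age filter
dropped_ids <- setdiff(all_admins$data_id, filtered_admins$data_id)
dropped_ids
# [1] 1 6

# Inspect the dropped administrations and their ages
all_admins[all_admins$data_id %in% dropped_ids, ]
```

Applied to the real data, the same `setdiff()` comparison between the `filter_age = FALSE` and default results would list the administrations that account for the gap between the two counts.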