langcog / childesr

R interface to childes-db

get_utterances computes age range incorrectly? #59

Open anne-apel opened 1 year ago

anne-apel commented 1 year ago

I am trying to get the utterances of the target child Ross in the MacWhinney corpus (collection Eng-NA) for ages 3 and 4, corresponding to the files 30001a (age 3;00,01) through 41125d (age 4;11,25). In the CHILDES database (browsable files for MacWhinney: https://sla.talkbank.org/TBB/childes/Eng-NA/MacWhinney), the files are named after Ross's age at the time of recording, so I can compare what I should get from get_utterances with what I actually get.

Here's what I am doing:

utt_ross <- get_utterances(corpus = "MacWhinney", role = c("Target_child"), target_child = "Ross", age = c(36, 59))

This yields a tibble with 3603 obs. of 27 variables.
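One way to see which ages the returned tibble actually covers is to summarise its age column. The following is only a minimal sketch: the helper name summarise_age_coverage is hypothetical, the column name "target_child_age" is an assumption about the returned tibble, and the toy data frame stands in for the real result so the code runs without a database connection.

```r
# Summarise which ages a set of returned utterances actually covers.
# ASSUMPTION: `utts` has a numeric age-in-months column; the default name
# "target_child_age" is assumed here and may need adjusting.
summarise_age_coverage <- function(utts, age_col = "target_child_age") {
  ages <- utts[[age_col]]
  data.frame(
    n_utterances = length(ages),
    min_age = min(ages, na.rm = TRUE),
    max_age = max(ages, na.rm = TRUE)
  )
}

# Toy stand-in for the real result (illustrative values only).
toy <- data.frame(target_child_age = c(36.03, 40.5, 52.13249))
coverage <- summarise_age_coverage(toy)
print(coverage)
```

Calling summarise_age_coverage(utt_ross) on the real result would show the minimum and maximum age in one row, which can then be compared against the ages encoded in the first and last file names.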

To check whether the call returned everything it should, I looked for the utterances from the first and last files in the range. By this check, at least the following utterances are missing from the results:

I am not 100% sure that I am checking the results correctly (there are probably better methods), but even allowing for age being converted into days and back into months in the get_contents function, something seems to be going wrong when utterances from 7 months are missing from my results. The maximum age I get (as computed by the function) is 52.13249, instead of roughly 59.8, which is what I would expect for age 4;11,25.
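For reference, the expected value of about 59.8 can be reproduced with a small conversion from a years;months,days age to decimal months. This is a sketch under stated assumptions: the helper chat_age_to_months is hypothetical, and the average month length of 365.25 / 12 days is an assumption that may differ slightly from the constant the package uses internally.

```r
# ASSUMPTION: average month length in days; the package's own constant may differ.
days_per_month <- 365.25 / 12

# Convert an age written as "years;months,days" (or "years;months.days")
# into decimal months. Missing month or day parts are treated as 0.
chat_age_to_months <- function(age) {
  parts <- as.numeric(strsplit(age, "[;,.]")[[1]])
  parts <- c(parts, 0, 0)[1:3]
  parts[1] * 12 + parts[2] + parts[3] / days_per_month
}

first_file <- chat_age_to_months("3;00,01")  # age in file 30001a
last_file <- chat_age_to_months("4;11,25")   # age in file 41125d
print(round(c(first_file, last_file), 2))
```

Under this assumption the last file sits at about 59.8 months, while a maximum of 52.13 months corresponds to roughly age 4;04. If the upper bound in age = c(36, 59) is applied literally in months, sessions recorded after 4;11,00 could fall outside it, but that would account for less than one month, not seven.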

Sorry in advance if this should be an error on my side!