langcog / web-cdi


Historical problem with "total produced" calculation #453

Closed hilary-rose closed 1 year ago

hilary-rose commented 1 year ago

Hi there,

We recently graphed some of our WebCDI data and noticed a few funny plots, where a few longitudinal participants would show large spikes in vocabulary for one month in the middle of their participation, and then the plot would go back to normal (see image below for an example).

Looking closer at the data, it seems like perhaps there were limited times when WebCDI was not calculating the totals correctly, because this issue seems most prevalent for children with CDI data from similar time points. Below is a table of a few administrations where we noticed this issue, including the administration ID, the totals as calculated by WebCDI, the totals as calculated by us (summing words produced), the form language (Words and Sentences American English and Quebec French), and the date modified for the administration.

Just wanted to make sure that you are aware of this issue!


| administration_id | form | total_produced (from WebCDI) | total_produced (our calculation) | date_modified |
| -- | -- | -- | -- | -- |
| 164583 | EN Words & Sentences | 1492 | 636 | 2022-09-01 19:11 |
| 164580 | EN Words & Sentences | 764 | 381 | 2022-09-10 2:21 |
| 164584 | EN Words & Sentences | 330 | 238 | 2022-09-11 12:02 |
| 165021 | FR Mots & Énoncés | 550 | 441 | 2022-10-07 12:04 |
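For anyone who wants to repeat this check on their own download, here is a minimal sketch of the comparison described above. It uses only the four rows from the table; the function name `find_mismatches` and the tuple layout are illustrative assumptions, not part of WebCDI or its export format.

```python
# Rows copied from the table above:
# (administration_id, form, total reported by WebCDI, total recomputed by summing words produced)
ROWS = [
    (164583, "EN Words & Sentences", 1492, 636),
    (164580, "EN Words & Sentences", 764, 381),
    (164584, "EN Words & Sentences", 330, 238),
    (165021, "FR Mots & Énoncés", 550, 441),
]


def find_mismatches(rows):
    """Return (administration_id, reported, recomputed, excess) for rows that disagree."""
    return [
        (admin_id, reported, recomputed, reported - recomputed)
        for admin_id, _form, reported, recomputed in rows
        if reported != recomputed
    ]


for admin_id, reported, recomputed, excess in find_mismatches(ROWS):
    print(f"{admin_id}: WebCDI={reported}, recomputed={recomputed}, excess={excess}")
```

With a real export, the recomputed column would come from counting the item-level "produces" responses for each administration; how those columns are named depends on the download and is not assumed here.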

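![image](https://user-images.githubusercontent.com/52216858/241813964-4b25e801-f423-4202-9920-dbc423b4920a.png)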

vmarchman commented 1 year ago

Hi Hilary. Yes, we were aware of this and thought we had resolved it by changing how we computed the scores. I'm cc'ing @Henry to see his thoughts on this.

Just out of curiosity, were these scores downloaded a while ago or more recently? If you download those scores now are they still wacky?

We will get to the bottom of this!

Thanks.

Virginia




hilary-rose commented 1 year ago

Thanks Virginia! The plot above was generated with data downloaded in early April. I just re-downloaded data, and re-ran the script for the graph, and it looks quite different—more ups and downs now. I see that the last modified date for most administrations now appears to be in mid-May, so I think the data has changed, but it looks like there are still some totals that aren't calculating correctly (see updated graph for the same participant using the data downloaded just now).

image

Warmly,

Hilary

HenryMehta commented 1 year ago

@vmarchman @hilary-rose The totals are calculated incorrectly when the system tries to run the calculation twice at the same time. There are various reasons this happened previously, and we've changed the code to try to avoid it. However, it looks like when we did the recalc, we managed to double-calculate some again. I think the only approach is for me to set off the recalcs again, one study at a time. This will take several days, I'm afraid.
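To illustrate the general failure mode (not WebCDI's actual code, which is not shown in this thread), here is a toy sketch of why running a score calculation twice can inflate a total if the update adds to a stored value, and why recounting from the responses is safe to repeat. Both functions and the word list are hypothetical.

```python
def add_to_stored_total(stored_total, produced_words):
    """Additive update: adds the count to whatever is already stored (not idempotent)."""
    return stored_total + len(produced_words)


def recompute_total(_stored_total, produced_words):
    """Overwrite update: ignores the stored value and recounts from the responses (idempotent)."""
    return len(produced_words)


words = ["ball", "dog", "milk"]  # made-up responses, for illustration only

additive = 0
overwrite = 0
for _ in range(2):  # the same calculation is triggered twice
    additive = add_to_stored_total(additive, words)
    overwrite = recompute_total(overwrite, words)

print(additive, overwrite)  # 6 3
```

This toy model yields an exact doubling, which in the table above roughly matches only administration 164580 (764 vs. 381), so the real mechanism is presumably more involved than this sketch.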

HenryMehta commented 1 year ago

@hilary-rose I would like to start with one study and prove this sorts it. Could you propose one of your studies, please? Thanks.

hilary-rose commented 1 year ago

Sure, we first noticed this issue with the study called "Baby's New Words-EN Words & Sentences", so you could take a look at that one!

HenryMehta commented 1 year ago

@hilary-rose I have run an update for that study. Please confirm whether you think it has worked, and I will slowly roll it out to the remainder.

HenryMehta commented 1 year ago

@hilary-rose Can you confirm whether this sorted the issue? Thank you.

hilary-rose commented 1 year ago

Hi Henry,

Thanks for working on this. It looks like there are still a couple anomalies in the French data, like for administration id 165021 and administration id 160782. The English is looking okay, though.

image

image

HenryMehta commented 1 year ago

@hilary-rose That's as expected, as I only ran the update for English. I will now work through the whole database.

HenryMehta commented 1 year ago

@hilary-rose I've now completed an update of all studies. Could you please check and confirm? Thanks.