Closed by alvinwmtan 8 months ago
@alvinwmtan I think I've loaded the data. Could you check and confirm? I am away next week, and although I'll have my laptop with me, I'm trying not to work.
If you can check today and it hasn't worked, I'll take another look this evening or on Saturday.
@HenryMehta I still don't see the data. The dataset appears in the datasets table, but not in administration_data. Also, I double-checked and the 13mo Thal data for English (American) WG is also missing.
There also appear to be missing data in a number of other datasets; in these cases, only part of the data is missing. Attached is a CSV from Hiromichi Hagihara detailing the missing data_id values (from a variety of Marchman and Smith datasets):
missinIDs_ageM30.csv
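A list like the attached one can be checked against the database directly. The following is a minimal sketch, not the actual import tooling: it assumes the CSV has a column named data_id and that data_id is stored on common_administration (both are assumptions), and it uses an in-memory SQLite table with made-up IDs as a stand-in for the real database.

```python
import csv
import io
import sqlite3


def missing_data_ids(conn, expected_ids):
    """Return the expected data_id values that have no row in common_administration."""
    loaded = {row[0] for row in conn.execute("SELECT data_id FROM common_administration")}
    return sorted(set(expected_ids) - loaded)


# Toy stand-in for the attached CSV; the "data_id" header and the IDs are hypothetical.
csv_text = "data_id\nM001\nM002\nM003\n"
expected = [row["data_id"] for row in csv.DictReader(io.StringIO(csv_text))]

# Toy stand-in for the database: only two of the three administrations were loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE common_administration (id INTEGER PRIMARY KEY, data_id TEXT)")
conn.executemany(
    "INSERT INTO common_administration (data_id) VALUES (?)",
    [("M001",), ("M002",)],
)

still_missing = missing_data_ids(conn, expected)
print(still_missing)
```

Rerunning a check like this after each reload would show whether the list of missing IDs is shrinking.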
@alvinwmtan I am looking at this, starting with Thal 13. It seems to me that the administration is there but the link to the child data is not. I'm trying to confirm that. Could you share the SQL statement you're using to access the data, so I can see how you're trying to link it? Thanks.
@alvinwmtan following the above comment, I've made a change to the upload program and loaded Thal 13 in dev. Does this look better?
@alvinwmtan I have now redone the Thal 16 update. Could you confirm whether these have worked before I progress? Also, could you let me know which datasets specifically from Marchman and Smith need reviewing, because they take a good hour or more to load?
@HenryMehta I can see the Thal 13 dataset now in dev, but not the Thal 16.
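The datasets with missing IDs are:
- Marchman (Norming)
- Marchman (Wisconsin)
- Marchman (Dallas)
- Smith (electronic)
- Smith (paper)
All of these have some of the data but are missing a bunch of administrations, especially those from 30-month-old children.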
@alvinwmtan Can you share the SQL you're using to see the data? I loaded both Thal 13 and 16 the same way.
@HenryMehta here's the SQL query:
SELECT common_administration.id AS administration_id, data_id, date_of_test, age, comprehension, production, is_norming, child_id, dataset_id, age_min, age_max
FROM common_administration
LEFT JOIN common_instrument
ON common_administration.instrument_id = common_instrument.id
LEFT JOIN common_child
ON common_administration.child_id = common_child.id
WHERE instrument_id IN (8)
For the Thal 13 in WG, the last line instead reads WHERE instrument_id IN (7)
I suspect that the issue has to do with importing participants that are at the boundaries of the instrument age range: the range for English (American) WS is (16, 30), and so it might be that participants whose ages are around 16 or 30mo might somehow be excluded from import or not correctly retrieved. Not sure if that is diagnosable on your end.
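One way to test the boundary hypothesis is to count administrations per age for the instrument and see whether the age_min and age_max ages have any rows. This is only a sketch under stated assumptions: it uses an in-memory SQLite stand-in with toy rows, assumes age and instrument_id are columns of common_administration, and assumes ages are stored as whole months. The helper names are hypothetical.

```python
import sqlite3


def age_counts(conn, instrument_id):
    """Return {age: number of administrations} for one instrument."""
    rows = conn.execute(
        "SELECT age, COUNT(*) FROM common_administration "
        "WHERE instrument_id = ? GROUP BY age ORDER BY age",
        (instrument_id,),
    ).fetchall()
    return dict(rows)


def empty_boundary_ages(conn, instrument_id):
    """Return whichever of (age_min, age_max) has no administrations at all."""
    age_min, age_max = conn.execute(
        "SELECT age_min, age_max FROM common_instrument WHERE id = ?",
        (instrument_id,),
    ).fetchone()
    counts = age_counts(conn, instrument_id)
    return [age for age in (age_min, age_max) if counts.get(age, 0) == 0]


# Toy data: instrument 8 with range (16, 30) as in the thread; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE common_instrument (id INTEGER PRIMARY KEY, age_min INTEGER, age_max INTEGER);
CREATE TABLE common_administration (id INTEGER PRIMARY KEY, instrument_id INTEGER, age INTEGER);
INSERT INTO common_instrument VALUES (8, 16, 30);
INSERT INTO common_administration (instrument_id, age) VALUES (8, 16), (8, 20), (8, 28);
""")

boundary_gaps = empty_boundary_ages(conn, 8)
print(boundary_gaps)
```

An empty result would not rule out partial loss at the boundaries; comparing the per-age counts with the source files would be the stronger check.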
@alvinwmtan The SQL looks right. I'm concentrating on Thal 16 for now. I do not understand why it would work for Thal 13 but not for Thal 16. I'm trying to load Thal 16 again now. But as I type this, I am wondering if it is something to do with the joins, specifically the one to the child table. If I have time, I'll look in more detail once I have it loaded. I want to try the join without the child link.
@alvinwmtan I think the issue is we seem to have the datasets (not necessarily the administrations) loaded multiple times.
In production we have Thal WS dataset with id 4 and 653 administrations. We also have it with id 128 and 0 administrations. We also have Thal WG with dataset id 7 and 645 administrations and dataset id 129 and 0 administrations.
I think the administrations are loaded, but we need to look at the datasets, which is where things seem to have gone wrong.
Could you take a look and tell me if you agree?
I think Thal WG dataset_id 129 has 641 administrations actually (which is why I could see the 13mos).
I think it's correct that there are four datasets labelled Thal: two WS datasets with source (16, 28) and two WG datasets with source (13, 16). I wonder if the issue arises when we have the same dataset_name repeated? Perhaps we should make the dataset_name unique and just use the dataset_origin_name when doing child_id matching.
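The per-dataset counts and the repeated names discussed here can be listed with two grouped queries. This is a sketch only: the table name common_dataset and a dataset_id column on common_administration are assumptions about the schema, and the rows below are toy values (the dataset ids 4 and 128 echo the thread, the administration counts do not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE common_dataset (id INTEGER PRIMARY KEY, dataset_name TEXT);
CREATE TABLE common_administration (id INTEGER PRIMARY KEY, dataset_id INTEGER);
INSERT INTO common_dataset VALUES (4, 'Thal'), (128, 'Thal');
INSERT INTO common_administration (dataset_id) VALUES (4), (4), (4);
""")

# LEFT JOIN keeps dataset rows that have no administrations, so they show up with 0.
per_dataset = conn.execute("""
    SELECT d.id, d.dataset_name, COUNT(a.id) AS n_administrations
    FROM common_dataset AS d
    LEFT JOIN common_administration AS a ON a.dataset_id = d.id
    GROUP BY d.id, d.dataset_name
    ORDER BY d.id
""").fetchall()

# Names used by more than one dataset row; these are the cases where matching
# on dataset_name alone would be ambiguous.
repeated_names = conn.execute("""
    SELECT dataset_name, COUNT(*) AS n_datasets
    FROM common_dataset
    GROUP BY dataset_name
    HAVING COUNT(*) > 1
""").fetchall()

print(per_dataset)
print(repeated_names)
```

If dataset_name were made unique as suggested, the second query would return no rows.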
@alvinwmtan Yes, of course there are multiple datasets. I forgot how it works. I think the problem was that I thought there was just one WG and one WS dataset, and I was loading the one based on the file. But there are two of each, so I have reloaded all four. Please let me know if this has worked. It looks to me like it has.
@HenryMehta Great, I can see all of them now. So the Thal datasets are resolved, and the ones that remain are the other ones:
The datasets with missing IDs are:
- Marchman (Norming)
- Marchman (Wisconsin)
- Marchman (Dallas)
- Smith (electronic)
- Smith (paper)
All of these have some of the data but are missing a bunch of administrations, especially those from 30-month-old children.
@alvinwmtan I found an error in the load for Norming. I corrected that and then reloaded the 5 (in dev). Please confirm if this worked. If so, I'll load Thal and these 5 to prod.
@HenryMehta I think it has worked—I can see them all just fine. Thank you!
@alvinwmtan I've now applied all these to production. Please confirm that it looks OK, and I'll close the issue and move on to Wordbank2.1.
@HenryMehta looks good on my end, thanks!
The Thal 16mo dataset is not imported in English (American) WS (missing when pulled from wordbankr).