Closed golozara closed 3 years ago
I hope that these issues are fixed with https://github.com/ohdsi-studies/PioneerWatchfulWaiting/commit/862e1cd76d642ff8dd45e16b90c76068ab83c634 but we will have to test that
Issue 1 is not blocking but we can have a look, may also be related to the unique bug #70
Issue 2: @rfherrerac could you check the first SQL statement in inst\sql\sql_server\quartiles\IQRComplementaryTables.sql
using https://data.ohdsi.org/SqlDeveloper to see what is going on?
When I test the generation of age statistics for issue 2 in a Postgres Synthea database, it works:
SELECT tab.cohort_definition_id,
tab.person_id,
tab.cohort_start_date,
(EXTRACT(YEAR FROM CAST(tab.cohort_start_date AS DATE)) - EXTRACT(YEAR FROM CAST(TO_DATE(TO_CHAR(tab.year_of_birth,'0000')||'-'||TO_CHAR(tab.month_of_birth,'00')||'-'||TO_CHAR(tab.day_of_birth,'00'), 'YYYY-MM-DD') AS DATE))) AS age
FROM (
SELECT cohort_definition_id, person_id, cohort_start_date, year_of_birth, month_of_birth, day_of_birth
FROM pioneer.pioneer_s1k_stg c
JOIN cdm_synthea1k.person p
ON p.person_id = c.subject_id
WHERE c.cohort_definition_id IN (202)
) tab
;
generates a table with calculated age at cohort start.
I tried to test this as best as possible, by tricking runStudy into viewing my outcome cohorts as target cohorts (since I don't have counts in any target cohort from my synthetic database). Age at diagnosis does show up in the metrics distribution results:
> metricsDistribution
cohortDefinitionId iqr minimum q1 median q3 maximum
1 202 33 0.0 34.0 53.0 67.0 99.0
...
analysisName database_id
1 Age At Diagnosis S1K
My best guess is that the specific Oracle SQL generated has some bug that leads to the resulting subject_age
table being empty, need to debug this directly with the database owners.
I figured we have no info MONTH_OF_BIRTH DAY_OF_BIRTH, all is NA. @keesvanbochove your code did not work for that reason. In my system, this script worked, but I am a pretty bad programmer in SQL ;)
"DROP TABLE IF EXISTS @cohort_database_schema.subject_age; CREATE TABLE @cohort_database_schema.subject_age AS SELECT tab.cohort_definition_id, tab.person_id, tab.cohort_start_date, EXTRACT(year FROM CAST(tab.cohort_start_date AS date)) - tab.year_of_birth as age FROM ( SELECT cohort_definition_id, person_id, cohort_start_date, year_of_birth, month_of_birth, day_of_birth FROM @cohort_database_schema.@cohort_table c JOIN @cdm_database_schema.person p ON p.person_id = c.subject_id WHERE c.cohort_definition_id IN (@target_ids) ) tab ; "
@rfherrerac thanks, that makes sense, when I try it in Postgres with SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth, NULL as month_of_birth, NULL as day_of_birth
then the outer query throws an error: function to_char(text, unknown) does not exist
. But if I use 0 as month_of_birth, 0 as day_of_birth
then it works fine.
If most databases don't have this information at all, we could simplify the query to (OHDSI SQL):
-- create subject age table
DROP TABLE IF EXISTS @cohort_database_schema.subject_age;
CREATE TABLE @cohort_database_schema.subject_age AS
SELECT tab.cohort_definition_id,
tab.person_id,
tab.cohort_start_date,
DATEDIFF(year, DATEFROMPARTS(tab.year_of_birth,1,1),
tab.cohort_start_date) AS age
FROM (
SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth
FROM @cohort_database_schema.@cohort_table c
JOIN @cdm_database_schema.person p
ON p.person_id = c.subject_id
WHERE c.cohort_definition_id IN (@target_ids)
) tab
;
So what we could do is catch both and use the following "OHDSI SQL":
-- create subject age table
DROP TABLE IF EXISTS @cohort_database_schema.subject_age;
CREATE TABLE @cohort_database_schema.subject_age AS
SELECT tab.cohort_definition_id,
tab.person_id,
tab.cohort_start_date,
DATEDIFF(year, DATEFROMPARTS(tab.year_of_birth,tab.month_of_birth, tab.day_of_birth),
tab.cohort_start_date) AS age
FROM (
SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth,
CASE WHEN ISNUMERIC(p.month_of_birth) = 1 THEN p.month_of_birth ELSE 1 END AS month_of_birth,
CASE WHEN ISNUMERIC(p.day_of_birth) = 1 THEN p.day_of_birth ELSE 1 END AS day_of_birth
FROM @cohort_database_schema.@cohort_table c
JOIN @cdm_database_schema.person p
ON p.person_id = c.subject_id
WHERE c.cohort_definition_id IN (@target_ids)
) tab
;
@rfherrerac could you try this in your database? In Oracle the SELECT part would become something like this through SqlRender:
SELECT tab.cohort_definition_id,
tab.person_id,
tab.cohort_start_date,
(EXTRACT(YEAR FROM CAST(tab.cohort_start_date AS DATE)) - EXTRACT(YEAR FROM CAST(TO_DATE(TO_CHAR(tab.year_of_birth,'0000')||'-'||TO_CHAR(tab.month_of_birth,'00')||'-'||TO_CHAR(tab.day_of_birth,'00'), 'YYYY-MM-DD') AS DATE))) AS age
FROM (SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth, CASE WHEN CASE WHEN (LENGTH(TRIM(TRANSLATE(p.month_of_birth, ' +-.0123456789',' '))) IS NULL) THEN 1 ELSE 0 END = 1 THEN p.month_of_birth ELSE 1 END AS month_of_birth, CASE WHEN CASE WHEN (LENGTH(TRIM(TRANSLATE(p.day_of_birth, ' +-.0123456789',' '))) IS NULL) THEN 1 ELSE 0 END = 1 THEN p.day_of_birth ELSE 1 END AS day_of_birth
FROM pioneer.pioneer_s1k_stg c
JOIN cdm_synthea1k.person p
ON p.person_id = c.subject_id
WHERE c.cohort_definition_id IN (202)
) tab
;
@keesvanbochove Yes, it worked.
few observations so far based on the results uploaded in the ShinyApp that needs to be fixed: 1) we have a category/strata showing as: gender= . Needs to be taken out 2) age at diagnosis statistics is missing in the results.