ohdsi-studies / PioneerWatchfulWaiting

This study is part of the joint PIONEER - EHDEN - OHDSI studyathon in March 2021, and aims to advance understanding of clinical management and outcomes of watchful waiting in prostate cancer.
Apache License 2.0
7 stars 18 forks source link

issues observed in the new Results #68

Closed golozara closed 3 years ago

golozara commented 3 years ago

few observations so far based on the results uploaded in the ShinyApp that needs to be fixed: 1) we have a category/strata showing as: gender= . Needs to be taken out 2) age at diagnosis statistics is missing in the results.

keesvanbochove commented 3 years ago

I hope that these issues are fixed with https://github.com/ohdsi-studies/PioneerWatchfulWaiting/commit/862e1cd76d642ff8dd45e16b90c76068ab83c634 but we will have to test that

keesvanbochove commented 3 years ago

Issue 1 is not blocking but we can have a look, may also be related to the unique bug #70 Issue 2: @rfherrerac could you check the first SQL statement in inst\sql\sql_server\quartiles\IQRComplementaryTables.sql using https://data.ohdsi.org/SqlDeveloper to see what is going on?

keesvanbochove commented 3 years ago

When I test the generation of age statistics for issue 2 in a Postgres Synthea database, it works:

SELECT tab.cohort_definition_id,
       tab.person_id,
       tab.cohort_start_date,
       (EXTRACT(YEAR FROM CAST(tab.cohort_start_date AS DATE)) - EXTRACT(YEAR FROM CAST(TO_DATE(TO_CHAR(tab.year_of_birth,'0000')||'-'||TO_CHAR(tab.month_of_birth,'00')||'-'||TO_CHAR(tab.day_of_birth,'00'), 'YYYY-MM-DD') AS DATE))) AS age
FROM (
     SELECT cohort_definition_id, person_id, cohort_start_date, year_of_birth, month_of_birth, day_of_birth
     FROM pioneer.pioneer_s1k_stg c
     JOIN cdm_synthea1k.person p
         ON p.person_id = c.subject_id
     WHERE c.cohort_definition_id IN (202)
     ) tab
;

generates a table with calculated age at cohort start.

keesvanbochove commented 3 years ago

I tried to test this as best as possible, by tricking runStudy into viewing my outcome cohorts as target cohorts (since I don't have counts in any target cohort from my synthetic database). Age at diagnosis does show up in the metrics distribution results:

> metricsDistribution
   cohortDefinitionId iqr minimum     q1 median     q3 maximum
1                 202  33     0.0   34.0   53.0   67.0    99.0
...
                      analysisName database_id
1                 Age At Diagnosis         S1K

My best guess is that the specific Oracle SQL generated has some bug that leads to the resulting subject_age table being empty, need to debug this directly with the database owners.

rfherrerac commented 3 years ago

I figured we have no info MONTH_OF_BIRTH DAY_OF_BIRTH, all is NA. @keesvanbochove your code did not work for that reason. In my system, this script worked, but I am a pretty bad programmer in SQL ;)

"DROP TABLE IF EXISTS @cohort_database_schema.subject_age; CREATE TABLE @cohort_database_schema.subject_age AS SELECT tab.cohort_definition_id, tab.person_id, tab.cohort_start_date, EXTRACT(year FROM CAST(tab.cohort_start_date AS date)) - tab.year_of_birth as age FROM ( SELECT cohort_definition_id, person_id, cohort_start_date, year_of_birth, month_of_birth, day_of_birth FROM @cohort_database_schema.@cohort_table c JOIN @cdm_database_schema.person p ON p.person_id = c.subject_id WHERE c.cohort_definition_id IN (@target_ids) ) tab ; "

keesvanbochove commented 3 years ago

@rfherrerac thanks, that makes sense, when I try it in Postgres with SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth, NULL as month_of_birth, NULL as day_of_birth then the outer query throws an error: function to_char(text, unknown) does not exist. But if I use 0 as month_of_birth, 0 as day_of_birth then it works fine.

If most databases don't have this information at all, we could simplify the query to (OHDSI SQL):

-- create subject age table
DROP TABLE IF EXISTS @cohort_database_schema.subject_age;
CREATE TABLE @cohort_database_schema.subject_age AS
SELECT tab.cohort_definition_id,
       tab.person_id,
       tab.cohort_start_date,
       DATEDIFF(year, DATEFROMPARTS(tab.year_of_birth,1,1),
                tab.cohort_start_date) AS age
FROM (
     SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth
     FROM @cohort_database_schema.@cohort_table c
     JOIN @cdm_database_schema.person p
         ON p.person_id = c.subject_id
     WHERE c.cohort_definition_id IN (@target_ids)
     ) tab
;
keesvanbochove commented 3 years ago

So what we could do is catch both and use the following "OHDSI SQL":

-- create subject age table
DROP TABLE IF EXISTS @cohort_database_schema.subject_age;
CREATE TABLE @cohort_database_schema.subject_age AS
SELECT tab.cohort_definition_id,
       tab.person_id,
       tab.cohort_start_date,
       DATEDIFF(year, DATEFROMPARTS(tab.year_of_birth,tab.month_of_birth, tab.day_of_birth),
                tab.cohort_start_date) AS age
FROM (
     SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth,
               CASE WHEN ISNUMERIC(p.month_of_birth) = 1 THEN p.month_of_birth ELSE 1 END AS month_of_birth,
               CASE WHEN ISNUMERIC(p.day_of_birth) = 1 THEN p.day_of_birth ELSE 1 END AS day_of_birth
     FROM @cohort_database_schema.@cohort_table c
     JOIN @cdm_database_schema.person p
         ON p.person_id = c.subject_id
     WHERE c.cohort_definition_id IN (@target_ids)
     ) tab
;

@rfherrerac could you try this in your database? In Oracle the SELECT part would become something like this through SqlRender:

SELECT tab.cohort_definition_id,
       tab.person_id,
       tab.cohort_start_date,
       (EXTRACT(YEAR FROM CAST(tab.cohort_start_date AS DATE)) - EXTRACT(YEAR FROM CAST(TO_DATE(TO_CHAR(tab.year_of_birth,'0000')||'-'||TO_CHAR(tab.month_of_birth,'00')||'-'||TO_CHAR(tab.day_of_birth,'00'), 'YYYY-MM-DD') AS DATE))) AS age
FROM (SELECT c.cohort_definition_id, p.person_id, c.cohort_start_date, p.year_of_birth, CASE WHEN CASE WHEN (LENGTH(TRIM(TRANSLATE(p.month_of_birth, ' +-.0123456789',' '))) IS NULL) THEN 1 ELSE 0 END = 1 THEN p.month_of_birth ELSE 1 END AS month_of_birth, CASE WHEN CASE WHEN (LENGTH(TRIM(TRANSLATE(p.day_of_birth, ' +-.0123456789',' '))) IS NULL) THEN 1 ELSE 0 END = 1 THEN p.day_of_birth ELSE 1 END AS day_of_birth
     FROM pioneer.pioneer_s1k_stg c
     JOIN cdm_synthea1k.person p
         ON p.person_id = c.subject_id
       WHERE c.cohort_definition_id IN (202)
      ) tab
 ;
rfherrerac commented 3 years ago

@keesvanbochove Yes, it worked.