roux-ohdsi / allofus

R package to streamline use of the AllofUs researcher workbench
https://roux-ohdsi.github.io/allofus/
Other
12 stars 2 forks source link

aou_survey returns concept codes for SDOH survey instead of readable responses #24

Open rbcavanaugh opened 2 months ago

rbcavanaugh commented 2 months ago
aou_survey(questions = 40192519) %>% count(sdoh_eds_8)
# Source:   SQL [9 x 2]
# Database: BigQueryConnection
  sdoh_eds_8            n
  <chr>           <int64>
1 SDOH_28           23093
2 SDOH_27           10317
3 NA               293487
4 SDOH_20           75071
5 SDOH_24            1120
6 Skip               2888
7 SDOH_26            2868
8 SDOH_25            1516
9 SDOH_20,SDOH_27       (<20)
<SQL>
SELECT
  COALESCE(`person`.`person_id`, `RHS`.`person_id`) AS `person_id`,
  `sdoh_eds_8`,
  `sdoh_eds_8_date`
FROM `person`
FULL JOIN (
  SELECT
    `person_id`,
    MAX(IF(`observation_source_value` = 'sdoh_eds_8', `value_source_value`, NULL)) AS `sdoh_eds_8`,
    MAX(IF(`observation_source_value` = 'sdoh_eds_8', `observation_date`, NULL)) AS `sdoh_eds_8_date`
  FROM (
    SELECT
      `person_id`,
      `observation_source_value`,
      STRING_AGG(value_source_value order by value_source_value) AS `value_source_value`,
      `observation_date`
    FROM (
      SELECT
        `observation_id`,
        `person_id`,
        `observation_concept_id`,
        `observation_date`,
        `observation_datetime`,
        `observation_type_concept_id`,
        `value_as_number`,
        `value_as_string`,
        `value_as_concept_id`,
        `qualifier_concept_id`,
        `unit_concept_id`,
        `provider_id`,
        `visit_occurrence_id`,
        `visit_detail_id`,
        `observation_source_value`,
        `observation_source_concept_id`,
        `unit_source_value`,
        `qualifier_source_value`,
        `value_source_concept_id`,
        COALESCE(`value_source_value`, CAST(value_as_number AS STRING)) AS `value_source_value`,
        `questionnaire_response_id`
      FROM (
        SELECT
          `observation_id`,
          `person_id`,
          `observation_concept_id`,
          `observation_date`,
          `observation_datetime`,
          `observation_type_concept_id`,
          `value_as_number`,
          `value_as_string`,
          `value_as_concept_id`,
          `qualifier_concept_id`,
          `unit_concept_id`,
          `provider_id`,
          `visit_occurrence_id`,
          `visit_detail_id`,
          `observation_source_value`,
          `observation_source_concept_id`,
          `unit_source_value`,
          `qualifier_source_value`,
          `value_source_concept_id`,
          CASE
WHEN (CONTAINS_SUBSTR(`value_source_value`, 'cope_')) THEN `value_source_value`
WHEN (CONTAINS_SUBSTR(`value_source_value`, 'SDOH_')) THEN `value_source_value`
WHEN (NOT(CONTAINS_SUBSTR(`value_source_value`, '_'))) THEN `value_source_value`
ELSE (REGEXP_EXTRACT(`value_source_value`, '.+_(.+_*.*)'))
END AS `value_source_value`,
          `questionnaire_response_id`
        FROM `terra-vpc-sc-34a20a6f._124406935f0327a24c77b3aa6985cb0a14413368.anonafd8d8058763b21bbeb53379ac7284740b4d2a527f4501ed6bf22b0d6923712f`
        WHERE (1.0 > 0.0)
      ) `q01`
    ) `q01`
    GROUP BY `person_id`, `observation_date`, `observation_source_value`
  ) `q01`
  GROUP BY `person_id`
) `RHS`
  ON (`person`.`person_id` = `RHS`.`person_id`)
rbcavanaugh commented 2 months ago

It looks like this is because 'concept_code' is done a little differently for the SDOH (and maybe cope) surveys. In regular surveys (e.g., overall health), the concept code for an answer would be something like GeneralMentalHealth_Excellent and we strip away the text before the underscore to get the answer.

However, we don't do this for SDOH and COPE, presumably because the text after the underscore (its just a 2-digit number) is meaningless.

if (isTRUE(clean_answers)) {
      tmp <- dplyr::mutate(tmp,
        value_source_value = dplyr::case_when(
          CONTAINS_SUBSTR(.data$value_source_value, "cope_") ~ value_source_value,
          CONTAINS_SUBSTR(.data$value_source_value, "SDOH_") ~ value_source_value,
          !CONTAINS_SUBSTR(.data$value_source_value, "_") ~ value_source_value,
          TRUE ~ REGEXP_EXTRACT(.data$value_source_value, ".+_(.+_*.*)")
        )
      )
    }

So to fix this, we need to find a different way to map the concept_code (SDOH_20, for example) to the human readable response. It looks like aou_codebook has a way of pulling this information from the PPI vocab. looks like its the title of the answer concept or a standard to non-standard map?

louisahsmith commented 2 months ago

Very crude attempt to fix: c186d3af63758136b1d438197b78ba22030b5ba5

rbcavanaugh commented 1 day ago

Finally returning to this. @louisahsmith this returns an error:

> aou_survey(questions = 40192519)
Job complete
Billed: 0 B
Job complete
Billed: 0 B
Running job terra-vpc-sc-34a20a6f.job_Sn5Od9ImVUa_bFhwoLOYpO8T4at4.US ⠸ 8s  
Job complete
Billed: 20.75 GB
Job complete
Billed: 0 B
Error in `bigrquery::bq_project_query()`:
! Job terra-vpc-sc-34a20a6f.job_RqSefm75pvhRSmg2l4avrpeugusX.US failed
✖ Syntax error: Expected ")" or "," but got identifier "t" at [100:29] [invalidQuery]
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
No cohort provided.
→ Pulling survey data for entire All of Us cohort.