ohdsi-studies / PCE

Network study validating the Pooled Cohort Equation Model
3 stars 3 forks source link

LOG(0) causing issues on BQ #2

Closed jdposada closed 3 years ago

jdposada commented 3 years ago

hi @jreps ,

Below the full output of the errorReportSql.txt. Below that output, I copied the main console output to give more information on how far along I was in the execution. All help much appreciated. Formatted SQL causing the issue is below.

SELECT
  c.row_id AS row_id,
  measurement_concept_id,
  unit_concept_id,
  LOG(value_as_number) AS value_as_number,
  measurement_date,
  ABS(DATE_DIFF(
    IF
      (SAFE_CAST(c.cohort_start_date AS DATE) IS NULL,
        PARSE_DATE('%Y%m%d',
          CAST(c.cohort_start_date AS STRING)),
        SAFE_CAST(c.cohort_start_date AS DATE)),
    IF
      (SAFE_CAST(measurement_date AS DATE) IS NULL,
        PARSE_DATE('%Y%m%d',
          CAST(measurement_date AS STRING)),
        SAFE_CAST(measurement_date AS DATE)), DAY)) AS index_time,
  value_as_number raw_value,
  EXTRACT(YEAR
  FROM
    c.cohort_start_date)-p.year_of_birth AS age
FROM
  starr_omop_cdm5_deid_20201003.measurement m
INNER JOIN
  jdposada_explore.k8ldeq0icohort_person c
ON
  c.subject_id = m.person_id
  AND measurement_date >= DATE_ADD(
  IF
    (SAFE_CAST(cohort_start_date AS DATE) IS NULL,
      PARSE_DATE('%Y%m%d',
        CAST(cohort_start_date AS STRING)),
      SAFE_CAST(cohort_start_date AS DATE)), INTERVAL -1825 DAY)
  AND measurement_date <= DATE_ADD(
  IF
    (SAFE_CAST(cohort_start_date AS DATE) IS NULL,
      PARSE_DATE('%Y%m%d',
        CAST(cohort_start_date AS STRING)),
      SAFE_CAST(cohort_start_date AS DATE)), INTERVAL 0 DAY)
INNER JOIN
  starr_omop_cdm5_deid_20201003.person p
ON
  p.person_id=c.subject_id
WHERE
  m.measurement_concept_id IN (2212267,
    3015232,
    3019900,
    3027114,
    4008265,
    4190897,
    4198448,
    4260765,
    37393449,
    37397989,
    40484105,
    44791053,
    44809580)

Thanks,

Jose

DBMS: bigquery

Error: java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: 400 Bad Request { "code" : 400, "errors" : [ { "domain" : "global", "location" : "q", "locationType" : "parameter", "message" : "Floating point error in function: LN(0)", "reason" : "invalidQuery" } ], "message" : "Floating point error in function: LN(0)", "status" : "INVALID_ARGUMENT" }

SQL: select c.row_id as row_id, measurement_concept_id, unit_concept_id, log(value_as_number) as value_as_number, measurement_date, abs(DATE_DIFF(IF(SAFE_CAST(c.cohort_start_date AS DATE) IS NULL,PARSE_DATE('%Y%m%d', cast(c.cohort_start_date AS STRING)),SAFE_CAST(c.cohort_start_date AS DATE)), IF(SAFE_CAST(measurement_date AS DATE) IS NULL,PARSE_DATE('%Y%m%d', cast(measurement_date AS STRING)),SAFE_CAST(measurement_date AS DATE)), DAY)) as index_time,value_as_number raw_value, EXTRACT(YEAR from c.cohort_start_date)-p.year_of_birth as age from starr_omop_cdm5_deid_20201003.measurement m inner join jdposada_explore.k8ldeq0icohort_person c on c.subject_id = m.person_id and measurement_date >= DATE_ADD(IF(SAFE_CAST(cohort_start_date AS DATE) IS NULL,PARSE_DATE('%Y%m%d', cast(cohort_start_date AS STRING)),SAFE_CAST(cohort_start_date AS DATE)), interval -1825 DAY) and measurement_date <= DATE_ADD(IF(SAFE_CAST(cohort_start_date AS DATE) IS NULL,PARSE_DATE('%Y%m%d', cast(cohort_start_date AS STRING)),SAFE_CAST(cohort_start_date AS DATE)), interval 0 DAY) inner join starr_omop_cdm5_deid_20201003.person p on p.person_id=c.subject_id where m.measurement_concept_id in (2212267,3015232,3019900,3027114,4008265,4190897,4198448,4260765,37393449,37397989,40484105,44791053,44809580)

R version: R version 4.0.2 (2020-06-22)

Platform: x86_64-pc-linux-gnu

Attached base packages:

Other attached packages:


Terminal Output

Creating cohorts Connecting using BigQuery driver Creating cohort: Black Male Persons who are statin-risk eligible |======================================================================| 100% Executing SQL took 15.8 mins Creating cohort: Black Female Persons who are statin-risk eligible |======================================================================| 100% Executing SQL took 16.3 mins Creating cohort: Non-Black Male Persons who are statin-risk eligible |======================================================================| 100% Executing SQL took 15.4 mins Creating cohort: Non-Black Female Persons who are statin-risk eligible |======================================================================| 100% Executing SQL took 17 mins Creating cohort: Black Male Persons who are statin-risk eligible not censored at statin initiation |======================================================================| 100% Executing SQL took 15.1 mins Creating cohort: Black Female Persons who are statin-risk eligible not censored at statin initiation |======================================================================| 100% Executing SQL took 15.3 mins Creating cohort: Non-Black Male Persons who are statin-risk eligible not censored at statin initiation |======================================================================| 100% Executing SQL took 15.5 mins Creating cohort: Non-Black Female Persons who are statin-risk eligible not censored at statin initiation |======================================================================| 100% Executing SQL took 16.1 mins Creating cohort: first of AMI or ischemic stroke or death |======================================================================| 100% Executing SQL took 1.52 mins Creating cohort: first of AMI or ischemic stroke or death, no IP required |======================================================================| 100% Executing SQL took 1.46 mins Creating cohort: diabetes |======================================================================| 100% Executing SQL took 1.89 mins Creating cohort: smoking |======================================================================| 100% Executing SQL took 1.43 mins Creating cohort: treated_Systolic_BP_mm_Hg |======================================================================| 100% Executing SQL took 5.65 mins Creating cohort: untreated_Systolic_BP_mm_Hg |======================================================================| 100% Executing SQL took 5.98 mins Counting cohorts Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 9.16 secs Fetching cohorts from server Loading cohorts took 7.85 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 13.7 secs Fetching cohorts from server Loading cohorts took 8.97 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 14.9 secs Fetching cohorts from server Loading cohorts took 15.4 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 12.6 secs Fetching cohorts from server Loading cohorts took 17.5 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 12.2 secs Fetching cohorts from server Loading cohorts took 7.94 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 10.7 secs Fetching cohorts from server Loading cohorts took 6.67 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 11.5 secs Fetching cohorts from server Loading cohorts took 13.8 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 13 secs Fetching cohorts from server Loading cohorts took 16.7 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 12.3 secs Fetching cohorts from server Loading cohorts took 7.88 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 10.3 secs Fetching cohorts from server Loading cohorts took 6.75 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 12.2 secs Fetching cohorts from server Loading cohorts took 14.5 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 11.6 secs Fetching cohorts from server Loading cohorts took 14.2 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 11.4 secs Fetching cohorts from server Loading cohorts took 6.2 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 10.4 secs Fetching cohorts from server Loading cohorts took 7.41 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 12.3 secs Fetching cohorts from server Loading cohorts took 14.4 secs Extracting data Connecting using BigQuery driver

Constructing the at risk cohort |======================================================================| 100% Executing SQL took 15.8 secs Fetching cohorts from server Loading cohorts took 33 secs Packaging results

Study results are compressed and ready for sharing at: /workdir/workdir/notebooks/ohdsi2020_symposium/starr-omop.zip

jreps commented 3 years ago

You have some measurements with 0 as the value. I'll need to add into the where that we want non-zero values when using log. Let me quickly try that edit now.

jreps commented 3 years ago

I've added '{@lnValue}?{ and value_as_number >0 }' to the end of the measurement SQL. My R is being blocked by a slow cohort so I've not tested it - but hopefully this will fix the issue.

jdposada commented 3 years ago

It did thanks