Closed tseto closed 3 years ago
The SQL query in question would be: https://github.com/ohdsi-studies/PioneerWatchfulWaiting/blob/master/inst/sql/sql_server/strata/StratifyByCohort.sql. You can generate the exact SQL by using https://data.ohdsi.org/SqlDeveloper/, pasting the above file in there, selecting BigQuery and filling in the settings you are using).
Another option could be to review the BigQuery query log - I only know how to do this in Postgres but I guess GCP would have similar options (see e.g. https://cloud.google.com/bigquery/docs/reference/auditlogs)
I'm surprised that this is the query it's choking on, because it is not super complex... although it does have a few nested structures. Do you think it would help if we rewrote it to use temp tables rather than using subqueries/nesting?
There seems to be some limitations with BQ. Rewriting queries may help.
Modifying my local copy of StrataConstruction.R to create a record and select unnest will work better in BigQuery than thousands of union statements.
SELECT * FROM UNNEST(ARRAY<STRUCT<target_id INT64, strata_id INT64, cohort_id INT64, cohort_type STRING>> [(105, 338, 105003382, 'TwoS'), (106, 338, 106003382, 'TwoS')...
@tseto could you contribute a PR with this fix? I'd like to test it in other database systems as well.
Records and STRUCT are BigQuery-specific syntax.
sql <- "SELECT * INTO #TARGET_STRATA_XREF FROM UNNEST(ARRAY<STRUCT<target_id INT64, strata_id INT64, cohort_id INT64, cohort_type STRING>> [ @records ]);" records <- ""; for(i in 1:nrow(targetStrataXref)) { stmt <- paste0("(", targetStrataXref$targetId[i], ",", targetStrataXref$strataId[i], ",", targetStrataXref$cohortId[i], ",", "'", targetStrataXref$cohortType[i], "')") if (i < nrow(targetStrataXref)) { records <- paste(records, stmt, ",", sep="\n") } else { records <- paste(records, stmt, sep="\n") } }
Alternatively, the SQL below should work for Postgres, Oracle and others.
sql <- "CREATE TABLE #TARGET_STRATA_XREF AS SELECT FROM JSON_TABLE ( '[ @records_in_json_array ]', '$[]' COLUMNS ( "target_id" path '$.target_id', "strata_id" path '$.strata_id', "cohort_id" path '$.cohort_id', "cohort_type" path '$.cohort_type' ));"
Hello,
I’m trying to run through the PioneerWatchfulWaiting code. The 38 target cohorts were create successfully as well as the 7 feature cohorts.
I’m getting this error during the stratified target cohort step, query is too complex. We’re running against Google Cloud BigQuery. The log doesn’t indicate which SQL file is being run. Can you let me know which file so I can take a look at the query?
I'll also try to run my docker image in the GCP project rather than locally to see if that helps.
---- Creating stratified target cohorts ----
Stratify by Full 365-day follow up
|=======================================================================================| 100% Executing SQL took 9.79 secs Stratify by < 365-day follow up |=======================================================================================| 100% Executing SQL took 9.18 secs Stratify by Black or African American |=======================================================================================| 100% Executing SQL took 11.4 secs Stratify by White |=======================================================================================| 100% Executing SQL took 11.3 secs Stratify by cohorts
| | 0%Error: Error executing SQL: java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. An error report has been created at /home/ohdsi/workdir/notebooks/errorReportSql.txt Run
rlang::last_error()
to see where the error occurred.