Getting a 'Resources exceeded during query execution' error during stratification step

tseto commented 3 years ago

Hello,

I’m trying to run through the PioneerWatchfulWaiting code. The 38 target cohorts were create successfully as well as the 7 feature cohorts.

I’m getting this error during the stratified target cohort step, query is too complex. We’re running against Google Cloud BigQuery. The log doesn’t indicate which SQL file is being run. Can you let me know which file so I can take a look at the query?

I'll also try to run my docker image in the GCP project rather than locally to see if that helps.

---- Creating stratified target cohorts ----

Stratify by Full 365-day follow up
|=======================================================================================| 100% Executing SQL took 9.79 secs Stratify by < 365-day follow up |=======================================================================================| 100% Executing SQL took 9.18 secs Stratify by Black or African American |=======================================================================================| 100% Executing SQL took 11.4 secs Stratify by White |=======================================================================================| 100% Executing SQL took 11.3 secs Stratify by cohorts
| | 0%Error: Error executing SQL: java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. An error report has been created at /home/ohdsi/workdir/notebooks/errorReportSql.txt Run rlang::last_error() to see where the error occurred.

rlang::last_error() <error/rlang_error> Error executing SQL: java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. An error report has been created at /home/ohdsi/workdir/notebooks/errorReportSql.txt Backtrace:

PioneerWatchfulWaiting::runStudy(...)

PioneerWatchfulWaiting:::createBulkStrata(...)

PioneerWatchfulWaiting:::createBulkStrataFromCohorts(...)

DatabaseConnector::executeSql(connection, sql)

base::tryCatch(...)

base:::tryCatchList(expr, classes, parentenv, handlers)

base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])

value[3L]

DatabaseConnector:::.createErrorReport(...) Run rlang::last_trace() to see the full context. rlang::last_trace() <error/rlang_error> Error executing SQL: java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. An error report has been created at /home/ohdsi/workdir/notebooks/errorReportSql.txt Backtrace: █

└─PioneerWatchfulWaiting::runStudy(...)

└─PioneerWatchfulWaiting:::createBulkStrata(...)

└─PioneerWatchfulWaiting:::createBulkStrataFromCohorts(...)

└─DatabaseConnector::executeSql(connection, sql)

└─base::tryCatch(...)

└─base:::tryCatchList(expr, classes, parentenv, handlers)

└─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])

└─value[3L]

└─DatabaseConnector:::.createErrorReport(...)

keesvanbochove commented 3 years ago

The SQL query in question would be: https://github.com/ohdsi-studies/PioneerWatchfulWaiting/blob/master/inst/sql/sql_server/strata/StratifyByCohort.sql. You can generate the exact SQL by using https://data.ohdsi.org/SqlDeveloper/, pasting the above file in there, selecting BigQuery and filling in the settings you are using).

Another option could be to review the BigQuery query log - I only know how to do this in Postgres but I guess GCP would have similar options (see e.g. https://cloud.google.com/bigquery/docs/reference/auditlogs)

I'm surprised that this is the query it's choking on, because it is not super complex... although it does have a few nested structures. Do you think it would help if we rewrote it to use temp tables rather than using subqueries/nesting?

tseto commented 3 years ago

There seems to be some limitations with BQ. Rewriting queries may help.

https://cloud.google.com/bigquery/quotas

tseto commented 3 years ago

Modifying my local copy of StrataConstruction.R to create a record and select unnest will work better in BigQuery than thousands of union statements.

SELECT * FROM UNNEST(ARRAY<STRUCT<target_id INT64, strata_id INT64, cohort_id INT64, cohort_type STRING>> [(105, 338, 105003382, 'TwoS'), (106, 338, 106003382, 'TwoS')...

keesvanbochove commented 3 years ago

@tseto could you contribute a PR with this fix? I'd like to test it in other database systems as well.

tseto commented 3 years ago

Records and STRUCT are BigQuery-specific syntax.

sql <- "SELECT * INTO #TARGET_STRATA_XREF FROM UNNEST(ARRAY<STRUCT<target_id INT64, strata_id INT64, cohort_id INT64, cohort_type STRING>> [ @records ]);" records <- ""; for(i in 1:nrow(targetStrataXref)) { stmt <- paste0("(", targetStrataXref$targetId[i], ",", targetStrataXref$strataId[i], ",", targetStrataXref$cohortId[i], ",", "'", targetStrataXref$cohortType[i], "')") if (i < nrow(targetStrataXref)) { records <- paste(records, stmt, ",", sep="\n") } else { records <- paste(records, stmt, sep="\n") } }

Alternatively, the SQL below should work for Postgres, Oracle and others.

sql <- "CREATE TABLE #TARGET_STRATA_XREF AS SELECT FROM JSON_TABLE ( '[ @records_in_json_array ]', '$[]' COLUMNS ( "target_id" path '$.target_id', "strata_id" path '$.strata_id', "cohort_id" path '$.cohort_id', "cohort_type" path '$.cohort_type' ));"

ohdsi-studies / PioneerWatchfulWaiting

Getting a 'Resources exceeded during query execution' error during stratification step #85