ohdsi-studies / Covid19PredictionStudies

Development and validation OHDSI network studies for the covid19 prediction topic
9 stars 9 forks source link

Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

Open kandupl opened 4 years ago

kandupl commented 4 years ago

DBMS: impala

Error: java.sql.SQLException: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Possible loss of precision for target table 'rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref'. Expression 'CAST(concat('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ), Query: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (covariate_id,covariate_name,analysis_id,concept_id) SELECT t1.covariate_id, CAST( CONCAT('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS string ) as covariate_name, 410 as analysis_id, CAST( ((t1.covariate_id - 410) / 1000) AS int ) as concept_id FROM (SELECT DISTINCT e0pr395scov_1.covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1) AS t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON (concept.concept_id = CAST( ((t1.covariate_id - 410) / 1000) AS int )).

SQL: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref ( covariate_id, covariate_name, analysis_id, concept_id ) SELECT covariate_id,

CAST(CONCAT('drug_era group during day -365 through -1 days relative to index: ', CASE WHEN concept_name IS NULL THEN 'Unknown concept' ELSE concept_name END) AS VARCHAR(512)) AS covariate_name,

410 AS analysis_id,
CAST((covariate_id - 410) / 1000 AS INT) AS concept_id

FROM ( SELECT DISTINCT covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1 ) t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON concept_id = CAST((covariate_id - 410) / 1000 AS INT)

R version: R version 3.6.0 (2019-04-26)

Platform: x86_64-redhat-linux-gnu

Attached base packages:

Other attached packages:

Time it took to run- 9 hours / R packages are adequately tuned?
Can we generate table/column statistics for the respective tables during each ETL?

2020-04-01 20:31:23 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used 2020-04-01 20:48:58 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used 2020-04-01 20:48:58 [Main thread] WARN DatabaseConnector getJbcDriverSingleton only the first element is used as variable name 2020-04-01 20:49:17 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation execute Creating Cohorts 2020-04-01 20:49:17 [Main thread] WARN HospitalizationInSymptomaticPatientsValidation execute Error 'cannot open the connection' when writing log to file '/app/sas/users/kandupl/Lakshmi_Kandukuri/2020/2020-03-23-005_ttt_covid-19_response/Data/MCID_2/log.txt. Removing file appender from logger. 2020-04-01 20:49:17 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used 2020-04-01 20:49:17 [Main thread] WARN SqlRender translate Table name ' rwd_p_omop_truven_medicaid_results2' is too long. Table names should be shorter than 30 characters to prevent Oracle from crashing. 2020-04-02 01:55:15 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation createCohorts Counting cohorts 2020-04-02 01:55:16 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation execute Validating Models 2020-04-02 01:55:17 [Main thread] INFO PatientLevelPrediction evaluateMultiplePlp Evaluating model in /homes/kandupl/R/x86_64-redhat-linux-gnu-library/3.6/HospitalizationInSymptomaticPatientsValidation/plp_models/Analysis_2 2020-04-02 01:55:17 [Main thread] INFO PatientLevelPrediction evaluateMultiplePlp plpResult found in /homes/kandupl/R/x86_64-redhat-linux-gnu-library/3.6/HospitalizationInSymptomaticPatientsValidation/plp_models/Analysis_2 2020-04-02 01:55:17 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used 2020-04-02 01:55:18 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used 2020-04-02 01:55:56 [Main thread] INFO PatientLevelPrediction 3 Error: Error: Error executing SQL: java.sql.SQLException: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Possible loss of precision for target table 'rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref'. Expression 'CAST(concat('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ), Query: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (covariate_id,covariate_name,analysis_id,concept_id) SELECT t1.covariate_id, CAST( CONCAT('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS string ) as covariate_name, 410 as analysis_id, CAST( ((t1.covariate_id - 410) / 1000) AS int ) as concept_id FROM (SELECT DISTINCT e0pr395scov_1.covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1) AS t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON (concept.concept_id = CAST( ((t1.covariate_id - 410) / 1000) AS int )). An error report has been created at /app/sas/users/kandupl/Lakshmi_Kandukuri/2020/2020-03-23-005_ttt_covid-19_response/Code/errorReport.txt

Thanks Lakshmi

jreps commented 4 years ago

what PLP package are you trying to run?

kandupl commented 4 years ago

https://github.com/ohdsi-studies/Covid19PredictionStudies/tree/master/HospitalizationInSymptomaticPatientsValidation

jreps commented 4 years ago

One thing I see is the table name is long (probably worth reducing it): WARN SqlRender translate Table name ' rwd_p_omop_truven_medicaid_results2' is too long. Table names should be shorter than 30 characters to prevent Oracle from crashing.

The issue is some casting problem: (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ) - the covariate_name column in the covariate reference table is too short. I think this is a FeatureExtraction bug.

gklebanov commented 4 years ago

hey Lakshmi - let's debug it together with Konstantin and Vitaly?

kandupl commented 4 years ago

Sure will debug with Konstantin and Vitaly.