
ohdsi-studies / ScyllaEstimation

This OHDSI network study assesses the comparative effectiveness and safety among treatments administered during hospitalization and prior to intensive services. It also assesses the comparative effectiveness and safety among treatments administered after COVID-19 positive testing or diagnosis in the outpatient setting without prior hospitalization.


Error: Join columns must be present in data. #11

Open: alabarga opened this issue 3 years ago

alabarga commented 3 years ago

We are getting this error when running the study. Any ideas? The generated errorReportR.txt is attached.

Running CohortMethod analyses
Error: Join columns must be present in data.
✖ Problem with `targetId`.
Backtrace:
     █
  1. ├─ScyllaEstimation::execute(...)
  2. │ └─ScyllaEstimation::runCohortMethod(...)
  3. │   └─`%>%`(...)
  4. ├─dplyr::inner_join(., analysisDescription, by = "analysisId")
  5. ├─dplyr::inner_join(...)
  6. ├─dplyr::inner_join(...)
  7. ├─dplyr::inner_join(...)
  8. └─dplyr:::inner_join.data.frame(...)
  9.   └─dplyr:::join_mutate(...)
 10.     └─dplyr:::join_cols(...)
 11.       └─dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
 12.         └─dplyr:::check_join_vars(by$x, x_names)
An error report has been created at  /data/ScyllaEstimation/errorReportR.txt

alabarga commented 3 years ago

The cohort table in the CDM looks like this:

cohort_definition_id	subject_id	cohort_start_date	cohort_end_date
1009	4579522779730604891	2020-02-09	2020-02-09
1009	6731742195644174274	2020-02-08	2020-02-10
1009	3688281720704476065	2020-08-28	2020-08-28

schuemie commented 3 years ago

Perhaps the exposure cohorts are empty? Could you check your cohort_counts.csv file to see if any of the exposure cohorts (e.g. 'Hydroxychloroquine with Treatment administered on the date of admission of hospitalization and prior to intensive services and 365d prior observation') has a non-NA count?
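For a long cohort_counts.csv, the check suggested above can be scripted. The sketch below is a minimal illustration, not part of the study package. It assumes the file is comma-separated, that it uses the column names shown in the excerpt later in this thread (cohort_id, name, cohort_entries, ...), and that missing counts are written as "NA" or left empty. It lists every cohort with a count. It does not pick out the exposure cohorts, because the thread does not say how their IDs are assigned.

```python
import csv


def cohorts_with_counts(path):
    """Return (cohort_id, name, cohort_entries) for rows whose count is present.

    Assumption: missing counts appear as "NA" (R's default) or as an empty field.
    """
    missing = {"", "NA"}
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            entries = row["cohort_entries"].strip()
            if entries not in missing:
                rows.append((row["cohort_id"], row["name"], int(entries)))
    return rows


if __name__ == "__main__":
    # "cohort_counts.csv" is a placeholder; point it at your study output folder.
    for cohort_id, name, entries in cohorts_with_counts("cohort_counts.csv"):
        print(cohort_id, entries, name)
```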

Also, could you check whether any of these files exist in the same folder as cohort_counts.csv?

analysisSummary_100.rds
analysisSummary_200.rds
analysisSummary_300.rds
analysisSummary_400.rds
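The file check above can also be scripted. This is a minimal sketch: the four file names come from the list above, and the folder argument is a placeholder for wherever cohort_counts.csv was written. The function `missing_summaries` is hypothetical and not part of ScyllaEstimation.

```python
from pathlib import Path

# File names taken from the list in the comment above.
EXPECTED_SUMMARIES = [f"analysisSummary_{i}.rds" for i in (100, 200, 300, 400)]


def missing_summaries(folder):
    """Return the expected analysisSummary files that are absent from folder."""
    folder = Path(folder)
    return [name for name in EXPECTED_SUMMARIES if not (folder / name).exists()]


if __name__ == "__main__":
    # Placeholder path: use the folder that contains cohort_counts.csv.
    print(missing_summaries("path/to/output"))
```

If every name is returned, no analysis produced a summary. That matches the situation reported in the next comment.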

alabarga commented 3 years ago

Not all, but some have non-NA values:

cohort_id	name	cohort_entries	cohort_subjects	database_id
1001000011	Hydroxychloroquine with Treatment administered on the date of admission of hospitalization and prior to intensive services and 365d prior observation	313	313	hdm
1002000011	Hydroxychloroquine + Azithromycin with Treatment administered on the date of admission of hospitalization and prior to intensive services and 365d prior observation	18	18	hdm

However, no *.rds files are present.

schuemie commented 3 years ago

So the problem appears to be that most exposure cohorts are either empty or very small, leaving no comparisons with sufficient data.
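One plausible reading of the traceback, which the thread does not confirm, is this. When no comparison has enough data, no analysisSummary_*.rds files are written. The summary table that `runCohortMethod` pipes into `dplyr::inner_join` then has no `targetId` column. dplyr checks that the `by` columns exist before joining (`check_join_vars` in the backtrace), so it stops with "Join columns must be present in data." The toy Python join below only illustrates that failure mode. It is not ScyllaEstimation's or dplyr's code, and the function name is hypothetical.

```python
def inner_join(x_rows, x_cols, y_rows, y_cols, by):
    """Toy inner join on lists of dicts with an up-front key-column check.

    Column lists are passed explicitly because an empty table assembled
    from zero result files may carry no columns at all.
    """
    for col in by:
        if col not in x_cols or col not in y_cols:
            raise ValueError(
                f"Join columns must be present in data. Problem with `{col}`."
            )
    # Index the right-hand table by its key values.
    index = {}
    for row in y_rows:
        index.setdefault(tuple(row[c] for c in by), []).append(row)
    # Emit one merged row per matching pair.
    out = []
    for row in x_rows:
        for match in index.get(tuple(row[c] for c in by), []):
            out.append({**row, **match})
    return out
```

With an empty, column-less left table, `inner_join([], [], cohorts, ["targetId", "name"], by=["targetId"])` raises on `targetId` before any rows are compared. This is why the error points at a missing column rather than at the small cohorts themselves.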

This study package doesn't really allow you to find out why there are so few people meeting the cohort criteria. Did you run ScyllaCharacterization?