oxford-pharmacoepi / MegaStudy


Incidence/Prevalence Shiny: some DB not appearing in specific tabs #42

Open martapineda opened 7 months ago

martapineda commented 7 months ago

All data partners (DPs) who uploaded results included the incidence/prevalence attrition and results files. However, 3 databases do not appear in specific tabs of the Shiny app:

tiozab commented 7 months ago

@martapineda

TURKU does not appear in the incidenceAttrition tab because it has no denominator with denominator_days_prior_observation == 30. Since this does not show up in the attrition, there won't be any incidence estimates either.

That makes me question whether they ran the code correctly. Were they experimenting with the code, or are they using an older version? Their incidence and prevalence CSV file names also contain capital letters, which is why the Shiny app does not detect them. We need to ask them to re-run and not to change anything in RunIncidencePrevalence.R without confirming with us, so that everything stays in the correct format.
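The file-name problem comes down to case-sensitive matching. Here is a minimal sketch that assumes the Shiny app finds result files by matching a lowercase keyword in the file name. The actual matching code in MegaStudy may differ, and the file names below are hypothetical. It shows how base R's `grepl()` misses a capitalised name and how such files could be flagged:

```r
# Hypothetical result file names; the real MegaStudy naming may differ.
files <- c(
  "TURKU_Incidence_estimates.csv",
  "TURKU_Prevalence_estimates.csv",
  "INGEF_incidence_estimates.csv"
)

# grepl() is case-sensitive by default, so "Incidence" does not match "incidence"
detected <- files[grepl("incidence", files)]
print(detected)

# Flag files whose names only match when case is ignored
miscased <- files[grepl("incidence|prevalence", files, ignore.case = TRUE) &
                  !grepl("incidence|prevalence", files)]
print(miscased)
```

A check like `miscased` could be run before zipping the results. The simpler fix, as the thread says, is to leave the generated file names untouched.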

tiozab commented 7 months ago

@martapineda INGEF, ULSM and UZB do not have denominator_days_prior_observation == 30 and therefore also have no incidence results. Let's ask them whether they have adapted the code:

    cdm <- generateDenominatorCohortSet(
      cdm = cdm,
      name = "denominator",
      cohortDateRange = as.Date(c("2010-01-01", NA)),
      ageGroup = list(
        c(0, 150),
        c(0, 17),
        c(18, 64),
        c(65, 150)
      ),
      sex = c("Both", "Female", "Male"),
      daysPriorObservation = c(0, 30), # 30 for incidence, 0 for prevalence
      requirementInteractions = TRUE,
      overwrite = TRUE
    )
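To see why `daysPriorObservation = 30` can leave a database with no incidence rows at all, here is a simplified sketch. It is not the IncidencePrevalence implementation. It assumes a person can enter a denominator only if their observation period start plus the required days still falls on or before the observation period end. Age, sex and `cohortDateRange` are ignored, and the observation periods are made up:

```r
# Made-up observation periods
obs <- data.frame(
  person_id = c(1, 2, 3),
  observation_period_start_date = as.Date(c("2015-01-01", "2020-03-01", "2021-06-10")),
  observation_period_end_date   = as.Date(c("2023-12-31", "2020-03-20", "2021-08-15"))
)

# Simplified rule: the earliest possible entry date is the observation start plus
# the required prior observation; it must not fall after the observation end.
eligible <- function(obs, days_prior) {
  entry <- obs$observation_period_start_date + days_prior
  obs$person_id[entry <= obs$observation_period_end_date]
}

eligible(obs, 0)   # returns 1 2 3
eligible(obs, 30)  # returns 1 3: person 2's 19-day period is too short
```

If every observation period in a database were shorter than 30 days, `eligible(obs, 30)` would come back empty. That would mirror the missing `denominator_days_prior_observation == 30` rows.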

mikaelhogerman commented 6 months ago

Hi,

Turku had a different database name in the first feasibility run than the one we reported in the main Excel file before anything was run. For the incidence run, we named our files according to the first reported names. We exchanged a few emails about this with Montse and Moncusi. @martapineda

The code didn't run unless "overwrite = TRUE" was commented out. Otherwise it gave the error: Error in generateDenominatorCohortSet(cdm = cdm, name = "denominator", : unused argument (overwrite = TRUE)

tiozab commented 6 months ago

@mikaelhogerman thanks for your answer. Did you use renv? If you receive this error, you did not use the correct version of the IncidencePrevalence package: https://github.com/oxford-pharmacoepi/MegaStudy/issues/24

What we are more interested in is whether you would expect your patients to have prior observation or not, because if so, something did not work. Can you retry with renv?

tiozab commented 6 months ago

@mikaelhogerman regarding changing file names: if you must change something in the file name, please do not amend the snippet we provide (work around it instead), so that the files are still picked up in further processing. For example, changing "incidence" to "Incidence" with a capital letter meant the files were not detected.
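One possible workaround for a database-name change that leaves the snippet untouched is to rename the written files afterwards, replacing only the database prefix. Below is a minimal base R sketch. The old and new database names and the file names are hypothetical, and the assumption that the pipeline looks for the lowercase keywords "incidence" and "prevalence" is ours. It only rewrites paths in a character vector; passing the results to `file.rename()` would apply them on disk.

```r
# Hypothetical names: files written under one database name, reported under another
old_db <- "TURKU_FEAS"
new_db <- "TURKU"
files <- c(
  "TURKU_FEAS_incidence_estimates.csv",
  "TURKU_FEAS_incidence_attrition.csv",
  "TURKU_FEAS_prevalence_estimates.csv"
)

# Replace only the leading database prefix; the rest of the name is kept verbatim
renamed <- sub(paste0("^", old_db, "_"), paste0(new_db, "_"), files)
print(renamed)

# Sanity check (assumed convention): every file keeps its lowercase keyword
stopifnot(all(grepl("incidence|prevalence", renamed)))
```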

tiozab commented 6 months ago

@mikaelhogerman, one possible reason why we do not see incidence results could be that each hospital visit is a new observation period, and we look back within the observation_period. Therefore, unless a patient was in hospital for longer than 30 days and received a medication of interest afterwards, that patient would not be part of the incidence results. Just a thought.

mikaelhogerman commented 6 months ago

Hi @tiozab, I was able to use renv and re-run the scripts without modifying them at all. I have now submitted a new zip file, but I am not sure it solved the problem.

tiozab commented 6 months ago

@martapineda can you check the results? We can also have a look together on Friday. @mikaelhogerman thank you for re-running the results. Running the code successfully is only part of it; it is also about looking at the results and questioning whether they make sense. That is why we asked whether you would expect patients to have prior observation or not. One reason the answer might be "no" is that every hospital stay for the same person is a new "observation", and people may not stay in hospital long enough to accumulate sufficient observation days (at least 30). In any case, hospitals may not be the most reliable source for incidence results, so prevalence results are enough. Still, it is always important to question what was going on ;-)

tiozab commented 6 months ago

@raeleesha-norris FYI

raeleesha-norris commented 6 months ago

@tiozab Thank you for guiding me here. We don't see the same issue in the DUS results, which makes sense given the structure of our data and the observation periods available for our patients. So I imagine there really was a technical issue during the run, and it will now be resolved. As a heads-up, because of the long run time of the code, we will likely be able to share the results at the beginning of next week.

mikaelhogerman commented 6 months ago

Hi @tiozab, it is still unclear to me how you define "accumulate sufficient observation days". I believe the results are otherwise OK.

tiozab commented 6 months ago

@mikaelhogerman, happy to elaborate. By "accumulate sufficient observation days" I meant the following: if each hospital visit were a new observation period, a patient would need to be in hospital for at least 30 days before he or she could be part of the denominator group that requires "prior observation of 30 days". However, given how you define your observation period, you would expect patients to have a sufficient amount of time. For example, once a patient has been in the hospital, their observation period remains intact until their next visit, so they "accumulate sufficient observation days". If things worked for the DUS, then there was indeed something wrong with the Inc/Prev run. Looking forward to looking at your results again.
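The two scenarios in this explanation can be compared with a small base R sketch. The visits and drug date are made up. Prior observation is computed simply as the number of days from the start of the observation period containing the drug date to that date:

```r
# Hypothetical person with three short hospital stays
visits <- data.frame(
  start = as.Date(c("2020-01-01", "2020-03-01", "2020-06-01")),
  end   = as.Date(c("2020-01-05", "2020-03-06", "2020-06-07"))
)
drug_date <- as.Date("2020-06-04")  # medication given during the third stay

# Scenario 1: each visit is its own observation period
current <- visits[visits$start <= drug_date & visits$end >= drug_date, ]
prior_per_visit <- as.numeric(drug_date - current$start)

# Scenario 2: one continuous observation period starting at the first visit
prior_continuous <- as.numeric(drug_date - min(visits$start))

cat("Per-visit periods:", prior_per_visit, "days\n")
cat("Continuous period:", prior_continuous, "days\n")
cat("Meets 30-day requirement:", prior_per_visit >= 30, prior_continuous >= 30, "\n")
```

With per-visit periods the patient has only 3 days of prior observation and never reaches the 30-day denominator. With a continuous period the same patient has 155 days.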

tiozab commented 6 months ago

@mikaelhogerman we have just had another look at your results: denominator_days_prior_observation == 30 is still missing. You mentioned that there were no errors and that you would expect patients to have prior observation time, so there may be a problem somewhere in your data. @janblom is an ETL expert; please liaise with him.

tiozab commented 6 months ago

@mikaelhogerman we will still use your prevalence results, though; it is just the incidence that is missing. It will also be helpful for the future to get to the bottom of what is going on.

tiozab commented 6 months ago

@mikaelhogerman if you want to investigate the observation time available in your data, you can use the addPriorObservation() function from the PatientProfiles package. By default it returns the number of days of prior observation before the index date. To see the start of observation for each individual instead, set priorObservationType to "date". Here is the reference for this function: https://darwin-eu-dev.github.io/PatientProfiles/reference/addPriorObservation.html
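Conceptually, prior observation is the time between the start of the observation period that contains the index date and the index date itself. The base R function below is a hypothetical stand-in. It was written only to illustrate that idea and the days/date distinction; it is not PatientProfiles code. In practice `addPriorObservation()` should be used on the cdm tables.

```r
# Hypothetical helper, not part of PatientProfiles
prior_observation <- function(person_id, index_date, obs, type = c("days", "date")) {
  type <- match.arg(type)
  days <- rep(NA_real_, length(person_id))
  start <- rep(as.Date(NA), length(person_id))
  for (i in seq_along(person_id)) {
    # observation period of this person that contains the index date
    hit <- obs$person_id == person_id[i] &
      obs$observation_period_start_date <= index_date[i] &
      obs$observation_period_end_date >= index_date[i]
    if (sum(hit) == 1) {
      start[i] <- obs$observation_period_start_date[hit]
      days[i] <- as.numeric(index_date[i] - start[i])
    }
  }
  # index dates outside any observation period stay NA
  if (type == "days") days else start
}

obs <- data.frame(
  person_id = c(1, 2),
  observation_period_start_date = as.Date(c("2015-01-01", "2020-03-01")),
  observation_period_end_date   = as.Date(c("2023-12-31", "2020-03-20"))
)
exposure_person <- c(1, 2, 2)
exposure_start  <- as.Date(c("2016-01-01", "2020-03-10", "2020-05-01"))

d <- prior_observation(exposure_person, exposure_start, obs, "days")
print(d)
print(prior_observation(exposure_person, exposure_start, obs, "date"))
print(sum(d < 30, na.rm = TRUE))  # exposures with under 30 days of prior observation
```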

mikaelhogerman commented 4 months ago

Hi @tiozab, I tried to run the DUS code and it gave me an error (separate issue), so I tried to run addPriorObservation(), but that also gave me an error: Error: GitHub record for package 'CDMConnector' has no recorded 'RemoteSha' / 'RemoteRef'

We checked our prevalence results without the MegaStudy code, and it seems there might be a problem with the results. Our conclusion was that the usage of all drugs seemed quite steady, not growing over time as the prevalence Shiny app showed.

I'm not sure whether this is related to the missing denominator_days_prior_observation values in the incidence runs. However, the prevalence plots looked as if every usage had been collected cumulatively, which is why all the curves kept rising over time.
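The difference between the steady usage seen locally and the rising curves can be illustrated with made-up data. A point-prevalence count only includes people whose exposure spans each date. A cumulative count of everyone who has ever started a drug can only go up. This is an illustrative sketch that ignores the denominator; it does not show what went wrong in this particular run.

```r
# Made-up exposures: one short course per person in different years
exposures <- data.frame(
  person_id = 1:4,
  start = as.Date(c("2018-02-01", "2019-05-01", "2020-07-01", "2021-03-01")),
  end   = as.Date(c("2018-04-01", "2019-07-01", "2020-09-01", "2021-05-01"))
)
dates <- as.Date(c("2018-03-01", "2019-06-01", "2020-08-01", "2021-04-01"))

# Point prevalence numerator: exposures ongoing on each date
point <- sapply(dates, function(d) sum(exposures$start <= d & exposures$end >= d))

# Cumulative count: everyone who has started a drug on or before each date
cumulative <- sapply(dates, function(d) sum(exposures$start <= d))

print(data.frame(date = dates, point = point, cumulative = cumulative))
```

Here `point` stays flat at 1, while `cumulative` climbs from 1 to 4, the same shape as the curves described above.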

As I said, the DUS error is probably a separate issue, but I will paste it here as well in case it helps to narrow down the prevalence problem:

    ℹ Generating 1 cohort
    ✔ Generating cohort (1/1) - covid_19) [6s]
    Error in validateGeneratedCohortSet() at omopgenerics/R/classCohortTable.R:53:3:
    ! There is overlap between entries in the cohort, 791 overlaps detected first 5:
    $ cohort_definition_id   1, 1, 1, 1, 1
    $ subject_id             4041, 378600, 378600, 379151, 379653
    $ cohort_start_date      2018-08-22, 2019-11-16, 2021-06-03, 2015-12-11, 2015-07-10
    $ cohort_end_date        2019-01-24, 2021-06-04, 2021-08-06, 2017-03-04, 2016-09-13
    $ next_cohort_start_date 2019-01-23, 2021-06-03, 2021-08-05, 2017-03-03, 2016-09-12
    Run rlang::last_trace() to see where the error occurred.

    rlang::last_trace()
    <error/rlang_error>
    Error in validateGeneratedCohortSet() at omopgenerics/R/classCohortTable.R:53:3:
    ! There is overlap between entries in the cohort, 791 overlaps detected first 5:
    $ cohort_definition_id   1, 1, 1, 1, 1
    $ subject_id             4041, 378600, 378600, 379151, 379653
    $ cohort_start_date      2018-08-22, 2019-11-16, 2021-06-03, 2015-12-11, 2015-07-10
    $ cohort_end_date        2019-01-24, 2021-06-04, 2021-08-06, 2017-03-04, 2016-09-13
    $ next_cohort_start_date 2019-01-23, 2021-06-03, 2021-08-05, 2017-03-03, 2016-09-12

    Backtrace:
        ▆
     1. ├─base::source(here("DUS.R"))
     2. │ ├─base::withVisible(eval(ei, envir))
     3. │ └─base::eval(ei, envir)
     4. │ └─base::eval(ei, envir)
     5. └─CDMConnector::generateConceptCohortSet(...) at MegaStudy-main/DUS Code/DUS.R:123:3
     6. └─omopgenerics::newCohortTable(...) at CDMConnector/R/generateConceptCohortSet.R:353:3
     7. └─omopgenerics:::validateGeneratedCohortSet(cohort, soft = .softValidation) at omopgenerics/R/classCohortTable.R:53:3
    Run rlang::last_trace(drop = FALSE) to see 3 hidden frames.

    rlang::last_trace(drop = FALSE)
    <error/rlang_error>
    Error in validateGeneratedCohortSet() at omopgenerics/R/classCohortTable.R:53:3:
    ! There is overlap between entries in the cohort, 791 overlaps detected first 5:
    $ cohort_definition_id   1, 1, 1, 1, 1
    $ subject_id             4041, 378600, 378600, 379151, 379653
    $ cohort_start_date      2018-08-22, 2019-11-16, 2021-06-03, 2015-12-11, 2015-07-10
    $ cohort_end_date        2019-01-24, 2021-06-04, 2021-08-06, 2017-03-04, 2016-09-13
    $ next_cohort_start_date 2019-01-23, 2021-06-03, 2021-08-05, 2017-03-03, 2016-09-12
    Backtrace:
        ▆
     1. ├─base::source(here("DUS.R"))
     2. │ ├─base::withVisible(eval(ei, envir))
     3. │ └─base::eval(ei, envir)
     4. │ └─base::eval(ei, envir)
     5. └─CDMConnector::generateConceptCohortSet(...) at MegaStudy-main/DUS Code/DUS.R:123:3
     6. └─omopgenerics::newCohortTable(...) at CDMConnector/R/generateConceptCohortSet.R:353:3
     7. └─omopgenerics:::validateGeneratedCohortSet(cohort, soft = .softValidation) at omopgenerics/R/classCohortTable.R:53:3
     8. └─omopgenerics:::checkOverlap(cohort) at omopgenerics/R/classCohortTable.R:207:5
     9. └─cli::cli_abort(...) at omopgenerics/R/classCohortTable.R:317:5
    10. └─rlang::abort(...) at cli/R/rlang.R:45:3
tiozab commented 4 months ago

@mikaelhogerman for the DUS please try the new code posted in #53
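The validation error quoted above reports cohort entries where, for the same subject, the next cohort_start_date falls before the current cohort_end_date. The base R sketch below uses made-up rows. It shows how such overlaps can be detected and, as one common remedy, merged into a single continuous entry. This sketch treats a next start on or before the current end as an overlap. Merging is an assumption for illustration and is not necessarily what the new code in #53 does.

```r
# Made-up cohort rows (subject 1 has an overlapping pair)
cohort <- data.frame(
  subject_id = c(1, 1, 1, 2),
  cohort_start_date = as.Date(c("2019-11-16", "2021-06-03", "2022-01-01", "2018-08-22")),
  cohort_end_date   = as.Date(c("2021-06-04", "2021-08-06", "2022-02-01", "2019-01-24"))
)

# Return rows whose next entry (same subject) starts on or before their end date
find_overlaps <- function(cohort) {
  cohort <- cohort[order(cohort$subject_id, cohort$cohort_start_date), ]
  next_start <- ave(as.numeric(cohort$cohort_start_date), cohort$subject_id,
                    FUN = function(x) c(x[-1], NA))
  overlap <- !is.na(next_start) & next_start <= as.numeric(cohort$cohort_end_date)
  cohort[overlap, ]
}

# Merge overlapping entries per subject into continuous episodes
merge_overlaps <- function(cohort) {
  cohort <- cohort[order(cohort$subject_id, cohort$cohort_start_date), ]
  episodes <- list()
  for (s in unique(cohort$subject_id)) {
    rows <- cohort[cohort$subject_id == s, ]
    cur_start <- rows$cohort_start_date[1]
    cur_end <- rows$cohort_end_date[1]
    for (i in seq_len(nrow(rows))[-1]) {
      if (rows$cohort_start_date[i] <= cur_end) {
        # overlapping entry: extend the current episode
        cur_end <- max(cur_end, rows$cohort_end_date[i])
      } else {
        # gap: close the current episode and start a new one
        episodes[[length(episodes) + 1]] <- data.frame(
          subject_id = s, cohort_start_date = cur_start, cohort_end_date = cur_end)
        cur_start <- rows$cohort_start_date[i]
        cur_end <- rows$cohort_end_date[i]
      }
    }
    episodes[[length(episodes) + 1]] <- data.frame(
      subject_id = s, cohort_start_date = cur_start, cohort_end_date = cur_end)
  }
  do.call(rbind, episodes)
}

print(find_overlaps(cohort))
merged <- merge_overlaps(cohort)
print(merged)
```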

tiozab commented 4 months ago

@mikaelhogerman regarding IncidencePrevalence, I am not sure I understand what the problem is. Your prevalence results were output, but you are saying they are not what they should be?

My answer is that denominator_days_prior_observation is not missing. The denominator function requests both at least 0 and at least 30 days, and there are no people meeting the 30-day requirement in your data when applying our function/package.

In any case, the function is working (otherwise we would not get any results from IncidencePrevalence), but the input data may be unusual.

Good idea to look at the prior observation in your data. Can you run the following, please?

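    library(dplyr)
    library(PatientProfiles)

    # `cdm` is the cdm_reference object already created for the study
    see <- cdm$drug_exposure %>%
      addPriorObservation(indexDate = "drug_exposure_start_date")

    see %>% glimpse()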

Basically, I used the example from https://darwin-eu-dev.github.io/PatientProfiles/articles/demographics.html but applied it to drugs. (I typed it by hand, so sorry if there are typos in the example.)

As for package versions, you can use CDMConnector 1.4.0 with PatientProfiles 1.0.0, or CDMConnector 1.2.1 with PatientProfiles 0.4.0.

Actually, try both and see whether they give the same result (forget about renv and just output those few lines of code above, please). Hopefully that will bring us closer to understanding what is going on.
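Much of this thread hinges on which CDMConnector and PatientProfiles versions are actually installed, so a quick base R check before running can help. Here is a minimal sketch; the helper name `has_version` is hypothetical, and the version pairs are the ones suggested in the comment above:

```r
# Hypothetical helper: is `pkg` installed at exactly `version`?
has_version <- function(pkg, version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) == package_version(version)
}

# Version pairs suggested in the thread
combos <- list(
  new = c(CDMConnector = "1.4.0", PatientProfiles = "1.0.0"),
  old = c(CDMConnector = "1.2.1", PatientProfiles = "0.4.0")
)

for (nm in names(combos)) {
  ok <- mapply(has_version, names(combos[[nm]]), combos[[nm]])
  cat(nm, ":", paste(names(ok), ok, sep = " = ", collapse = ", "), "\n")
}
```

If a pair is not installed, `remotes::install_version()` or restoring the project's renv lockfile with `renv::restore()` are the usual ways to get specific versions.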

mikaelhogerman commented 4 months ago

Hi @tiozab

I ran the code with CDMConnector 1.4.0 and PatientProfiles 1.1.0 (not PatientProfiles 1.0.0). It did not run with the older CDMConnector 1.2.1 and PatientProfiles 0.4.0 versions, so I could not provide those results.

    Rows: ??
    Columns: 7
    Database: DuckDB v0.10.1 [Omistaja_2@Windows 10 x64:R 4.2.2/:memory:]
    $ person_id                4180, 1880, 8145, 8740, 3707, 3959, 2336, 1912, 561, 8553, 4632, 3041, 1942, 9908, 4637, 8411, …
    $ drug_exposure_start_date 1945-08-29, 1933-04-18, 1977-03-25, 2092-11-25, 2095-01-16, 2030-01-13, 1996-11-16, 1960-12-26…
    $ drug_exposure_end_date   1947-01-30, 2031-10-22, 1977-05-30, 2134-05-06, 2133-08-04, 2037-05-11, 2011-10-08, 1973-09-03…
    $ drug_exposure_id         1, 2, 3, 5, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30…
    $ drug_concept_id          6, 6, 4, 3, 10, 10, 5, 2, 2, 10, 1, 7, 1, 4, 6, 6, 9, 4, 7, 8, 4, 8, 6, 1, 1, 9, 4, 1, 1, 8, 4,…
    $ drug_type_concept_id     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
    $ prior_observation        1336, 6317, 21268, 36854, 33983, 4030, 16025, 1455, 6033, 3777, 35864, 4113, 53729, 10819, 1152…

tiozab commented 4 months ago

@mikaelhogerman sorry for the confusion. Your "prior observation" is correct with the new packages, but those packages are for the DUS only, which means the DUS may run. However, we need to fix the IncidencePrevalence run first, so we need to see this with the older package versions. Can you repeat this exercise with CDMConnector 1.2.1 and PatientProfiles 0.4.0, please?

tiozab commented 4 months ago

Additionally, can you try the whole DUS with the new code from https://github.com/oxford-pharmacoepi/MegaStudy/issues/53? With some database management systems we had errors in the DUS but could run it successfully with the new code.

tiozab commented 4 months ago

@raeleesha-norris maybe you can tell us, and especially @mikaelhogerman, how you solved the problem on your end? We saw the incidence results from INGEF in the webinar :-)

mikaelhogerman commented 4 months ago

@tiozab do you mean running IncidencePrevalence, or the small DUS-related code you provided a few messages above, with CDMConnector 1.2.1 and PatientProfiles 0.4.0? The small DUS check code did not run with those old packages, only with the new CDMConnector 1.4.0 and PatientProfiles 1.1.0.

tiozab commented 4 months ago

@mikaelhogerman, sorry, I understand the confusion. Do you want to have a call, or would you rather continue on GitHub?

mikaelhogerman commented 4 months ago

@tiozab Yes, a call would be good.

raeleesha-norris commented 4 months ago

Hi @tiozab @mikaelhogerman :)

The issue for us was that a processing error occurred on our server during a certain step of the IncidencePrevalence code. Unfortunately I don't remember where, but it was likely in the incidence attrition step. We initially didn't realize it had happened because we let the script run in its entirety instead of stopping after each step to check the results for issues. After we found out about the missing results, we waited for each step of the script to finish and checked the console for errors before continuing to the next step.

The error we were sometimes getting was something like: Transaction (Process ID) was deadlocked on lock resources with another process and has been chosen as the deadlock victim
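The step-by-step approach described here could be automated with a small wrapper that stops the run on any error instead of letting the script carry on. This is a sketch under stated assumptions. The step function is a placeholder for a block of the IncidencePrevalence script, and the `run_step` helper is hypothetical. Retrying only when the error message mentions a deadlock is our own choice, since a transaction chosen as a deadlock victim can typically be rerun. Any other error stops immediately.

```r
# Run one step of an analysis; stop loudly on failure instead of carrying on.
# Errors whose message mentions "deadlock" are retried up to `max_tries` times.
run_step <- function(name, step, max_tries = 3) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(step(), error = function(e) e)
    if (!inherits(result, "error")) {
      message("Step '", name, "' finished")
      return(invisible(result))
    }
    is_deadlock <- grepl("deadlock", conditionMessage(result), ignore.case = TRUE)
    if (!is_deadlock || attempt == max_tries) {
      stop("Step '", name, "' failed: ", conditionMessage(result), call. = FALSE)
    }
    message("Deadlock in step '", name, "', retrying (attempt ", attempt + 1, ")")
  }
}

# Placeholder step that hits a deadlock twice before succeeding
calls <- 0
flaky_step <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Transaction was deadlocked and has been chosen as the deadlock victim")
  "incidence done"
}

res <- run_step("incidence", flaky_step)
print(res)
```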

tiozab commented 4 months ago

Thanks @raeleesha-norris! @mikaelhogerman, can you try to run a simple incidence example and see what happens? If it doesn't work, I can guide you through it during our call on Tuesday!