rfhb / ctrdata

Aggregate and analyse information on clinical trials from public registers
https://rfhb.github.io/ctrdata/

Recruitment fields for trial on ClinicalTrials.Gov not available through ctrdata? #38

Closed rfhb closed 1 month ago

rfhb commented 2 months ago

Observation

Analysis

Solution

rfhb commented 2 months ago

This enhancement was implemented in branch add_historic_versions, which has now been merged into master. Walkthrough example:

Analysing sample size using historic versions of trial records

Historic versions can be retrieved for CTGOV2 by specifying ctgov2history = <...> when using ctrLoadQueryIntoDb(); this functionality was added in ctrdata version 1.18.0. For CTIS, historic versions are retrieved automatically. Each version includes all trial data available at the date of that version.

For CTGOV2 records, the historic versions are added as follows into the ctrdata data model of a trial record, where the ellipsis ... represents all trial data fields:

{"_id":"NCT01234567", "title": "Current title", ..., "history": [{"history_version": {"version_number": 1, "version_date": "2020-21-22 10:11:12"}, "title": "Original title", ...}, {"history_version": {"version_number": 2, "version_date": "2021-22-23 11:13:13"}, "title": "Later title", ...}]}
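To make the nesting concrete, the following is a minimal base R sketch of how such a record could be flattened into one row per historic version. The record is a hypothetical stand-in built by hand (it is not retrieved with ctrdata), the dates are made-up valid dates, and the name `versions` is an assumption for illustration only.

```r
# Hypothetical trial record mirroring the data model above; all values are
# made up for illustration and most trial data fields are omitted.
record <- list(
  `_id` = "NCT01234567",
  title = "Current title",
  history = list(
    list(
      history_version = list(
        version_number = 1,
        version_date = "2020-01-22 10:11:12"
      ),
      title = "Original title"
    ),
    list(
      history_version = list(
        version_number = 2,
        version_date = "2021-02-23 11:13:13"
      ),
      title = "Later title"
    )
  )
)

# Build one row per historic version, keyed by the trial identifier
versions <- do.call(rbind, lapply(record$history, function(v) {
  data.frame(
    `_id` = record$`_id`,
    version_number = v$history_version$version_number,
    # as.Date() keeps the date part and drops the time of day
    version_date = as.Date(v$history_version$version_date),
    title = v$title,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}))

print(versions)
```

In the walkthrough itself this flattening is done with tidyr::unnest() on the data frame returned by dbGetFieldsIntoDf(); the sketch only illustrates the shape of the data.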

The example shows how the planned or realised number of participants (sample size) changed over time for individual trials, using data from both registers.

# install the development version of ctrdata from GitHub
remotes::install_github("rfhb/ctrdata")

# load package
library(ctrdata)

# open database
db <- nodbi::src_sqlite(collection = "my_collection")

# read documentation of new 
# parameter "ctgov2history"
help("ctrLoadQueryIntoDb")

# load some trials from CTGOV2 specifying that 
# for each trial, 10 versions should be retrieved
ctrLoadQueryIntoDb(
  queryterm = "https://clinicaltrials.gov/search?cond=neuroblastoma&aggFilters=phase:3,status:com", 
  con = db, 
  ctgov2history = 10
)
# * Appears specific for CTGOV REST API 2.0
# * Found search query from CTGOV2: cond=neuroblastoma&aggFilters=phase:3,status:com
# * Checking trials using CTGOV API 2.0, found 24 trials
# (1/3) Downloading in 1 batch(es) (max. 1000 trials each; estimate: 2.4 MB total)
# (2/3) Converting to NDJSON...
# (3/3) Importing records into database...
# JSON file #: 1 / 1                               
# * Checking historic versions of trial records...
# - Merging trial versions . . . . . . . . . . . . . . . . . . . . . . . . 
# - Updating trial records . . . . . . . . . . . . . . . . . . . . . . . . 
# Updated 24 trial(s) with historic versions
# = Imported or updated 24 trial(s)
# Updated history ("meta-info" in "my_collection")

# load some trials from CTIS; historic versions
# are retrieved automatically for this register
ctrLoadQueryIntoDb(
  queryterm = "https://euclinicaltrials.eu/app/#/search?basicSearchInputAND=cancer&ageGroupCode=2", 
  con = db
)

# get sample size and version date fields
# of both registers into a data frame
result <- dbGetFieldsIntoDf(
  fields = c(
    # CTGOV2
    "history.protocolSection.designModule.enrollmentInfo.count",
    "history.history_version",
    # CTIS
    "applications.submissionDate",
    "applications.partI.rowSubjectCount"
  ),
  con = db
)

# helpers
library(dplyr)
library(tidyr)
library(ggplot2)

# reshape to one row per trial version, merge the
# register-specific columns and plot
result %>%
  unnest(cols = starts_with("history.")) %>%
  unnest(cols = starts_with("applications.")) %>%
  mutate(version_date = as.Date(version_date)) %>% 
  mutate(count = dfMergeVariablesRelevel(., colnames = c(
    "history.protocolSection.designModule.enrollmentInfo.count", 
    "applications.partI.rowSubjectCount"))) %>% 
  mutate(date = dfMergeVariablesRelevel(., colnames = c(
    "applications.submissionDate", "version_date"))) %>% 
  select(`_id`, count, date) %>% 
  arrange(`_id`, date) %>%
  group_by(`_id`) %>%
  ggplot(
    mapping = aes(
      x = date,
      y = count,
      colour = `_id`)
  ) +
  geom_step() +
  geom_point() +
  theme_light() +
  guides(colour = "none")

[Figure: samplesizechanges; step plot of sample size (count) against date, one line per trial]
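Beyond plotting, the same long-format data (one row per trial version with identifier, date and count) can be summarised numerically. The following is a minimal base R sketch under stated assumptions: the data frame `sizes`, its column `trial_id` and all values are hypothetical and are not output from the queries above.

```r
# Hypothetical long-format data: one row per trial version, similar in shape
# to the result of the select(`_id`, count, date) step; all values are made up.
sizes <- data.frame(
  trial_id = c(
    "NCT00000001", "NCT00000001", "NCT00000001",
    "2022-000000-00-00", "2022-000000-00-00"
  ),
  date = as.Date(c(
    "2015-03-01", "2016-07-15", "2019-01-10",
    "2022-05-02", "2023-02-20"
  )),
  count = c(100, 120, 95, 60, 60),
  stringsAsFactors = FALSE
)

# Order versions chronologically within each trial
sizes <- sizes[order(sizes$trial_id, sizes$date), ]

# Per trial: number of versions, first and last sample size, and net change
changes <- do.call(rbind, lapply(split(sizes, sizes$trial_id), function(d) {
  data.frame(
    trial_id = d$trial_id[1],
    n_versions = nrow(d),
    first_count = d$count[1],
    last_count = d$count[nrow(d)],
    change = d$count[nrow(d)] - d$count[1],
    stringsAsFactors = FALSE
  )
}))
rownames(changes) <- NULL

print(changes)
```

A net change of zero does not rule out intermediate changes, so the step plot remains the more complete view.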

rfhb commented 1 month ago

Issue closed with 7bc46f983ce50f839b367cf4b122cf53e766d297.