rfhb / ctrdata

Aggregate and analyse information on clinical trials from public registers
https://rfhb.github.io/ctrdata/

Incorrect number of trials found with date variables in query URLs for ClinicalTrials.gov #39

Closed machado-t closed 3 months ago

machado-t commented 3 months ago

I have encountered what appears to be a bug when retrieving clinical trial information from ClinicalTrials.gov using an end-date-only value (_YYYY-MM-DD) in query URLs. The end date filter does not seem to be applied. This affects only to/end dates; from/start dates work, as do queries that specify both dates.

Reproduction:

library(ctrdata)

# Define connection to the database
db <- nodbi::src_sqlite(dbname = "test_database.sqlite", collection = "test_collection")

# Query with end date only (should find 2,226 studies as of today, but finds 20,898)
query_with_end_date <- "https://clinicaltrials.gov/search?cond=diabetes&resFirstPost=_2020-01-01"
ctrLoadQueryIntoDb(queryterm = query_with_end_date, con = db, only.count = TRUE)

# Query with start date only (correctly finds 982 studies today)
query_without_end_date <- "https://clinicaltrials.gov/search?cond=diabetes&resFirstPost=2020-01-01_"
ctrLoadQueryIntoDb(queryterm = query_without_end_date, con = db, only.count = TRUE)

# Query with a complete date range (correctly finds 647 studies today)
query_with_date_range <- "https://clinicaltrials.gov/search?cond=diabetes&resFirstPost=2020-01-01_2023-01-01"
ctrLoadQueryIntoDb(queryterm = query_with_date_range, con = db, only.count = TRUE)

Thank you!

rfhb commented 3 months ago

Confirmed. The cause is a regex in the code that translates URLs into API parameters.
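
For readers following along, here is a minimal sketch of how a date-range value of the form from_to can be split so that an empty start or end is handled explicitly. This is not the code used in ctrdata; the helper names get_url_param and parse_date_range are hypothetical, and the sketch only illustrates the three cases from the report (end only, start only, both).

```r
# Hypothetical helpers, not part of ctrdata.

# Extract the raw value of one query parameter from a search URL.
get_url_param <- function(url, name) {
  pattern <- paste0("^.*[?&]", name, "=([^&]*).*$")
  if (!grepl(pattern, url)) return(NA_character_)
  sub(pattern, "\\1", url)
}

# Split "from_to" into its two parts; either side may be empty.
parse_date_range <- function(value) {
  date_re <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
  if (!grepl("_", value, fixed = TRUE)) {
    stop("not a date range: ", value)
  }
  from <- sub("_.*$", "", value)      # text before the first underscore
  to   <- sub("^[^_]*_", "", value)   # text after the first underscore
  for (part in c(from, to)) {
    # An empty part means "open ended"; anything else must look like a date
    if (nzchar(part) && !grepl(date_re, part)) {
      stop("not a date: ", part)
    }
  }
  list(
    from = if (nzchar(from)) from else NA_character_,
    to   = if (nzchar(to)) to else NA_character_
  )
}

url <- "https://clinicaltrials.gov/search?cond=diabetes&resFirstPost=_2020-01-01"
range <- parse_date_range(get_url_param(url, "resFirstPost"))
str(range)
```

With the end-date-only URL from the report, range$from is NA and range$to is "2020-01-01", so the end date is kept rather than dropped.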

rfhb commented 3 months ago

@machado-t Thanks. An updated dev version resolves this, install with: remotes::install_github("rfhb/ctrdata")

machado-t commented 3 months ago

Thanks for your quick fix! I am sorry I will not be able to test it myself any time soon though.