Closed frederikziebell closed 7 months ago
Thanks for reporting, nice catch! Fix now available:
devtools::install_github("rfhb/ctrdata")
As an aside, in case you were not aware of it and do not specifically need the tibble, this also works:
library("ctrdata")
ctrLoadQueryIntoDb(
  queryterm = "spons=NameOfSponsor",
  register = "CTGOV2",
  only.count = TRUE
)$n
Thanks, and also for pointing out the shorter syntax; it's working now. For some companies, however, I see differences in the number of returned results between the two registers:
ctrLoadQueryIntoDb(
  queryterm = "spons=Janssen",
  register = "CTGOV",
  only.count = TRUE
)$n
# 2352
ctrLoadQueryIntoDb(
  queryterm = "spons=Janssen",
  register = "CTGOV2",
  only.count = TRUE
)$n
# 2347
But I don't know if that's because the new API accesses the data differently from the CTGOV database, or if it's an issue with ctrdata.
Thanks. You get the same numbers when opening this search query in the browser, as shown below. I have no explanation for this and can only speculate that different matching processes take place in the backend. Try modifying the sponsor name in the browser to see the different expansions offered.
ctrOpenSearchPagesInBrowser(url = "spons=Janssen", register = "CTGOV")
ctrOpenSearchPagesInBrowser(url = "spons=Janssen", register = "CTGOV2")
Nevertheless, it is straightforward to generate a list of the set difference, as follows:
dbc <- nodbi::src_sqlite(collection = "temp")
ctgovTrials <- ctrLoadQueryIntoDb(queryterm = "spons=Janssen", register = "CTGOV", con = dbc)
ctgov2Trials <- ctrLoadQueryIntoDb(queryterm = "spons=Janssen", register = "CTGOV2", con = dbc)
trialsSet <- dbGetFieldsIntoDf(c("sponsors.lead_sponsor.agency", "brief_title"), con = dbc)
trialsSet[trialsSet[["_id"]] %in% setdiff(ctgovTrials[["success"]], ctgov2Trials[["success"]]), ]
which returns
# A tibble: 5 × 3
  `_id`       sponsors.lead_sponsor.agency brief_title
  <chr>       <chr>                        <chr>
1 NCT02135354 Wim Janssens                 Azithromycin for Acute Exacerbations Requiring Hospitaliza…
2 NCT02205242 Wim Janssens                 BACE Trial Substudy 1 - PROactive Substudy
3 NCT02205255 Wim Janssens                 BACE Trial Substudy 2 - FarmEc Substudy
4 NCT02332122 Wim Janssens                 Detection of Aspergillus Fumigatus and Sensitization in CO…
5 NCT05008081 Wim Janssens                 The CATALINA Study
There you have it: possibly CTGOV uses a partial string match and CTGOV2 matches differently; see e.g. https://clinicaltrials.gov/data-about-studies/search-areas#SponsorSearch
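To illustrate the speculation above, here is a minimal sketch of how a partial string match and a whole-word match would treat the sponsor name from the output. It does not reproduce the actual matching logic of either register, which is not known here; the second sponsor name is only an illustrative company-style name and is not taken from the thread.

```r
# First name is from the tibble above; the second is an illustrative assumption
sponsors <- c("Wim Janssens", "Janssen Research & Development, LLC")

# Partial (substring) match: any name containing "Janssen" is a hit
partial <- grepl("Janssen", sponsors, fixed = TRUE)

# Whole-word match: "Janssen" must be delimited by word boundaries,
# so "Janssens" no longer qualifies
whole_word <- grepl("\\bJanssen\\b", sponsors, perl = TRUE)

data.frame(sponsors, partial, whole_word)
```

If the two registers differed in this way, the five "Wim Janssens" trials would appear only in the register using the partial match, which is consistent with the set difference shown above.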
Thanks for the clarification. Btw, I get an error with the latest devel build and your example:
dbc <- nodbi::src_sqlite(collection = "temp")
ctgovTrials <- ctrLoadQueryIntoDb(queryterm = "spons=Janssen", register = "CTGOV", con = dbc)
gives
Not overruling register label CTGOV
* Found search query from CTGOV: spons=Janssen
Checking helper binaries: . . . done
Warning: Database not persisting
* Checking trials in CTGOV classic...
Retrieved overview, records of 2352 trial(s) are to be downloaded (estimate: 19 MB)
(1/3) Downloading trial file...
Error in handle_setopt(h, ...) : Unknown option: multiplex
The call to ctrLoadQueryIntoDb() with only.count = TRUE works, so I guess the issue concerns multiplexed downloading.
Should I open a separate issue for that?
Thanks. Could you please update the R package curl? Version 5.1.0 does not trigger this error; I will specify this requirement.
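For reference, a minimal sketch of checking the installed curl version before updating. The helper name needs_update is hypothetical, and using 5.1.0 as the minimum is an assumption based only on the statement above that this version does not trigger the error.

```r
# Hypothetical helper: TRUE when the installed version is older than required
needs_update <- function(installed, required = "5.1.0") {
  package_version(installed) < package_version(required)
}

# Only check if curl is installed at all
if (requireNamespace("curl", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("curl"))
  if (needs_update(installed)) {
    message("curl ", installed, " is older than 5.1.0; ",
            "consider running install.packages(\"curl\")")
  }
}
```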
Somewhat unrelated, but I'll leave it here for future reference.
I was getting this error with CTGOV2:
* Checking trials using CTGOV API 2.0.0.-test...Warning: Error in curl::curl_fetch_memory: Timeout was reached: [www.clinicaltrials.gov] Resolving timed out after 10011 milliseconds
... which was apparently also solved by updating curl.
Edit: Actually unrelated to curl update. Not sure why, but I'm getting this sometimes.
Indeed completely unrelated to ctrdata, possibly a network or server issue.
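If the timeout is intermittent, one possible workaround is to retry the call a few times. The following is only a sketch: with_retry is a hypothetical helper and not part of ctrdata, and it assumes the failure surfaces as an R error rather than as a warning.

```r
# Hypothetical helper: call fn() up to `attempts` times,
# waiting `wait` seconds between failed attempts
with_retry <- function(fn, attempts = 3, wait = 5) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait)
  }
  # All attempts failed: re-signal the last error
  stop(result)
}

# Usage (commented out because it needs network access):
# n <- with_retry(function() {
#   ctrdata::ctrLoadQueryIntoDb(
#     queryterm = "spons=Janssen",
#     register = "CTGOV2",
#     only.count = TRUE
#   )$n
# })
```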
Consider the following example. With the old API, the filter is respected, whereas with the new one, all studies would be downloaded.