presagia-analytics / ctrialsgov

Query Data from ClinicalTrials.gov
https://presagia-analytics.github.io/ctrialsgov/

Unable to download database #20

Closed k-maciejewski closed 1 year ago

k-maciejewski commented 1 year ago

Hi,

When I run the following, I get a (new) fatal error and am unable to refresh/download the database:

devtools::install_github("presagia-analytics/ctrialsgov")
library(ctrialsgov) 
library(DBI)
library(duckdb)
library(dbplyr)

ctgov_get_latest_snapshot(db_path = "ctgov.duckdb",
                          db_derived_path = "ctgov-derived.duckdb") 

[100%] Downloaded 1627994882 bytes...
[2023-04-29T10:49:14] LOADING TABLE 'active_storage_attachments'
Error: '/var/folders/9b/lrhm9r7d5d3cs8p1g2rtx4t40000gn/T//RtmpJV6cIk/active_storage_attachments.txt' does not exist.
In addition: Warning message:
In readLines(curl("https://aact.ctti-clinicaltrials.org/pipe_files")) :
  incomplete final line found on 'https://aact.ctti-clinicaltrials.org/pipe_files'

statsmaths commented 1 year ago

I just tried the same thing and got the same error. The issue comes from a call to ctgov_create_duckdb, which tries to create a local version of the Clinical Trials database from the downloaded flat files. However, the dump seems to be missing three previously used tables: active_storage_attachments, active_storage_blobs, and file_records. None of them are used in our derived data, so we can simply skip loading them.

I just updated the package in 8470228 so that ctgov_create_duckdb checks which tables exist and loads only those that do. That should also guard against the removal of any other small tables (many of them contain very little data). Please let me know if that solves your issue!
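
For anyone curious what that kind of check looks like, here is a minimal sketch in base R. It is not the code from the package: find_loadable_tables is a hypothetical helper, the table names in the demo are only illustrative, and the one-.txt-file-per-table layout is assumed from the error message above.

```r
# Hypothetical helper (not part of ctrialsgov): split the expected table
# names by whether a matching "<table>.txt" flat file exists in `dir`.
find_loadable_tables <- function(dir, expected_tables) {
  paths <- file.path(dir, paste0(expected_tables, ".txt"))
  present <- file.exists(paths)
  list(
    load = expected_tables[present],
    skip = expected_tables[!present]
  )
}

# Demo: a temporary directory stands in for the extracted dump.
demo_dir <- tempfile("aact_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("studies.txt", "conditions.txt"))))

tables <- find_loadable_tables(
  demo_dir,
  c("studies", "conditions", "active_storage_attachments")
)

# Report the tables that would be skipped instead of stopping with an error.
for (tbl in tables$skip) {
  message("Skipping missing table '", tbl, "'")
}
print(tables$load)
```

Only the names in tables$load would then be passed on to the loading step, so a table dropped from a future dump produces a message rather than a fatal error.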