hubverse-org / hubData

Tools for accessing and working with hubverse Hub data
https://hubverse-org.github.io/hubData/

Feature request: query files from github #5

Open dshemetov opened 10 months ago

dshemetov commented 10 months ago

It would be nice to have a hubUtils function to download specific (team x forecast_date) forecasts from a hubverse GitHub repo. Currently we roll our own functions for this (using the GitHub API or Zoltar under the hood), but I suspect others will want this too. We'd be happy to help bring our custom functions over here.
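
To make the request concrete, here is a minimal base-R sketch of what such a helper might look like. It is not the evalcast implementation and not an existing hubUtils or hubData function: `model_output_url()` and `read_model_output()` are hypothetical names, and the repo and model ID in the example call are placeholders. It assumes the hub follows the hubverse layout `model-output/<model_id>/<round_id>-<model_id>.csv`, that the default branch is `main`, and that the file is a CSV.

```r
# Hypothetical helper: build the raw GitHub URL of one model-output file.
# Assumes the hubverse layout model-output/<model_id>/<round_id>-<model_id>.<ext>.
model_output_url <- function(repo, model_id, round_id,
                             branch = "main", ext = "csv") {
  file_name <- paste0(round_id, "-", model_id, ".", ext)
  paste("https://raw.githubusercontent.com", repo, branch,
        "model-output", model_id, file_name, sep = "/")
}

# Hypothetical helper: download one (model, round) submission as a data frame.
# Only handles CSV files and needs network access when called.
read_model_output <- function(repo, model_id, round_id, branch = "main") {
  url <- model_output_url(repo, model_id, round_id, branch = branch)
  utils::read.csv(url, stringsAsFactors = FALSE)
}

# Placeholder repo and model ID, not a real hub:
model_output_url("example-org/example-hub", "teamA-modelB", "2023-10-02")
```

Reading raw files like this avoids GitHub API authentication, but it cannot list which models or rounds exist; listing would still need the GitHub API or a local clone.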

# Examples using our evalcast functions
r$> evalcast::get_covidhub_predictions("CMU-TimeSeries", forecast_dates = "2023-10-02") %>% tibble
# A tibble: 4,896 × 10
   ahead geo_value quantile value forecaster     forecast_date data_source signal  
   <int> <chr>        <dbl> <dbl> <chr>          <date>        <chr>       <chr>   
 1     1 ak           0.01  0     CMU-TimeSeries 2023-10-02    hhs         confirm…
 2     1 ak           0.025 0     CMU-TimeSeries 2023-10-02    hhs         confirm…
 3     1 ak           0.05  0     CMU-TimeSeries 2023-10-02    hhs         confirm…
 4     1 ak           0.1   0     CMU-TimeSeries 2023-10-02    hhs         confirm…
 5     1 ak           0.15  0.375 CMU-TimeSeries 2023-10-02    hhs         confirm…
 6     1 ak           0.2   0.907 CMU-TimeSeries 2023-10-02    hhs         confirm…
 7     1 ak           0.25  1.34  CMU-TimeSeries 2023-10-02    hhs         confirm…
 8     1 ak           0.3   1.69  CMU-TimeSeries 2023-10-02    hhs         confirm…
 9     1 ak           0.35  2.04  CMU-TimeSeries 2023-10-02    hhs         confirm…
10     1 ak           0.4   2.38  CMU-TimeSeries 2023-10-02    hhs         confirm…
# ℹ 4,886 more rows
# ℹ 2 more variables: target_end_date <date>, incidence_period <chr>
# ℹ Use `print(n = ...)` to see more rows

# List forecast dates for a given forecaster
r$> evalcast::get_covidhub_forecast_dates("CMU-TimeSeries")
  [1] "2020-07-20" "2020-07-27" "2020-08-02" "2020-08-10" "2020-08-17" "2020-08-24"
  [7] "2020-08-31" "2020-09-07" "2020-09-14" "2020-09-21" "2020-09-28" "2020-10-05"
  ...

# List forecasters in the repo
r$> evalcast::get_covidhub_forecaster_names()
  [1] "AIMdyn-SlidingKoopman"         "AIpert-pwllnod"               
  [3] "AMM-EpiInvert"                 "Auquan-SEIR"                  
  [5] "BPagano-RtDriven"              "CDDEP-ABM"                    
  [7] "CDDEP-SEIR_MCMC"               "CEID-Walk"                    
  [9] "CEPH-Rtrend_covid"             "CMU-TimeSeries"
  ...

nickreich commented 3 months ago

At the moment, we are focusing efforts on moving hubs to S3-based storage where queries can be run directly on storage buckets to return data, so I'm going to put this issue "on hold".