reichlab / zoltr

http://reichlab.io/zoltr/
GNU General Public License v3.0

do_zoltar_query() sometimes gets a parsing failure warning #30

Closed matthewcornell closed 3 years ago

matthewcornell commented 3 years ago

This query causes the warning:

zoltar_connection <- zoltr::new_connection(Sys.getenv("Z_HOST"))
zoltr::zoltar_authenticate(zoltar_connection, Sys.getenv("Z_USERNAME"), Sys.getenv("Z_PASSWORD"))
the_projects <- zoltr::projects(zoltar_connection)
project_url <- the_projects[the_projects$name == "COVID-19 Forecasts", "url"]
df <- zoltr::do_zoltar_query(zoltar_connection, project_url, is_forecast_query = TRUE, models = "COVIDhub-ensemble",
                             timezeros = as.Date("2020-10-05"))

readr::problems(df) returns:

# A tibble: 100,060 x 5
     row col      expected           actual file        
   <int> <chr>    <chr>              <chr>  <chr>       
 1 13253 quantile 1/0/T/F/TRUE/FALSE 0.99   <raw vector>
 2 13254 quantile 1/0/T/F/TRUE/FALSE 0.975  <raw vector>
 3 13255 quantile 1/0/T/F/TRUE/FALSE 0.95   <raw vector>
 4 13256 quantile 1/0/T/F/TRUE/FALSE 0.9    <raw vector>
 5 13257 quantile 1/0/T/F/TRUE/FALSE 0.85   <raw vector>
 6 13258 quantile 1/0/T/F/TRUE/FALSE 0.8    <raw vector>
 7 13259 quantile 1/0/T/F/TRUE/FALSE 0.75   <raw vector>
 8 13260 quantile 1/0/T/F/TRUE/FALSE 0.7    <raw vector>
 9 13261 quantile 1/0/T/F/TRUE/FALSE 0.65   <raw vector>
10 13262 quantile 1/0/T/F/TRUE/FALSE 0.6    <raw vector>
# … with 100,050 more rows

@nickreich says:

Pretty sure this is the issue with data types. I think what is happening is that all the point forecasts are at the top of the dataframe, so the first 1000+ rows of the quantile column are NA. So the tibble automatically assigns a logical data type to the quantile column, but then when it reads the whole file and sees that there are numbers, it gets confused.
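A minimal sketch of the mechanism described above, not the actual zoltr code. Type guessing differs between readr versions, so this example forces the quantile column to logical explicitly, which mimics what a guess from an all-NA prefix would produce. The tiny CSV and its `class` and `value` columns are made up for illustration; only the `quantile` column name comes from the report above.

```r
library(readr)

# Hypothetical miniature of a forecast-query CSV: point rows come first
# (quantile is empty), quantile rows follow.
csv_lines <- c(
  "class,value,quantile",
  "point,10,",
  "point,12,",
  "quantile,15,0.99",
  "quantile,14,0.975"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Force the type that guessing from an all-NA prefix would settle on.
# suppressWarnings() hides the parsing-failure warning so we can inspect it.
df <- suppressWarnings(
  read_csv(
    csv_path,
    col_types = cols(
      class = col_character(),
      value = col_double(),
      quantile = col_logical()
    )
  )
)

# The numeric quantile levels cannot be parsed as logical, so they become NA
# and are recorded as parsing problems.
probs <- problems(df)
print(probs)
```

The problems table produced this way should have the same shape as the one in the report: the `quantile` column, a logical value expected, and a number such as 0.99 found instead.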

That makes sense. My job_data() function in zoltr is where I specify column types. For forecast queries, we pass this col_types string through get_resource() to readr::read_csv():

"cDcccc?????????"

Each "?" tells readr to guess that column's type, so you can see that we decided to let R figure out what to do with the columns that might be empty. This is because data types for those columns depend on target types.
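One possible direction, sketched under assumptions rather than taken from the zoltr source: declare the quantile column as double while leaving the other possibly empty columns to be guessed. This assumes quantile levels are always numeric. The small CSV and its `class` and `value` columns are again hypothetical. In the compact string notation, the equivalent change would be a "d" in place of the "?" at the quantile column's position.

```r
library(readr)

# Same hypothetical miniature CSV: empty quantile values first, numbers later.
csv_lines <- c(
  "class,value,quantile",
  "point,10,",
  "point,12,",
  "quantile,15,0.99",
  "quantile,14,0.975"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Pin only the quantile column to double; everything else is still guessed.
df_fixed <- read_csv(
  csv_path,
  col_types = cols(.default = col_guess(), quantile = col_double())
)

# Leading empty values are read as NA_real_, later values as numbers,
# and no parsing problems are recorded.
print(df_fixed$quantile)
print(nrow(problems(df_fixed)))
```

Pinning one column keeps the flexibility the "?" entries were chosen for, since columns whose types really do vary with target type are still left to the guesser.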