reichlab / cladetime

Documentation
https://cladetime.readthedocs.io
MIT License

Missing date values #36

Open · elray1 opened this issue 4 days ago

The clade counts file has 197 rows with a missing date:

> read_parquet("https://covid-clade-counts.s3.amazonaws.com/2024-10-14_covid_clade_counts.parquet") |>
+   filter(is.na(date))
# A tibble: 197 × 4
   location     date   clade count
   <chr>        <date> <chr> <int>
 1 South Dakota NA     23I       3
 2 Virginia     NA     21H      12
 3 Mississippi  NA     20I       8
 4 Virginia     NA     23H      12
 5 Louisiana    NA     22C       1
 6 Virginia     NA     23E       7
 7 Maryland     NA     22E       1
 8 South Dakota NA     20A     504
 9 Nebraska     NA     21C       2
10 Maryland     NA     21F       1
# ℹ 187 more rows
# ℹ Use `print(n = ...)` to see more rows

The above file was generated using this script.

Do we understand how these missing dates arise? Is this a bug we need to fix, or something in the source data?
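One way to start narrowing this down might be to check whether the missing dates cluster in a few locations or are spread across all of them, and how large a share of each location's sequences they represent. The sketch below is a hypothetical base-R helper (`summarise_missing_dates` is not part of cladetime). It assumes only the four columns shown above (`location`, `date`, `clade`, `count`) and could be applied to the data frame returned by the `read_parquet()` call above.

```r
# Hypothetical diagnostic helper (not part of cladetime).
# Assumes a data frame with the columns location, date, clade, count.
# For each location that has rows with a missing date, reports:
#   n_rows        number of clade rows with date = NA
#   missing_count sum of `count` over those rows
#   share         missing_count / total `count` for that location
summarise_missing_dates <- function(clade_counts) {
  loc <- as.character(clade_counts$location)  # avoid unused factor levels
  is_missing <- is.na(clade_counts$date)

  if (!any(is_missing)) {
    return(data.frame(location = character(0), n_rows = integer(0),
                      missing_count = integer(0), share = numeric(0)))
  }

  # Totals over all rows (dated or not), per location
  all_by_loc <- tapply(clade_counts$count, loc, sum, na.rm = TRUE)

  # Totals over the missing-date rows only, per location
  miss_counts <- clade_counts$count[is_missing]
  miss_loc <- loc[is_missing]
  n_rows <- tapply(miss_counts, miss_loc, length)
  miss_by_loc <- tapply(miss_counts, miss_loc, sum, na.rm = TRUE)

  locs <- names(n_rows)
  out <- data.frame(
    location = locs,
    n_rows = as.integer(n_rows[locs]),
    missing_count = as.integer(miss_by_loc[locs]),
    share = as.numeric(miss_by_loc[locs]) / as.numeric(all_by_loc[locs]),
    stringsAsFactors = FALSE
  )

  # Largest number of undated sequences first
  out <- out[order(-out$missing_count), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

If the undated rows turn out to be concentrated in a handful of locations, that would seem to point toward the source metadata; if they appear roughly everywhere, it may be worth looking more closely at the aggregation step in the script.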