simonw / big-local-datasette

Publishing a Datasette of open projects from biglocalnews.org
https://biglocal.datasettes.com/

COVID_CDC_SVI not being imported for some reason #20

Closed simonw closed 4 years ago

simonw commented 4 years ago

There are two CSV files in that project, but this is happening: https://biglocal.datasettes.com/

Big_Local_News__Open_Projects__COVID_AHA_Hospital_beds__COVID_CDC_SVI__COVID_COVID19Tracker__COVID_HospitalBeds_CountyDemographics_NursingHomes__COVID_National_Health_Security_Preparedness_Index__COVID_USAFacts_county_cases__COVID_twitter_d

simonw commented 4 years ago

Here are the two CSV files that should have been imported: https://biglocal.datasettes.com/biglocal/files?project=UHJvamVjdDpiMGVmMjIyYS0zNzE4LTRhZTgtYWJjNC1lNzA3M2M0MDFmZGQ%3D&_facet=ext&ext=csv

| project | project_label | ext | createdAt | name | updatedAt | uri | uriType | size | etag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UHJvamVjdDpiMGVmMjIyYS0zNzE4LTRhZTgtYWJjNC1lNzA3M2M0MDFmZGQ= | COVID_CDC_SVI | csv | 2020-03-18T23:40:26.279000+00:00 | SVI2018_US.csv | 2020-03-18T23:40:26.279000+00:00 | | download | 44704458 | """34dc9bc76fc77f227af4d4ffe4e69c15""" |
| UHJvamVjdDpiMGVmMjIyYS0zNzE4LTRhZTgtYWJjNC1lNzA3M2M0MDFmZGQ= | COVID_CDC_SVI | csv | 2020-03-18T23:40:02.307000+00:00 | SVI2018_US_COUNTY.csv | 2020-03-18T23:40:02.307000+00:00 | | download | 1885419 | """01ce905c04ddf3b7bff299f0dbc05543""" |

simonw commented 4 years ago

That fixed it. https://biglocal.datasettes.com/COVID_CDC_SVI

simonw commented 4 years ago

... and it's gone again! https://biglocal.datasettes.com/COVID_CDC_SVI

simonw commented 4 years ago

https://github.com/simonw/big-local-datasette/runs/590623330?check_suite_focus=true

total 31M
drwxr-xr-x 2 runner docker 4.0K Apr 15 23:53 .
drwxr-xr-x 7 runner docker 4.0K Apr 15 23:53 ..
-rw-r--r-- 1 runner docker  80K Apr 15 23:53 COVID_AHA_Hospital_beds.db
-rw-r--r-- 1 runner docker    0 Apr 15 23:53 COVID_CDC_SVI.db
-rw-r--r-- 1 runner docker 512K Apr 15 23:53 COVID_COVID19Tracker.db
-rw-r--r-- 1 runner docker  13M Apr 15 23:53 COVID_HospitalBeds_CountyDemographics_NursingHomes.db
-rw-r--r-- 1 runner docker 3.9M Apr 15 23:53 COVID_National_Health_Security_Preparedness_Index.db
-rw-r--r-- 1 runner docker  14M Apr 15 23:53 COVID_USAFacts_county_cases.db
-rw-r--r-- 1 runner docker    0 Apr 15 23:53 COVID_twitter_data.db
-rw-r--r-- 1 runner docker 156K Apr 15 23:53 biglocal.db
-rw-r--r-- 1 runner docker 2.2K Apr 15 23:53 databases.json
-rw-r--r-- 1 runner docker  21K Apr 15 23:53 metadata.json
simonw commented 4 years ago

I'll try re-running each step from the Action on my laptop to see if I can replicate what's happening.

simonw commented 4 years ago

Part of the problem is here: https://github.com/simonw/big-local-datasette/blob/c9a8908a8b214950d17d4dac30d8697b8019e8ce/populate_tables.py#L39-L41. Once a 0-byte file is on disk, it will be skipped on every future run because the hash in the local copy of databases.json stays the same.

I'm still not sure how we got the 0 byte files in the first place though!

I'm going to say "always download if the local DB file is missing or 0 bytes".
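
A minimal sketch of that rule, not the actual code in populate_tables.py. The helper name `needs_download` and the idea of passing the stored and remote hashes in as plain strings are assumptions made for illustration:

```python
from pathlib import Path


def needs_download(db_path, remote_hash, local_hash):
    """Decide whether a project's database file should be rebuilt.

    Hypothetical helper: the hash arguments stand in for whatever
    the real script stores in databases.json.
    """
    path = Path(db_path)
    # A missing or 0-byte file is never trusted, even if the hashes match.
    if not path.exists() or path.stat().st_size == 0:
        return True
    # Otherwise only re-download when the remote content has changed.
    return remote_hash != local_hash
```

Checking the file on disk before comparing hashes means a matching hash can no longer hide a failed import.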

simonw commented 4 years ago

https://biglocal.datasettes.com/COVID_CDC_SVI is still empty.

simonw commented 4 years ago

I found the tables! For some reason they ended up in the wrong database:

https://biglocal.datasettes.com/COVID_twitter_data

COVID_twitter_data

Plus the debug output says: https://github.com/simonw/big-local-datasette/runs/594103343?check_suite_focus=true

```
Fetching SVI2018_US into DB COVID_AHA_Hospital_beds
SVI2018_US 44704458
Fetching SVI2018_US_COUNTY into DB COVID_AHA_Hospital_beds
SVI2018_US_COUNTY 1885419
```

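
One way to check where tables actually landed is to list the tables in every generated `.db` file. This is a rough sketch using only the standard library; the function name `tables_by_database` is made up for illustration and is not part of the repository:

```python
import sqlite3
from pathlib import Path


def tables_by_database(directory):
    """Map each *.db file name in `directory` to a sorted list of its tables."""
    result = {}
    for db_path in sorted(Path(directory).glob("*.db")):
        conn = sqlite3.connect(str(db_path))
        try:
            rows = conn.execute(
                "select name from sqlite_master where type = 'table' order by name"
            ).fetchall()
        finally:
            conn.close()
        # A 0-byte file opens as an empty database, so it maps to [].
        result[db_path.name] = [row[0] for row in rows]
    return result
```

Running this against the build directory should show both empty databases and tables that were written into a database belonging to a different project.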
simonw commented 4 years ago

https://biglocal.datasettes.com/COVID_CDC_SVI has one table now, but it should have two.

Likewise, https://biglocal.datasettes.com/COVID_twitter_data has one table but should have five. Maybe the CSV files are too big? https://biglocal.datasettes.com/biglocal/files?_facet=project&project=UHJvamVjdDo4NTBjOWJmYy03YzAyLTRkNDgtYjYzMS04OThhODFmZjQxNDQ%3D&_facet=ext&ext=csv#facet-project

simonw commented 4 years ago

Fixed!