uclalawcovid19behindbars / covid19_behind_bars_scrapers

Scrapers to pull in weekly COVID data for incarcerated populations
GNU General Public License v3.0

`new york` extraction is truncated #404

Open lpw3 opened 2 years ago

lpw3 commented 2 years ago

The New York scraper, which reads from a PDF (https://doccs.ny.gov/system/files/documents/2022/01/copy-of-incarceratedindividualdailycovid_table_forpio-2022.01.27_0.pdf), appears to start extraction on line 10, likely because of a cropping issue. The log file (http://104.131.72.50:3838/scraper_data/log_files/2022-01-27_new_york.log) shows the extraction beginning on the Bedford Hills facility line, which is reflected in the extracted data.

hjohns12 commented 2 years ago

I got the same error today, but when I checked the extracted facility rows by sorting alphabetically by Name, all the ones before "BEDFORD HILLS" were there. So I think this scraper is working as expected. BUT this error is confusing and keeps coming back, so we should change something! Luckily, I think the data extraction is working just fine :)
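
For anyone who wants to repeat the alphabetical check described above, here is a minimal sketch in Python. It is not code from this repository: the function name `names_before` and the sample rows are hypothetical, and it assumes the extracted facility names are available as a plain list of strings.

```python
def names_before(extracted_names, marker):
    """Return the extracted names that sort alphabetically before `marker`.

    Names are stripped and upper-cased first so that stray whitespace or
    mixed case in the extraction does not affect the comparison.
    """
    marker_key = marker.strip().upper()
    normalized = sorted({name.strip().upper() for name in extracted_names})
    return [name for name in normalized if name < marker_key]


# Illustrative rows only; this is NOT the real extraction output.
sample = ["BEDFORD HILLS", "Attica", "ALBION ", "CAYUGA"]
earlier = names_before(sample, "BEDFORD HILLS")
print(earlier)
```

A non-empty result means rows that sort ahead of the marker facility were extracted, which would suggest the table itself is not truncated even if the log message implies otherwise.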