allysmatrix closed this issue 3 years ago
Please provide update
I created a Python script to scrape the NJ State website https://nj.gov/state/elections/vote-secure-drop-boxes.shtml. With the script I created this CSV file: https://docs.google.com/spreadsheets/d/16oKiMoLT1B3kfSd-Uo5LozL23HpLLnQHYEp3a6ksq6Q/edit?usp=sharing @aNullValue can you please take a look and let me know if the format is acceptable? There are a handful of unclean entries, but they can be fixed. @aNullValue can you also let me know where the export of the current NJ data is?
AFAIK, the most recent data that we have from all states other than Georgia is available at https://github.com/hackforla/ballotnav/tree/master/backend/db/states
Regarding your example spreadsheet:
For that info, we need to have the following columns at a minimum:
The goal is to split everything out to the maximum extent possible. The address components, in particular, must be divided into columns. Do note that if something says "Room", "Apartment", "Suite", etc., that information should be moved into an "Address (2)" or similar column -- not in "Address (1)".
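As a rough illustration of the "Address (1)" / "Address (2)" rule above, here is a minimal sketch using only the Python standard library. The function name `split_street_line` and the list of unit keywords are assumptions made for this example; they are not taken from the existing scraper or the BallotNav schema, and street names that contain one of the keywords (e.g. "Suite Drive") would need manual review.

```python
import re

# Unit designators that should move into "Address (2)".
# This keyword list is an assumption for illustration and is not exhaustive.
_UNIT_RE = re.compile(
    r"[,\s]+((?:Room|Rm|Apartment|Apt|Suite|Ste|Unit)\b.*)$",
    re.IGNORECASE,
)


def split_street_line(street: str) -> dict:
    """Split one street line into "Address (1)" and "Address (2)" parts."""
    street = " ".join(street.split())  # collapse repeated whitespace
    match = _UNIT_RE.search(street)
    if match is None:
        # No unit designator found: everything stays in "Address (1)".
        return {"Address (1)": street, "Address (2)": ""}
    return {
        "Address (1)": street[: match.start()].rstrip(" ,"),
        "Address (2)": match.group(1).strip(),
    }
```

For example, `split_street_line("120 Main Street, Suite 200")` puts "120 Main Street" in "Address (1)" and "Suite 200" in "Address (2)".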
Some states/jurisdictions provide municipality information. For the majority of states and in the majority of cases, the municipality information should be discarded, because it has no bearing on the election itself -- it's just there as a convenience for users of their site, and doesn't really have a logical place in our design. This includes -- to the best of my knowledge -- NJ. In a few states, municipality is more important than county; we instead discard county information and use only municipality information. That applies mostly to the New England area states, plus Michigan and Wisconsin.
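The county-versus-municipality rule described above could be sketched roughly as follows. The state set is an assumption derived from "New England area states, plus Michigan and Wisconsin"; since the rule is said to apply "mostly" to those states, the membership would need to be confirmed per state. The function and field names are hypothetical.

```python
# States where municipality is kept and county is discarded.
# Membership is an assumption for illustration; confirm per state before use.
MUNICIPALITY_STATES = {"CT", "ME", "MA", "NH", "RI", "VT", "MI", "WI"}


def pick_jurisdiction(state: str, county: str, municipality: str) -> dict:
    """Keep only the jurisdiction level that matters for the given state."""
    if state.strip().upper() in MUNICIPALITY_STATES:
        return {"jurisdiction": municipality.strip(), "level": "municipality"}
    # Default case (including NJ): keep county, drop municipality.
    return {"jurisdiction": county.strip(), "level": "county"}
```

Under these assumptions, an NJ row keeps its county and drops the municipality, while a Wisconsin row does the opposite.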
Note also that the above list of columns is not comprehensive for the data that we ultimately want to collect, but it's what is relevant for the data readily available from NJ.
@giosce please read the above message from Drew who explains what I was trying to say on slack much better.
@giosce Please provide an update: progress, blockers, availability, ETA of completion.
The spreadsheet provided by Karen https://docs.google.com/spreadsheets/d/1UzYSmz6OQ8O2PnjCrER5FpAC1C1xwdFYJQhS3XXo_nc/edit?usp=sharing corresponds to this website: https://www.state.nj.us/state/elections/vote-county-election-officials.shtml The one I scraped is https://nj.gov/state/elections/vote-secure-drop-boxes.shtml (actual boxes in the streets, with no phone number or person to reach). I have uploaded the CSV to Google Drive. I should be able to split the address as @aNullValue described above. I'll build a scraper for the election officials page and upload that CSV to Google Drive as well. The spreadsheet that Karen shared has much more info, such as phone and fax numbers. So I think the 2 scrapers will create 2 CSV files (which will also be used for continuous comparison), and we'll either merge them for DB upload or run 2 imports.
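The "continuous comparison" idea could look something like the sketch below: compare the CSV from a previous scrape with the CSV from a new scrape and report what was added, removed, or changed. This is only an illustration under assumptions; the helper names and the choice of key columns are hypothetical, and the real CSV column names may differ.

```python
import csv
import io


def load_rows(csv_text: str, key_columns: list) -> dict:
    """Index CSV rows by a tuple of the (stripped) key column values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[c].strip() for c in key_columns): row for row in reader}


def diff_snapshots(old_csv: str, new_csv: str, key_columns: list) -> dict:
    """Report keys that were added, removed, or changed between two scrapes."""
    old = load_rows(old_csv, key_columns)
    new = load_rows(new_csv, key_columns)
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
    }
```

The same function could be run separately on the drop-box CSV and the election-officials CSV, each with its own key columns, before either file is imported.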
Still working on scrapers for elections-officials and dropoff boxes. I hope to have both csv samples in a week or so.
I'm at a good point scraping the "elections officials" website. This is the latest draft: https://drive.google.com/file/d/1uRu-eIWaZGTXvNXVS-qKuBsD7EIBfwDE/view?usp=sharing
Feel free to provide feedback. I know there are a handful of entries with problems; we can discuss whether it makes sense to fix them in the scraper or manually.
I started using a Python address-parsing library, with which I'm now scraping the "dropoff boxes" website. Hopefully I'll have a draft of this by the call next week.
The strategy I suggest is:
Let me know.
Thanks, Gio. This looks good. One thing that will make this much easier is having the data consistent and all columns equivalent to the schema we currently have in our DB on the front end to minimize adjustments later on. I'll have @aNullValue weigh in before we move forward, but please review the below resources to get an idea of what I mean. We can discuss further tonight.
https://docs.google.com/spreadsheets/d/1LXkjKz7eWdh71NDrq1lYnVN4DKrZ4UfMPWFkH7h16Nk/edit#gid=1304858482 https://github.com/hackforla/ballotnav-states/issues/40
@jmensch1 @aNullValue can you take a look at Gio's data, thanks (as a reference to our scraped data @alligatormonday)
@kcoronel add screen shares of Jake and Drew's convo on slack re: NJ data
Overview
In an effort to gather useful data before the June election, BallotNav wants to scrape NJ data ahead of time.
Action Items
Resources/Instructions
Finalized features for BallotNav 2021