matthewarmand / arsenal-america-pub-scraper

Automated extraction of Arsenal America pubs list for export.
https://drive.google.com/open?id=1lGiu2QTjyGmUcSdNN6EeZFiUXv6CL4NR&usp=sharing
MIT License

"The Parlor" not exported because it lacks phone number #1

Open matthewarmand opened 5 years ago

matthewarmand commented 5 years ago

The Arsenal Bars page doesn't have an ideal DOM structure for scraping, so the scraper currently identifies a bar by checking for a link containing the pub's name, followed by a phone number. If we drop the phone number validation, we need some other way to filter out all the other links on the page.
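To illustrate the failure mode, here is a minimal sketch of a "link plus phone number" filter. It is not the scraper's actual code: the sample markup, the `extract_pubs` helper, the `LinkTextParser` class, and the US-style phone pattern are all assumptions made up for this example.

```python
import re
from html.parser import HTMLParser

# Assumed US-style phone pattern, e.g. "(555) 123-4567" or "555-123-4567".
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


class LinkTextParser(HTMLParser):
    """Collect [link text, text following the link] pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.entries.append(["", ""])

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if not self.entries:
            return  # text before the first link is ignored
        index = 0 if self._in_link else 1
        self.entries[-1][index] += data


def extract_pubs(html):
    """Keep only links whose trailing text contains a phone number."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    pubs = []
    for name, trailing in parser.entries:
        match = PHONE_RE.search(trailing)
        if name.strip() and match:
            pubs.append((name.strip(), match.group()))
    return pubs


# Made-up markup: a navigation link, a pub with a phone, a pub without one.
SAMPLE_HTML = """
<p><a href="/about">About us</a></p>
<p><a href="https://example.com/pub-a">Example Pub A</a> 123 Main St, (555) 123-4567</p>
<p><a href="https://example.com/parlor">The Parlor</a> 456 Side St</p>
"""

print(extract_pubs(SAMPLE_HTML))
```

With this sample, the navigation link is correctly filtered out, but so is the pub entry that has no phone number on the page.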

The Parlor in LA does have a phone number, but for whatever reason it wasn't put on the webpage. As a result, it's not currently represented in the map. Possible solutions include finding a better way to isolate and extract the pubs and their relevant information from the page, or restructuring the page to include class names that would help with that.
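If the page were restructured with class names, extraction could stop depending on the phone number. The following is only a sketch under that assumption: the class names `pub`, `pub-name`, and `pub-phone`, the sample markup, and the `parse_pubs` helper are hypothetical and do not exist on the current page.

```python
from html.parser import HTMLParser


class PubClassParser(HTMLParser):
    """Collect pubs from elements marked with hypothetical class names."""

    def __init__(self):
        super().__init__()
        self.pubs = []
        self._field = None  # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "pub" in classes:
            # A new pub container starts; the phone is optional.
            self.pubs.append({"name": "", "phone": None})
        elif self.pubs and "pub-name" in classes:
            self._field = "name"
        elif self.pubs and "pub-phone" in classes:
            self._field = "phone"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and self.pubs:
            current = self.pubs[-1]
            current[self._field] = (current[self._field] or "") + data.strip()


def parse_pubs(html):
    parser = PubClassParser()
    parser.feed(html)
    parser.close()
    return parser.pubs


# Made-up markup showing the assumed class-based structure.
CLASSED_HTML = """
<a href="/about">About us</a>
<div class="pub"><a class="pub-name" href="https://example.com/pub-a">Example Pub A</a>
  <span class="pub-phone">(555) 123-4567</span></div>
<div class="pub"><a class="pub-name" href="https://example.com/parlor">The Parlor</a></div>
"""

for pub in parse_pubs(CLASSED_HTML):
    print(pub["name"], pub["phone"])
```

Here the unrelated link is skipped because it sits outside any `pub` container, and a pub without a phone number is still exported with `phone` left as `None`.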