sfbrigade / data-covid19-sfbayarea

Manual and automated processes of sourcing data for the stop-covid19-sfbayarea project
MIT License

Create guide for scraping from websites that use PowerBI #32

Open · collincr opened this issue 4 years ago

collincr commented 4 years ago

The goal is to scrape data from the elements published with Power BI, which the public health websites of Santa Clara and San Mateo Counties use.

A related Stack Overflow thread suggests a combination of Python and Selenium: https://stackoverflow.com/questions/55063265/scraping-data-from-a-website-which-uses-power-bi-retrieving-data-from-power-bi
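A possible first step, sketched under assumptions: county pages typically embed the dashboard in an `<iframe>` whose `src` points at a powerbi.com host (this is how Power BI's "Publish to web" embeds work), so it can help to pull that URL out and load it directly. Selenium then has to render the page, because the visuals are drawn by JavaScript. The function names and the `css_selector` parameter below are hypothetical; the selector is whatever element you identify in the browser's dev tools. Selenium is imported lazily so the iframe helper runs without it.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlparse


class IframeFinder(HTMLParser):
    """Collect the src of every <iframe> whose host is on powerbi.com."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        src = dict(attrs).get("src") or ""
        host = urlparse(src).hostname or ""
        # Accept powerbi.com itself or any subdomain such as app.powerbi.com.
        if host == "powerbi.com" or host.endswith(".powerbi.com"):
            self.sources.append(src)


def find_powerbi_iframes(html: str) -> list[str]:
    """Return the Power BI embed URLs found in a county page's HTML."""
    finder = IframeFinder()
    finder.feed(html)
    finder.close()
    return finder.sources


def fetch_rendered_html(url: str, css_selector: str, timeout: float = 30.0) -> str:
    """Load a page in headless Chrome and return the HTML after rendering."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the (hypothetical) selector appears, i.e. the visual has rendered.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()
```

For example, on a synthetic page `'<iframe src="https://app.powerbi.com/view?r=abc"></iframe><iframe src="https://example.org/map"></iframe>'`, `find_powerbi_iframes` keeps only the first URL, which could then be passed to `fetch_rendered_html`.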

frhino commented 4 years ago

How is it going with this one, @collincr? Need help?

collincr commented 4 years ago

Hi @frhino, thanks for checking. I'm a bit stuck on this. So far I've been able to identify the elements using the method on the linked Stack Overflow page, but I haven't been able to retrieve anything else from them.
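One thing worth checking, offered as a guess rather than a diagnosis: Selenium's `WebElement.text` returns only visible rendered text, so values that a visual stores in attributes such as `aria-label` or `title` will not show up there (`element.get_attribute("aria-label")` would be needed instead). A minimal sketch, assuming you already have the rendered HTML (for example from Selenium's `driver.page_source`) and that the values sit in those attributes on elements carrying a class you found in dev tools. The class name `"cell"` and the function name are hypothetical, and the sample HTML is synthetic.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LabelCollector(HTMLParser):
    """Collect aria-label/title values from elements carrying a given CSS class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.target_class not in classes:
            return
        # Prefer aria-label, fall back to title; skip elements with neither.
        label = attr_map.get("aria-label") or attr_map.get("title")
        if label:
            self.values.append(label.strip())


def extract_labels(html: str, target_class: str) -> list[str]:
    """Return the label text of every element with target_class, in document order."""
    collector = LabelCollector(target_class)
    collector.feed(html)
    collector.close()
    return collector.values


sample = (
    '<div class="cell" aria-label="Total 10"></div>'
    '<div class="cell wide" title="Total 20"></div>'
    '<div class="other" aria-label="ignored"></div>'
)
print(extract_labels(sample, "cell"))  # ['Total 10', 'Total 20']
```

If the labels turn out to be empty, the values may instead be drawn as SVG text or fetched by the dashboard in a background request, and this approach would need adjusting.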

elaguerta commented 4 years ago

Hi @collincr and @frhino, this does seem tricky, and at this point I wouldn't bet that it's possible. I have emailed the counties with Power BI dashboards to request that they enable data export, but I'm not holding my breath. @collincr, if you feel really stuck, maybe you can switch to knocking out the lower-hanging fruit (counties with scrapable HTML and ArcGIS dashboards). Perhaps once we have a presentable set of visualizations for some of the counties, we can approach the Power BI counties and persuade them to work with us on getting included in the visualizations.

collincr commented 4 years ago

Thanks @elaguerta, happy to work on one of the other counties for the time being.