subugoe / hoad

Deprecated: Please check https://github.com/subugoe/hoaddash
GNU Affero General Public License v3.0

scrape esac data #244

Open maxheld83 opened 4 years ago

maxheld83 commented 4 years ago

the comprehensive data in ESAC_Transformative_Agreement_Übersicht_der_Verträge.xlsx has so far been entered by hand from the ESAC website.

Perhaps there is a way to scrape this off the website programmatically, and/or we could ask ESAC for the data in structured form. A rough sketch of the scraping side is below.

Not sure how central this is to our mission though.
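
For reference, a minimal sketch of what the scraping could look like, assuming the registry is rendered as a plain, server-side HTML table (the URL and the `"table"` selector here are assumptions, not verified against the live site):

```r
library(rvest)

# Assumed registry URL -- check the actual ESAC site before relying on this.
esac_url <- "https://esac-initiative.org/about/transformative-agreements/agreement-registry/"

# Read the page and pull the first <table>; this only works if the
# registry is server-rendered HTML, not injected by JavaScript.
esac_tbl <- read_html(esac_url) %>%
  html_element("table") %>%
  html_table()

str(esac_tbl)
```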

maxheld83 commented 3 years ago

This also opens up #251 and makes #240 much easier.

maxheld83 commented 3 years ago

I think it'd be really great to get the ESAC registry data in a programmatic way, ideally without scraping, since the data surely must exist in some database. This would open up a bunch of interesting applications for us (see the esac label).

@njahn82 @Henrieke72:

maxheld83 commented 3 years ago

and @njahn82, can you comment on how strategically important the ESAC registry data is for our project?

I really want to leverage the work that @Henrieke72 has already done with it, and it seems to me the opportunities to mash up the ESAC data with the rest of hoad could be quite interesting (#251), but I might not have enough context.

Considering that the data is already mostly structured (and even tidy), properly cleaning and exposing it shouldn't be too much work, maybe a day or two. Depending on what ESAC wants to do with their data, we can also wrap it up in a small R package that's separate from hoad, so more people can use it.
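
If we go the package route, the exported surface could be as small as a single function. `ta_registry()` is a hypothetical name, and the default URL and use of janitor for header cleaning are assumptions, just to sketch the shape:

```r
#' Fetch the ESAC transformative agreement registry
#'
#' Scrapes the registry table from the ESAC website and returns it as a
#' data frame. The default `url` is an assumed registry address.
#' @export
ta_registry <- function(url = "https://esac-initiative.org/about/transformative-agreements/agreement-registry/") {
  page <- rvest::read_html(url)
  tbl <- rvest::html_table(rvest::html_element(page, "table"))
  # Normalize the column headers so downstream code gets stable names.
  janitor::clean_names(tbl)
}
```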

maxheld83 commented 3 years ago

So this will be scraped in a separate package.

Henrieke72 commented 3 years ago

@maxheld83 Unfortunately, there is only the HTML version of the data, which is why I had to copy and paste it into an Excel sheet. Since the registry data are very dynamic, maybe there is a way to automatically update the Excel file with the new data?

maxheld83 commented 3 years ago

Thanks @Henrieke72! I'll do that: I'll scrape the data off the website and then offer an Excel export, roughly as sketched below.
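
A sketch of the export/refresh step, building on the hypothetical `ta_registry()` above; writexl is one option for writing xlsx files without Java dependencies, and the output filename is just a placeholder:

```r
library(writexl)

# Re-scrape and overwrite the workbook; running this on a schedule
# (e.g. a cron-triggered CI job) would keep the Excel file current.
esac_tbl <- ta_registry()  # hypothetical function from the sketch above
write_xlsx(esac_tbl, "ESAC_Transformative_Agreement_Registry.xlsx")
```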