openwichita / public-meetings

A service to show all upcoming available public meetings for the City of Wichita.

Email subscription scraper #29

Open Mearnest opened 7 years ago

Mearnest commented 7 years ago

A really super awesome feature would be a scraper that generates meeting events from meeting information in emails. These emails would come from email subscriptions to different board meetings and whatnot.

Unfortunately, the information is likely to be inside a Word or PDF document.

The motivation for this is to be able to keep the website up to date.

Mearnest commented 7 years ago

This can be in any programming language (even the ones below); it just needs to feed the Postgres database that Elixir reads from.

(insert-into postgres (scrape (email-attach word)) (scrape (email-attach pdf)))

Even Haskell.
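For illustration, the attachment step of that pipeline might look roughly like the Python sketch below. It uses only the standard library `email` package and stops at pulling Word/PDF attachments out of a raw message; parsing the documents themselves is not covered. The names `extract_documents`, `build_sample_email`, and `DOC_EXTENSIONS`, along with the sample addresses, are assumptions made up for this sketch and are not part of the project.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# File extensions treated as meeting documents (an assumption for this sketch).
DOC_EXTENSIONS = (".pdf", ".doc", ".docx")


def extract_documents(raw_email: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, payload) pairs for Word/PDF attachments in one email."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    docs = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if name.lower().endswith(DOC_EXTENSIONS):
            # decode=True undoes the base64 transfer encoding and returns bytes.
            docs.append((name, part.get_payload(decode=True)))
    return docs


def build_sample_email() -> bytes:
    """Build a fake subscription email so the sketch can run without a mailbox."""
    msg = EmailMessage()
    msg["Subject"] = "Board meeting agenda"
    msg["From"] = "clerk@example.org"
    msg["To"] = "subscriber@example.org"
    msg.set_content("Agenda attached.")
    msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                       subtype="pdf", filename="agenda.pdf")
    msg.add_attachment(b"\x89PNG fake", maintype="image",
                       subtype="png", filename="logo.png")
    return msg.as_bytes()


if __name__ == "__main__":
    for filename, payload in extract_documents(build_sample_email()):
        print(filename, len(payload), "bytes")
```

The extracted bytes would then be handed to whatever Word or PDF parser ends up being used, which is left open here.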

aaronarduino commented 7 years ago

I'd be interested in making this. Would we use an email service like Mailgun?

aaronarduino commented 7 years ago

Or maybe we could use something like https://context.io/docs/lite?

Mearnest commented 7 years ago

Whatever works best. Keep in mind that the resulting data has to be inserted into the meeting_types, meeting_dates, and meeting_extras tables in Postgres, unless generating JSON or providing Elixir with an API is easier.
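If the JSON route turns out to be easier, the scraper output could be as simple as the Python sketch below. The field names (`meeting_type`, `meeting_date`, `meeting_extras`) only loosely mirror the three table names mentioned above; the real column layout is not described in this thread, so the `ScrapedMeeting` class and every key here are assumptions.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ScrapedMeeting:
    """One meeting pulled from an email (hypothetical shape, not the real schema)."""
    meeting_type: str
    starts_at: datetime
    extras: dict = field(default_factory=dict)


def meetings_to_json(meetings: list[ScrapedMeeting]) -> str:
    """Serialize scraped meetings to a JSON array the Elixir app could ingest."""
    records = [
        {
            "meeting_type": m.meeting_type,
            "meeting_date": m.starts_at.isoformat(),
            "meeting_extras": m.extras,
        }
        for m in meetings
    ]
    return json.dumps(records, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = ScrapedMeeting(
        meeting_type="City Council",
        starts_at=datetime(2017, 3, 14, 18, 30),
        extras={"location": "City Hall"},
    )
    print(meetings_to_json([sample]))
```

Dates are written in ISO 8601 so that the consumer can parse them without guessing the format.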

Mearnest commented 7 years ago

You pretty much have total freedom to use any tech here. It just needs to run on the same server, unless Elixir is talking to an API running somewhere else.

aaronarduino commented 7 years ago

I've been working on this lately; see this branch for progress: https://github.com/aaronarduino/public-meetings/tree/proto-email-scraper

Mearnest commented 7 years ago

Looking good so far!