hawkrives / gobbldygook-area-data

Major and concentration data for St. Olaf College (and Gobbldygook)
https://hawkrives.github.io/gobbldygook-area-data

Write tooling to fetch, extract, and diff two catalogs #114

Open · hawkrives opened this issue 5 years ago

hawkrives commented 5 years ago

Luckily, we're on Leepfrog now, so the HTML is pretty consistent.

Plus, if you extract the body content and convert it to Markdown, I think it'll diff very nicely.
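A minimal sketch of the extract-and-convert step, in Python with only the standard library. It assumes each page's requirement content sits inside a single element with a known id; `CONTENT_ID = "textcontainer"` is a hypothetical placeholder, not checked against the real catalog markup. The converter is deliberately crude (headings, paragraphs, list items) rather than a full HTML-to-Markdown library, and `difflib.unified_diff` produces the diff.

```python
import difflib
from html.parser import HTMLParser

# Hypothetical id of the element that wraps a page's requirement content.
CONTENT_ID = "textcontainer"


class ContentToMarkdown(HTMLParser):
    """Collect text inside the element with id=content_id as rough Markdown.

    Assumes well-formed markup (every non-void tag is closed).
    """

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### "}
    BLOCKS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4"}
    VOID = {"br", "img", "hr", "input", "meta", "link", "wbr"}

    def __init__(self, content_id=CONTENT_ID):
        super().__init__()
        self.content_id = content_id
        self.depth = 0  # nesting depth inside the content element; 0 means outside
        self.lines = []
        self.current = []

    def flush(self):
        # Collapse whitespace and emit the pending block as one Markdown line.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if dict(attrs).get("id") == self.content_id:
                self.depth = 1
            return
        if tag in self.VOID:
            if tag == "br":
                self.flush()  # treat <br> as a line break
            return
        self.depth += 1
        if tag in self.BLOCKS:
            self.flush()
        if tag in self.HEADINGS:
            self.current.append(self.HEADINGS[tag])
        elif tag == "li":
            self.current.append("- ")

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in self.VOID:
            return
        if tag in self.BLOCKS:
            self.flush()
        self.depth -= 1

    def handle_data(self, data):
        # Text outside the content element (nav, footer, ...) is ignored.
        if self.depth > 0:
            self.current.append(data)


def page_to_markdown(html, content_id=CONTENT_ID):
    parser = ContentToMarkdown(content_id)
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n".join(parser.lines)


def diff_pages(old_md, new_md, name="page"):
    """Unified diff of two Markdown renderings; empty string if they match."""
    return "\n".join(difflib.unified_diff(
        old_md.splitlines(), new_md.splitlines(),
        fromfile=f"old/{name}", tofile=f"new/{name}", lineterm=""))
```

Because each block becomes one line, a changed course shows up as a single `-`/`+` pair instead of a noisy HTML hunk.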

So then you just need something that walks the sidebar menu, fetches the degrees/majors overview page, then fetches each individual page, extracts just the relevant bit of HTML, and converts it to Markdown for diffing.
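The walk described above could look roughly like this. It is a sketch under assumptions: the menu is taken to be the element with the hypothetical id `SIDEBAR_ID = "sidebar"`, and the start URL is whichever page carries that menu or the degrees/majors listing. `fetch` is injectable so the walk can be tested without the network; the default `fetch_url` uses `urllib.request.urlopen` and assumes UTF-8 pages.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Hypothetical id of the sidebar/menu element on the catalog pages.
SIDEBAR_ID = "sidebar"


class SidebarLinks(HTMLParser):
    """Collect href values of <a> tags nested inside the element with id=sidebar_id."""

    VOID = {"br", "img", "hr", "input", "meta", "link", "wbr"}

    def __init__(self, sidebar_id):
        super().__init__()
        self.sidebar_id = sidebar_id
        self.depth = 0  # nesting depth inside the sidebar; 0 means outside
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            if attrs.get("id") == self.sidebar_id:
                self.depth = 1
            return
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in self.VOID:
            self.depth -= 1


def fetch_url(url):
    """Default fetcher: download a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def crawl(start_url, container_id=SIDEBAR_ID, fetch=fetch_url):
    """Fetch start_url, follow each same-site link in its sidebar, return {url: html}."""
    parser = SidebarLinks(container_id)
    parser.feed(fetch(start_url))
    parser.close()
    host = urlparse(start_url).netloc
    pages = {}
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so anchors don't cause refetches.
        url, _fragment = urldefrag(urljoin(start_url, href))
        if urlparse(url).netloc != host or url in pages:
            continue
        pages[url] = fetch(url)
    return pages
```

Fetching is sequential with no delay between requests; against a live catalog, a short pause between fetches would be the polite choice.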

You can even ignore the link hrefs, since they're immaterial when diffing requirement content, which is what we're after.
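Markdown converters commonly render links inline as `[text](url)`. Assuming that output form, dropping the targets before diffing is a single regex substitution, so a page whose only change is a moved link compares equal. `drop_link_targets` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Matches an inline Markdown link [text](target). Targets containing ")" are
# not handled, and images (![alt](src)) would keep a stray "!" before the alt text.
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def drop_link_targets(markdown):
    """Replace each inline Markdown link with its bare link text."""
    return LINK.sub(r"\1", markdown)
```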