jhudsl / OTTR_Template

OTTR for making courses! This is a template repo that helps people write 1 course but publish it in three places
https://www.ottrproject.org/
Creative Commons Attribution 4.0 International

Collect as much course content and learning objectives automagically as possible #331

Closed by cansavvy 2 years ago

cansavvy commented 2 years ago

Describe the scope of your content idea

To cut down on manual labor, I'm going to try to scrape as much course info as I can from the jhudsl and DataTrail organizations so we can add it to the library Google Sheet as a starting point: https://docs.google.com/spreadsheets/d/13TvG95v71a0QsCcaZC7zB4GbtF67Q6Bb-Dc_GLOScHY/edit#gid=0

Ideas on how to get there

This GitHub scraper is something to look into: https://github.com/sbaack/github-scraper

cansavvy commented 2 years ago

This is probably a better scraper: https://github.com/alirezamika/autoscraper

cansavvy commented 2 years ago

Here's the idea:

  1. Find repos in jhudsl with the GitHub API
  2. Narrow down to those with GitHub Pages
  3. Narrow down to GitHub Pages rendered with the template
  4. Collect _bookdown.yml from these repos
  5. Read the Rmd files listed there
  6. Collect all h1 and h2 (maybe h3) headers
  7. Search for "learning objective"
  8. Download that slide's text
  9. Populate the Google Sheet with this info
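
A rough sketch of how steps 1, 2, 6, and 7 above might look in Python. This is not the code that was actually used for this issue. It assumes the `has_pages` field returned by the GitHub REST API's organization repository listing is a good enough signal for step 2, and that headers in the Rmd files are written in `#` style. The function names are made up for illustration, and unauthenticated API calls are rate limited.

```python
import json
import re
import urllib.request

API = "https://api.github.com"


def list_org_repos(org):
    """Step 1: page through GET /orgs/{org}/repos until an empty page comes back."""
    repos, page = [], 1
    while True:
        url = f"{API}/orgs/{org}/repos?per_page=100&page={page}"
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            return repos
        repos.extend(batch)
        page += 1


def repos_with_pages(repos):
    """Step 2: keep the names of repos whose API record has has_pages set."""
    return [r["name"] for r in repos if r.get("has_pages")]


HEADER = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def collect_headers(rmd_text, max_level=2):
    """Step 6: return (level, title) pairs, skipping lines inside code chunks.

    Skipping chunks matters because R comments also start with '#'.
    """
    headers, in_chunk = [], False
    for line in rmd_text.splitlines():
        if FENCE.match(line):
            in_chunk = not in_chunk  # opening or closing fence of a chunk
            continue
        match = HEADER.match(line)
        if match and not in_chunk and len(match.group(1)) <= max_level:
            headers.append((len(match.group(1)), match.group(2)))
    return headers


def mentions_learning_objectives(text):
    """Step 7: case-insensitive search for the phrase 'learning objective'."""
    return "learning objective" in text.lower()
```

Something like `repos_with_pages(list_org_repos("jhudsl"))` would give the candidate list for step 3, and passing `max_level=3` to `collect_headers` covers the "maybe h3" case.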

cansavvy commented 2 years ago

Check it out! (NA means there's no bookdown associated with the course as far as the GitHub API is concerned) https://docs.google.com/spreadsheets/d/1klDpaQcGjYUa5Xro-DxqNTJq7Did-Ujn7IGgm7W6kJ8/edit#gid=65359487

This accomplishes Steps 1 - 6 so far.

cansavvy commented 2 years ago

The rest of the issues for this will be tracked on https://github.com/jhudsl/gitHelpeR