openzim / openedx

Open edX (to zim) scraper
GNU General Public License v3.0

Refactor code #51

Closed satyamtg closed 4 years ago

satyamtg commented 4 years ago

This basically does the following refactors -

Please note that utils.py is not fully refactored in this PR, since adding other features and fixing issues would involve changing that file substantially. Also, the language is currently set to "eng" for the ZIM. I'll change that while fixing #49.

This should close the following: #45, #46, #43

The argument structure is now modified as follows -

usage: openedx2zim [-h] --course-url COURSE_URL --email EMAIL [--password PASSWORD] --name NAME
                   [--title TITLE] [--description DESCRIPTION] [--creator CREATOR]
                   [--publisher PUBLISHER] [--tags TAGS] [--convert-in-webm]
                   [--ignore-missing-xblocks] [--lang LANG] [--add-wiki] [--add-forum]
                   [--output OUTPUT_DIR] [--tmp-dir TMP_DIR] [--zim-file FNAME]
                   [--no-fulltext-index] [--no-zim] [--keep] [--debug] [--version]

Scraper to create ZIM files MOOCs on openedx instances

optional arguments:
  -h, --help            show this help message and exit
  --course-url COURSE_URL
                        URL of the course you wnat to scrape
  --email EMAIL         Your registered e-mail ID on the platform. Used for authentication
  --password PASSWORD   The password to your registered account on the platform. If you don't
                        provide one here, you'll be asked for it later
  --name NAME           ZIM name. Used as identifier and filename (date will be appended)
  --title TITLE         Custom title for your ZIM. Based on MOOC otherwise.
  --description DESCRIPTION
                        Custom description for your ZIM. Based on MOOC otherwise.
  --creator CREATOR     Name of content creator
  --publisher PUBLISHER
                        Custom publisher name (ZIM metadata)
  --tags TAGS           List of comma-separated Tags for the ZIM file. category:openedx, openedx,
                        and _videos:yes (if present) added automatically
  --convert-in-webm     Re-encode videos to WebM
  --ignore-missing-xblocks
                        Ignore unsupported content (xblock)
  --lang LANG           Default language of the interface and the ZIM content (ISO-639-1 codes)
  --add-wiki            Add wiki (if available) to the ZIM
  --add-forum           Add forum (if available) to the ZIM
  --output OUTPUT_DIR   Output folder for ZIM file
  --tmp-dir TMP_DIR     Path to create temp folder in. Used for building ZIM file. Receives all
                        data
  --zim-file FNAME      ZIM file name (based on --name if not provided)
  --no-fulltext-index   Don't index the scraped content in the ZIM
  --no-zim              Don't produce a ZIM file, create build folder only.
  --keep                Don't remove build folder on start (for debug/devel)
  --debug               Enable verbose output
  --version             Display scraper version and exit
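
For readers who want to see how a few of these options could be declared, here is a minimal sketch using Python's standard argparse module. It is not the code from this PR: only the flag names and the description string are taken from the help text above, while the `build_parser` and `resolve_password` helpers, the `dest` names, and the default for `--output` are assumptions made for illustration.

```python
import argparse
import getpass


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the options listed in the help text above."""
    # Hypothetical reconstruction: flag names come from the help output,
    # defaults and dest names are assumed.
    parser = argparse.ArgumentParser(
        prog="openedx2zim",
        description="Scraper to create ZIM files MOOCs on openedx instances",
    )
    parser.add_argument("--course-url", required=True, help="URL of the course")
    parser.add_argument("--email", required=True, help="Registered e-mail ID")
    parser.add_argument("--password", help="Account password (prompted if omitted)")
    parser.add_argument("--name", required=True, help="ZIM name")
    parser.add_argument("--convert-in-webm", action="store_true")
    parser.add_argument("--add-wiki", action="store_true")
    parser.add_argument("--add-forum", action="store_true")
    # "output" as the default folder is an assumption, not taken from the PR.
    parser.add_argument("--output", dest="output_dir", default="output")
    parser.add_argument("--zim-file", dest="fname")
    parser.add_argument("--debug", action="store_true")
    return parser


def resolve_password(args: argparse.Namespace, prompt=getpass.getpass) -> str:
    """Return --password if given, otherwise ask for it interactively."""
    if args.password:
        return args.password
    return prompt("Password: ")
```

A call such as `build_parser().parse_args(["--course-url", url, "--email", email, "--name", "test"])` would leave `args.password` as `None`, in which case `resolve_password(args)` falls back to a prompt, matching the "you'll be asked for it later" behaviour described for `--password`.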

Here's a test ZIM that I created - test_2020-07.zip (Please ignore the random strings that I put for title and description)
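
The help text says that `--name` is used as the filename with a date appended, and that `--zim-file` overrides it. A rough sketch of that rule follows. The `zim_filename` helper is hypothetical, the `YYYY-MM` period format is inferred only from the attachment name above, and the `.zim` extension is assumed, so the scraper's actual logic may differ.

```python
import datetime
from typing import Optional


def zim_filename(
    name: str,
    fname: Optional[str] = None,
    today: Optional[datetime.date] = None,
) -> str:
    """Pick the ZIM file name: explicit --zim-file wins, else name plus period."""
    if fname:
        return fname
    # Assumed format: "<name>_<YYYY-MM>.zim", inferred from "test_2020-07".
    today = today or datetime.date.today()
    return f"{name}_{today:%Y-%m}.zim"
```

For example, `zim_filename("test", today=datetime.date(2020, 7, 15))` yields `test_2020-07.zim` under these assumptions.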

I've compared this against the output of the original, unmodified scraper and the two seem identical. There is scope for improvement, but we're not looking at the frontend part now. Maybe sometime later.