learningequality / sushi-chef-internet-archive-universal-library

'Internet Archive - Universal Library' chef
MIT License

Kolibri is an open source educational platform to distribute content to areas with little or no internet connectivity. Educational content is created and edited on Kolibri Studio, which is a platform for organizing content to import from the Kolibri applications. The purpose of this project is to create a chef, or a program that scrapes a content source and puts it into a format that can be imported into Kolibri Studio.

The Universal Library Project, sometimes called the Million Books Project, was pioneered by Jaime Carbonell, Raj Reddy, Michael Shamos, Gloriana St. Clair, and Robert Thibadeau of Carnegie Mellon University. The governments of India, China, and Egypt are helping fund this effort through scanning facilities and personnel. The Internet Archive has contributed 100,000 books from the Kansas City Public Library, along with servers, to India. The Indian government scanned the appropriate books, and the Internet Archive has automatically converted these scans into this collection.

This project was initialized from a template: https://github.com/learningequality/cookiecutter-chef/

Installation

Usage

TODO: Explain how to run the 'Internet Archive - Universal Library' chef

  export SOMEVAR=someval
  ./script.py -v --option2 --keyword="val"

Description

A sushi chef script is responsible for importing content from the Internet Archive Universal Library into Kolibri Studio. The ricecooker library provides all the methods needed to upload the channel content to Kolibri Studio, along with helper functions and utilities.
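
Before importing anything, the chef has to enumerate the items in the collection. The sketch below shows one way to do that step with the standard library only. It assumes the collection identifier `universallibrary` and the Internet Archive's `advancedsearch.php` JSON endpoint; neither is taken from this repository's code, and `build_search_url` and `parse_search_response` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Assumption: the Internet Archive's public search endpoint; not taken from this repo.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(collection, page=1, rows=100, fields=("identifier", "title")):
    """Return a search URL listing one page of items in `collection` (hypothetical helper)."""
    params = [("q", "collection:" + collection)]
    # The endpoint takes one fl[] parameter per field to return.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def parse_search_response(body):
    """Extract (identifier, title) pairs from a JSON response body.

    Assumes the response has the shape {"response": {"docs": [...]}}.
    """
    docs = json.loads(body)["response"]["docs"]
    return [(doc["identifier"], doc.get("title", "")) for doc in docs]
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so that the sketch runs offline; keeping URL building and parsing separate also makes both easy to test.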

A sushi chef script has been started for you in sushichef.py.

Sushi chef documentation is available in the ricecooker docs.

_For more sushi chef examples, see examples/openstax_sushichef.py (JSON source) and examples/wikipedia_sushichef.py (HTML source), as well as the examples/ directory in the ricecooker repo._


Rubric

Please make sure your final chef matches the following standards.

General Standards

  1. Does the code work (no infinite loops, exceptions thrown, etc.)?
  2. Are the source_ids determined consistently (based on foreign database identifiers or permanent url paths)?
  3. Is there documentation on how to run the script (including the command-line parameters to use)?
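
To illustrate item 2, a chef can derive each source_id from the permanent part of an item URL instead of from a title or a position in a listing. The sketch below assumes that archive.org item pages live at `https://archive.org/details/<identifier>`; `source_id_from_url` is a hypothetical helper, not part of ricecooker.

```python
from urllib.parse import urlparse


def source_id_from_url(url):
    """Derive a stable source_id from an archive.org item URL (hypothetical helper).

    Assumes the permanent pattern /details/<identifier>; query strings,
    fragments, trailing slashes, and reader sub-paths are ignored.
    """
    path_parts = [part for part in urlparse(url).path.split("/") if part]
    if len(path_parts) < 2 or path_parts[0] != "details":
        raise ValueError("not an item URL: " + url)
    return path_parts[1]
```

Because the identifier is the Internet Archive's own key, rerunning the chef produces the same source_id even if titles change or the listing order shifts.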

Coding Standards

  1. Are there no obvious runtime or memory inefficiencies in the code?
  2. Are the functions succinct?
  3. Are clarifying comments provided where needed?
  4. Are the git commits easy to understand?
  5. Are there no unnecessarily nested if statements or for loops?
  6. Are variables named descriptively (e.g. path vs p)?
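
For the memory and nesting questions above, one common pattern is to stream paginated results through a generator, so the chef never holds the whole collection in memory and the loop stays flat. In this sketch, `iter_items` and its `fetch_page` callable are hypothetical names; `fetch_page` stands in for whatever request the chef actually makes.

```python
def iter_items(fetch_page, rows=100):
    """Yield items one page at a time instead of building one large list.

    `fetch_page(page, rows)` is a hypothetical callable that returns the
    list of items on a 1-indexed page; a short or empty page marks the end.
    """
    page = 1
    while True:
        items = fetch_page(page, rows)
        yield from items
        if len(items) < rows:
            return  # early exit keeps the loop body flat
        page += 1
```

A consumer can stop iterating at any point without triggering further requests. When the total is an exact multiple of `rows`, the generator makes one extra request that returns an empty page.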

Python Standards

  1. Is the code compatible with Python 3?
  2. Does the code use common standard library functions where needed?
  3. Does the code use common Python idioms where needed (with/open, try/except, etc.)?
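
The with/open and try/except idioms fit together naturally in an on-disk cache, which also keeps the chef from re-downloading metadata on every run. This sketch uses only json and pathlib; `load_cached_json` and its `fetch` callable are hypothetical and not part of ricecooker.

```python
import json
from pathlib import Path


def load_cached_json(cache_path, fetch):
    """Return JSON data from cache_path, calling fetch() only on a cache miss.

    `fetch` is a hypothetical zero-argument callable (e.g. a network request)
    that returns JSON-serialisable data.
    """
    cache_path = Path(cache_path)
    try:
        with open(cache_path, encoding="utf-8") as cache_file:
            return json.load(cache_file)
    except FileNotFoundError:
        data = fetch()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with open(cache_path, "w", encoding="utf-8") as cache_file:
            json.dump(data, cache_file)
        return data
```

A partially written cache file would raise `json.JSONDecodeError`; the sketch deliberately does not catch that error, so corruption surfaces instead of being silently masked.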