The dataset contains representations of YouTube videos:

3000 videos from Stanford
3000 videos from MIT
ML lecture series from Caltech
CHI videos to be added

Currently, these representations are loaded from local CSV/JSON files and stored in RAM, which doesn't scale well. They should be stored in a database.

X5learn needs to store some additional properties which the current X5GON API doesn't provide yet. Thumbnail images and wikichunks are just two examples of the set of required properties, which will likely evolve during rapid design iterations that will involve real users and a broad variety of media types.

The x5learn database is the best place to store this information because it allows us to experiment without stepping on anyone's toes. Once we have identified the "one schema to rule them all", we should then decide whether to keep the extra data in the dashboard database or propose an appropriate extension to the X5GON platform.

Does this make sense?

For now, I propose adding an "oers" table to the x5learn database, with the following columns:

URL (string)
content (JSON): All the data that the frontend needs, including title, thumbnail, wikichunks etc.
origin (string): e.g. SIGCHI_YOUTUBE, STANFORD_YOUTUBE, UNESCO_WHS, or X5GON. This field will be useful during user studies where we want to include or exclude specific datasets.
material_id (optional): While x5learn likes to address OERs by URL, the X5GON API uses its own material_id for retrieving OERs, so this field provides that important link. Making it optional allows x5learn to use OERs that come through alternative channels (origins) other than the normal X5GON pipeline.
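
As a rough illustration of the columns above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The engine, the column types, the choice of URL as primary key, and the helper names `add_oer` and `urls_for_origin` are all assumptions made for the example, not decisions taken in this issue.

```python
import json
import sqlite3

# In-memory SQLite stands in for the x5learn database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE oers (
        url         TEXT PRIMARY KEY,  -- x5learn addresses OERs by URL
        content     TEXT NOT NULL,     -- JSON: title, thumbnail, wikichunks etc.
        origin      TEXT NOT NULL,     -- e.g. STANFORD_YOUTUBE or X5GON
        material_id INTEGER            -- optional link to the X5GON API (type assumed)
    )
    """
)


def add_oer(url, content, origin, material_id=None):
    """Store one OER; `content` is a dict that gets serialized to JSON."""
    conn.execute(
        "INSERT INTO oers (url, content, origin, material_id) VALUES (?, ?, ?, ?)",
        (url, json.dumps(content), origin, material_id),
    )


def urls_for_origin(origin):
    """Return the URLs of all OERs from one origin, e.g. for a user study."""
    rows = conn.execute(
        "SELECT url FROM oers WHERE origin = ? ORDER BY url", (origin,)
    )
    return [row[0] for row in rows]


# Hypothetical example rows; the URLs and values are placeholders.
add_oer("https://example.org/a", {"title": "A"}, "STANFORD_YOUTUBE")
add_oer("https://example.org/b", {"title": "B"}, "X5GON", material_id=123)
```

Leaving `material_id` nullable is what lets rows from origins other than X5GON exist without a counterpart in the X5GON API.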

Tasks

[x] Add a table "oers" to the x5learn database
[x] Add a script to seed the db with data from CSV and JSON files (~/x5learn_data)
[x] Remove the code that loads the CSV data into RAM at @app.before_first_request
[x] Add a hook to create a row for every new OER coming from X5GON search
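
The seeding and hook tasks could look roughly like the sketch below. It again uses sqlite3 as a stand-in, and everything specific in it is hypothetical: the CSV layout (a `url` column plus arbitrary other fields), the function names `seed_from_csv` and `record_search_result`, and the use of `INSERT OR IGNORE` to skip URLs that are already stored. The actual files in ~/x5learn_data and the actual search code may differ.

```python
import csv
import io
import json
import sqlite3


def create_oers_table(conn):
    """Create the table if it does not exist yet (types are assumptions)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS oers (
            url         TEXT PRIMARY KEY,
            content     TEXT NOT NULL,
            origin      TEXT NOT NULL,
            material_id INTEGER
        )
        """
    )


def seed_from_csv(conn, csv_file, origin):
    """Insert one row per CSV record, skipping URLs already in the table."""
    for record in csv.DictReader(csv_file):
        url = record.pop("url")  # remaining fields become the JSON content
        conn.execute(
            "INSERT OR IGNORE INTO oers (url, content, origin) VALUES (?, ?, ?)",
            (url, json.dumps(record), origin),
        )
    conn.commit()


def record_search_result(conn, url, content, material_id):
    """Hook for OERs returned by X5GON search: add a row only if the URL is new."""
    conn.execute(
        "INSERT OR IGNORE INTO oers (url, content, origin, material_id) "
        "VALUES (?, ?, 'X5GON', ?)",
        (url, json.dumps(content), material_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_oers_table(conn)

# Placeholder data standing in for a file under ~/x5learn_data.
sample_csv = io.StringIO(
    "url,title\n"
    "https://example.org/v1,Lecture 1\n"
    "https://example.org/v2,Lecture 2\n"
)
seed_from_csv(conn, sample_csv, "STANFORD_YOUTUBE")

# Calling the hook twice for the same URL leaves a single row.
record_search_result(conn, "https://example.org/v3", {"title": "Search hit"}, 42)
record_search_result(conn, "https://example.org/v3", {"title": "Search hit"}, 42)
```

With the rows persisted this way, nothing needs to be loaded into RAM at startup; the app can query the table on demand.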

sahanbull / x5learn

Move the Youtube dataset to the database #73
