hpi-schul-cloud / schulcloud-content-crawler

Service to gather content for the Schul-Cloud from various education sites.
GNU Affero General Public License v3.0

schulcloud-content-crawler

Service to gather content from various education sites.

Getting Started

  1. Make sure you have Node.js and npm installed.
  2. Install the dependencies:

    npm install

  3. Start the app:

    npm start

    The app is available at http://localhost:8091/.

API

/fetch                                            # fetch all client content resources
/fetch?exclude=serlo                              # exclude serlo from fetching
/fetch?exclude=antares&exclude=khanacademy        # exclude antares and khanacademy from fetching

The JSON response of the API call is logged to the ./fetch.log file.
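
As a rough illustration of the query format above, the following sketch builds a /fetch URL with one exclude parameter per client. The helper name buildFetchUrl and the CRAWLER_BASE_URL constant are hypothetical and not part of this repository; the port is taken from the Getting Started section.

```javascript
// Hypothetical helper, not part of the crawler itself.
// Assumes the app runs locally on the port named in "Getting Started".
const CRAWLER_BASE_URL = 'http://localhost:8091';

function buildFetchUrl(excludedClients = []) {
    const url = new URL('/fetch', CRAWLER_BASE_URL);
    // Repeat the "exclude" parameter once per client to skip.
    excludedClients.forEach(function (name) {
        url.searchParams.append('exclude', name);
    });
    return url.toString();
}

console.log(buildFetchUrl(['antares', 'khanacademy']));
// prints "http://localhost:8091/fetch?exclude=antares&exclude=khanacademy"
```

The resulting URL can be requested with any HTTP client, for example curl or a browser.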

Clients

This repository contains the clients for all external content providers that are used for the content search of the Schul-Cloud. Each client must provide a method called getAll(). A client should create an array of content objects as defined in the content model and described below, and getAll() must return a promise.
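
The getAll() contract might look roughly like the sketch below. Everything in it is illustrative: exampleClient, fetchRawItems, and the sample item are made up, and it assumes the promise resolves to the array of content objects, which this README does not state explicitly. A real client would also pass each object through the content model, as the Serlo sample further down does.

```javascript
// Hypothetical stand-in for an HTTP request to a content provider.
function fetchRawItems() {
    return Promise.resolve([
        { id: '42', name: 'Fractions', path: 'https://example.org/fractions' }
    ]);
}

// Hypothetical client exposing the required getAll() method.
const exampleClient = {
    getAll: function () {
        // Assumption: the promise resolves to the array of content objects.
        return fetchRawItems().then(function (items) {
            return items.map(function (item) {
                return {
                    originId: item.id,
                    title: item.name,
                    url: item.path,
                    restrictions: null
                };
            });
        });
    }
};

exampleClient.getAll().then(function (contents) {
    console.log(contents.length + ' content object(s) fetched');
});
```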

Attributes

A content object should contain as many fields of the content model as possible, although only originId, title, url, and restrictions are required.
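
A client could check the required fields before handing objects to the content model. The sketch below is only an illustration: missingRequiredFields is a hypothetical helper, and it assumes that "required" means the key must be present, since the Serlo sample in this README sets restrictions to null.

```javascript
// The four required attributes named in this section.
const REQUIRED_FIELDS = ['originId', 'title', 'url', 'restrictions'];

// Hypothetical helper: returns the names of required fields that are absent.
// Assumption: a key that is present with a null value counts as provided.
function missingRequiredFields(content) {
    return REQUIRED_FIELDS.filter(function (field) {
        return !Object.prototype.hasOwnProperty.call(content, field);
    });
}

console.log(missingRequiredFields({ originId: '1', title: 'Fractions' }));
// prints [ 'url', 'restrictions' ]
```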

Model Extension

Extending the model is only necessary in rare cases. Please double-check that the change is really unavoidable!

When adding new attributes to the content model, the following files need to be adapted:

  1. MongoDB content model (models/contents.js)
  2. Model documentation in this readme (README.md)
  3. Follow the instructions here

Sample Client

A client has to parse its contents into learning objects, as in the following example from the Serlo client:

const contentModel = require('./../models/contents');

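// ... (urljoin, moment, BASE_URL, CONTENT_TYPE_STANDARD_NAMES, and parseCategories
// are used below and are set up in this elided part)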

function parseLearningObjects(response, contentType) {
    var content = JSON.parse(response);
    return content.map(function (serialization) {
        var subjectsAndTargetGroups = parseCategories(serialization.categories);
        var data = {
            originId: serialization.guid,
            title: serialization.title,
            url: urljoin(BASE_URL, serialization.link),
            license: ['https://creativecommons.org/licenses/by-sa/4.0/'],
            language: 'de-de',
            description: serialization.description,
            contentType: CONTENT_TYPE_STANDARD_NAMES[contentType],
            subjects: subjectsAndTargetGroups.subjects,
            targetGroups: subjectsAndTargetGroups.targetGroups,
            tags: Object.keys(serialization.keywords).map(function (x) {
                return serialization.keywords[x];
            }),
            restrictions: null,
            lastModified: moment.tz(serialization.lastModified.date, serialization.lastModified.timezone).toDate()
        };

        return contentModel.getModelObject(data);
    });
}

Contribution

Anyone planning to add another client should follow the Serlo client (clients/serlo) as an example of what a client should look like.

License

Copyright (c) 2016

Licensed under the MIT license.