oyiptong / up-headliner

Headliner is a JSON API that returns personalized content obtained from providers
Mozilla Public License 2.0

User Personalization - Headliner

Headliner is a service that serves personalized content obtained from various sources via a JSON API. It is meant to serve both as a demo of the forthcoming Firefox UP feature and as an example application service.

It consists of two parts:

Requirements

External programs required for headliner to work:

The rest of the dependencies will be installed by an included setup process.

This program has only been run on Mac OS X and Linux.

Development Setup

You can set up a development environment by running the provided setup script:

$ ./setup-project.sh

Before you can run the HTTP server, you have to activate the environment by running:

$ . ./up-headliner-env/bin/activate

You can then list the options for running the server by typing:

$ ./scripts/up-headliner-server --help

Run the same command without the --help argument to start an HTTP server with the default configuration.

To build the project, just run:

$ fab build

This will run the tests and some additional checks, like flake8, to help ensure the internal quality of the project. Or you can run each build stage separately:

$ fab test # to run the automated tests
$ fab flake # to run flake8 with the project options (see flake8.conf)
$ fab package # to package the project files

You can also provide additional arguments to each separate dev task:

$ fab test:config=my-nose.cfg,debug_errors=yes,debug_failures=yes # Use a different config, drop in debug shell on errors or failures
$ fab flake:config=my-flake.cfg # Use a different config
$ fab package:clean=false # Don't remove build directory before packaging

Configuration

Configuration is read from a number of sources:

  1. the file at up/headliner/settings.py
  2. a JSON file located at /etc/headliner/webserver.json
  3. a JSON file specified on the command line

The sources are loaded in the order listed, from lowest to highest priority: a value set in a later source overrides the same value from an earlier one.
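The precedence rule can be illustrated with a minimal sketch. It is not the project's actual loader: the function names and the shallow, key-by-key merge are assumptions made for illustration, and the defaults dictionary stands in for whatever up/headliner/settings.py defines.

```python
import json
import os


def load_json_if_present(path):
    """Return the parsed JSON object at `path`, or {} if no file is there."""
    if path and os.path.isfile(path):
        with open(path) as handle:
            return json.load(handle)
    return {}


def load_config(defaults, system_path="/etc/headliner/webserver.json", cli_path=None):
    """Layer the configuration sources, lowest priority first.

    `defaults` is a stand-in for the values in settings.py (assumption).
    Later sources overwrite earlier ones key by key (shallow merge, assumption).
    """
    config = dict(defaults)
    for path in (system_path, cli_path):
        config.update(load_json_if_present(path))
    return config
```

Under these assumptions, a key set in the command-line file wins over the same key in /etc/headliner/webserver.json, which in turn wins over the default.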

New York Times Most Popular API

The included content source in this package is obtained from the New York Times Most Popular API.

To obtain content from this source, you will need to provide your API key as configuration. Please refer to the section above.

Once you have set up the environment, entered your API key, and have Redis up and running, you can populate the data store with articles from the New York Times by running the script provided for that purpose:

$ ./scripts/populate_nytimes_mostpopular.py

Note: You can specify the --purge option to clear the existing database.

Once the data store is populated, the articles are available via the HTTP web service. The API endpoints are described below. The code that defines them can be found at https://github.com/oyiptong/up-headliner/blob/master/up/headliner/content/nytimes/urls.py.

Interest Index

http://127.0.0.1:4355/nytimes/mostpopular.json

This lists the available interests and, for each one, the number of articles that fall under it.

Example output:

{
  "d": {
    "Android": 10,
    "Apple": 10,
    "Arts": 3,
    "Autos": 7,
    "Baseball": 5,
    "Basketball": 2,
    "Boxing": 3,
    "Design": 14,
    "Football": 6,
    "Health-Men": 25,
    "Health-Women": 25,
    "Ideas": 11,
    "Movies": 21,
    "Parenting": 6,
    "Programming": 30,
    "Science": 26,
    "Soccer": 2,
    "Sports": 34,
    "Technology": 30,
    "Travel": 23,
    "Video-Games": 1
  }
}
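As a rough client-side sketch, the index can be fetched and ranked by article count. The helper names are hypothetical, and the URL assumes a server running locally on the default port shown above; only the "d" wrapper and the name-to-count mapping come from the example output.

```python
import json
from urllib.request import urlopen

INDEX_URL = "http://127.0.0.1:4355/nytimes/mostpopular.json"


def fetch_index(url=INDEX_URL):
    """Download the interest index as text (requires a running server)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def top_interests(index_json, count=3):
    """Return the `count` interests with the most articles.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = json.loads(index_json)["d"]
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:count]
```

With a server running, `top_interests(fetch_index())` would return a list of (interest, article count) pairs.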

Article Listing service

http://127.0.0.1:4355/nytimes/mostpopular/<interest_name>.json

Example output:

{
  "d": [
    {
      "media": [
        {
          "caption": "The 2014 Mazda 3 flaunts Euro-style curves and intriguing shapes.",
          "copyright": "Mazda North America",
          "media-metadata": [
            {
              "format": "Standard Thumbnail",
              "height": 75,
              "url": "http://graphics8.nytimes.com/images/2013/12/01/automobiles/SUB-WHEEL1/SUB-WHEEL1-thumbStandard.jpg",
              "width": 75
            },
            {
              "format": "thumbLarge",
              "height": 150,
              "url": "http://graphics8.nytimes.com/images/2013/12/01/automobiles/SUB-WHEEL1/SUB-WHEEL1-thumbLarge.jpg",
              "width": 150
            },
            {
              "format": "mediumThreeByTwo210",
              "height": 140,
              "url": "http://graphics8.nytimes.com/images/2013/12/01/automobiles/SUB-WHEEL1/SUB-WHEEL1-mediumThreeByTwo210.jpg",
              "width": 210
            }
          ],
          "subtype": "photo",
          "type": "image"
        }
      ],
      "title": "Performer Available for Private Parties",
      "url": "http://www.nytimes.com/2013/12/01/automobiles/autoreviews/performer-available-for-private-parties.html?src=moz-up"
    }
  ],
  "num_articles": 1
}
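A consumer of this response might reduce each article to a title, a link and one image. The following is a small sketch based only on the fields visible in the example output; the function name and the choice of the "Standard Thumbnail" rendition are illustrative assumptions.

```python
import json


def summarize_articles(listing_json, image_format="Standard Thumbnail"):
    """Extract title, URL and one image URL from an article listing response."""
    payload = json.loads(listing_json)
    summaries = []
    for article in payload["d"]:
        thumbnail = None
        # Walk every media entry and pick the first rendition in the wanted format.
        for media in article.get("media", []):
            for rendition in media.get("media-metadata", []):
                if rendition.get("format") == image_format:
                    thumbnail = rendition.get("url")
                    break
            if thumbnail:
                break
        summaries.append({
            "title": article["title"],
            "url": article["url"],
            "thumbnail": thumbnail,  # None when no matching rendition exists
        })
    return summaries
```

Articles without a media entry, or without the requested format, simply get a thumbnail of None.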

Personalization API

http://127.0.0.1:4355/nytimes/mostpopular/personalize

This returns a list of articles based on a query, which consists of an object describing interest preferences.

Here is an example query:

{"Arts":0.9,"Autos":0.5,"Design":0.3}

You can find an example API call made using curl at https://github.com/oyiptong/up-headliner/blob/master/scripts/example_request.sh

Each score is between 0 and 1. The number of articles returned for an interest is proportional to its score relative to the sum of all scores in the query.
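A client could check the scores before sending a query. This is only a sketch: the function name is hypothetical, the range check is a client-side convenience rather than documented server behavior, and how the query is actually transmitted is shown in the example_request.sh script linked above.

```python
import json


def build_query(preferences):
    """Serialize interest preferences after checking each score is in [0, 1]."""
    for interest, score in preferences.items():
        if not 0 <= score <= 1:
            raise ValueError("score for %r must be between 0 and 1" % interest)
    # Compact separators produce the same shape as the example query.
    return json.dumps(preferences, sort_keys=True, separators=(",", ":"))


query = build_query({"Arts": 0.9, "Autos": 0.5, "Design": 0.3})
```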

With a limit of 20, the API will attempt to return a list of articles as follows:

import math
article_limit = 20
total_weights = 0.9 + 0.5 + 0.3
num_arts_articles = math.ceil(0.9 / total_weights * article_limit)

Because the result is rounded up, this makes the number of Arts articles 11, Autos articles 6 and Design articles 4.

The API returns results in a best-effort manner. If there are fewer than 11 Arts articles available, the API will return whatever it has.
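The formula and the best-effort rule can be combined in one small sketch. It is an illustration under assumptions, not the service's code: the function name is hypothetical, and `available` stands for the per-interest counts reported by the Interest Index.

```python
import math


def allocate_articles(weights, available, limit=20):
    """Compute how many articles to take from each interest.

    weights:   interest name -> score between 0 and 1
    available: interest name -> number of stored articles
    """
    total = sum(weights.values())
    if total <= 0:
        return {}
    quotas = {}
    for interest, weight in weights.items():
        # Proportional share of the limit, rounded up as in the formula above.
        wanted = int(math.ceil(weight / total * limit))
        # Best effort: never ask for more than the store holds.
        quotas[interest] = min(wanted, available.get(interest, 0))
    return quotas


quotas = allocate_articles(
    {"Arts": 0.9, "Autos": 0.5, "Design": 0.3},
    {"Arts": 3, "Autos": 7, "Design": 14},
)
```

Here the Arts share is capped by the three Arts articles in the example index, while Autos and Design receive their full proportional share.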

Articles in the output are ordered by the importance of the interests they belong to.

The output has the same format as that of the Article Listing service.

Periodic Tasks

Periodic tasks can be run either via crontab or via Celery beat.

If you choose the Celery-based implementation, you will need to run two daemons:

  1. At least one worker
  2. A scheduler

The scheduling information is set via the configuration files. There are scripts to start both daemons in the scripts directory.

License

All source code here is available under the MPL 2.0 license, unless otherwise indicated.