snarfed / granary

💬 The social web translator
https://granary.io
Creative Commons Zero v1.0 Universal

Granary

The social web translator. Fetches and converts data between social networks, HTML and JSON with microformats2, ActivityStreams/ActivityPub, Atom, JSON Feed, and more.

About

Granary is a library and REST API that fetches and converts between a wide variety of social data sources and formats.

Free yourself from silo API chaff and expose the sweet social data foodstuff inside in standard formats and protocols!

Here's how to get started:

License: This project is placed in the public domain. You may also use it under the CC0 License.

Using

The library and REST API are both based on the OpenSocial Activity Streams service. Let's start with an example. This code using the library:

from granary import twitter
...
tw = twitter.Twitter(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
tw.get_activities(group_id='@friends')

is equivalent to this HTTP GET request:

https://granary.io/twitter/@me/@friends/@app/
  ?access_token_key=ACCESS_TOKEN_KEY&access_token_secret=ACCESS_TOKEN_SECRET

Both return the authenticated user's Twitter stream, i.e. tweets from the people they follow. Here's the JSON output:

{
  "itemsPerPage": 10,
  "startIndex": 0,
  "totalResults": 12,
  "items": [{
      "verb": "post",
      "id": "tag:twitter.com,2013:374272979578150912",
      "url": "http://twitter.com/evanpro/status/374272979578150912",
      "content": "Getting stuff for barbecue tomorrow. No ribs left! Got some nice tenderloin though. (@ Metro Plus Famille Lemay) http://t.co/b2PLgiLJwP",
      "actor": {
        "username": "evanpro",
        "displayName": "Evan Prodromou",
        "description": "Prospector.",
        "url": "http://twitter.com/evanpro"
      },
      "object": {
        "tags": [{
            "url": "http://4sq.com/1cw5vf6",
            "startIndex": 113,
            "length": 22,
            "objectType": "article"
          }, "..."]
      }
    }, "..."],
  "..."
}

The request parameters are the same for both, and all are optional. USER_ID is a source-specific id or @me for the authenticated user. GROUP_ID may be @all, @friends (currently identical to @all), @self, @search, or @blocks. APP_ID is currently ignored; best practice is to use @app as a placeholder.

Paging is supported via the startIndex and count parameters. They're self-explanatory, and described in detail in the OpenSearch spec and OpenSocial spec.
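As a rough sketch of how a client might walk through results with startIndex and count: the fetch_page callable below is hypothetical and stands in for whatever issues the request (a library call or an HTTP GET). It is assumed to return the envelope shown in the example output above, including totalResults.

```python
def iter_activities(fetch_page, count=10):
    """Yield activities page by page using startIndex and count.

    fetch_page is a hypothetical callable, not part of granary:
    fetch_page(start_index, count) must return a dict with an "items"
    list and, optionally, a "totalResults" number.
    """
    start_index = 0
    while True:
        page = fetch_page(start_index, count)
        items = page.get("items", [])
        if not items:
            return  # an empty page means there is nothing left
        yield from items
        start_index += len(items)
        total = page.get("totalResults")
        if total is not None and start_index >= total:
            return  # we have seen everything the source reported
```

Stopping on either an empty page or totalResults keeps the loop safe even if a source omits the total.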

When using the GROUP_ID @search (for platforms that support it, currently Twitter and Instagram), provide a search string via the q parameter. The API is loosely based on the OpenSearch spec, the OpenSocial Core Container spec, and the OpenSocial Core Gadget spec.

Output data is JSON Activity Streams 1.0 objects wrapped in the OpenSocial envelope, which puts the activities in the top-level items field as a list and adds the itemsPerPage, totalResults, etc. fields.
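To make the envelope concrete, here is a minimal sketch of reading it. The response dict is a trimmed copy of the example output above, and summarize is a hypothetical helper for illustration, not part of granary.

```python
def summarize(response):
    """Return one 'Name: content' line per activity in the envelope."""
    lines = []
    for activity in response.get("items", []):
        actor = activity.get("actor", {})
        # Fall back to the username if no display name is present.
        name = actor.get("displayName") or actor.get("username", "unknown")
        lines.append(f"{name}: {activity.get('content', '')}")
    return lines


# Trimmed copy of the example output; the content is shortened here.
response = {
    "itemsPerPage": 1,
    "startIndex": 0,
    "totalResults": 12,
    "items": [{
        "verb": "post",
        "id": "tag:twitter.com,2013:374272979578150912",
        "content": "Getting stuff for barbecue tomorrow.",
        "actor": {"username": "evanpro", "displayName": "Evan Prodromou"},
    }],
}

for line in summarize(response):
    print(line)
```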

Most Facebook requests and all Twitter, Instagram, and Flickr requests will need OAuth access tokens. If you're using Python on Google App Engine, oauth-dropins is an easy way to add OAuth client flows for these sites. Otherwise, here are the sites' authentication docs: Facebook, Flickr, Instagram, Twitter.

If you get an access token and pass it along, it will be used to sign and authorize the underlying requests to the source providers. See the demos on the REST API endpoints above for examples.

Using the REST API

The endpoints above all serve the OpenSocial Activity Streams REST API. Request paths are of the form:

/USER_ID/GROUP_ID/APP_ID/ACTIVITY_ID?startIndex=...&count=...&format=FORMAT&access_token=...

All query parameters are optional. FORMAT may be as1 (the default), as2, atom, html, jsonfeed, mf2-json, rss, or xml. atom supports a boolean reader query parameter for toggling rendering appropriate to feed readers, e.g. location is rendered in content when reader=true (the default). The rest of the path elements and query params are described above.
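The path form above can be assembled with the standard library. This is only a sketch: build_url is a hypothetical helper, the base URL is taken from the Twitter example earlier in this README, and the trailing slash after APP_ID simply mirrors that example.

```python
from urllib.parse import quote, urlencode


def build_url(base, user_id="@me", group_id="@all", app_id="@app",
              activity_id=None, **params):
    """Build BASE/USER_ID/GROUP_ID/APP_ID/ACTIVITY_ID?key=value&...

    Hypothetical helper, not part of granary. Query parameters such as
    startIndex, count, format, or q are passed as keyword arguments.
    """
    last = "" if activity_id is None else activity_id
    parts = [user_id, group_id, app_id, last]
    # Keep "@" readable in path segments; escape everything else unsafe.
    path = "/".join(quote(str(p), safe="@") for p in parts)
    url = f"{base.rstrip('/')}/{path}"
    # All query parameters are optional, so drop the ones left unset.
    query = {k: v for k, v in params.items() if v is not None}
    return f"{url}?{urlencode(query)}" if query else url


print(build_url("https://granary.io/twitter", group_id="@search",
                q="indieweb", format="atom", count=5))
```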

Errors are returned with the appropriate HTTP response code, e.g. 403 (Forbidden) for unauthorized requests, with details in the response body.
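A client can surface those details by reading the body of the failed response. The sketch below uses only the standard library; ApiError and fetch_json are hypothetical names, and the opener argument exists only so the function can be exercised without a network.

```python
import json
import urllib.error
import urllib.request


class ApiError(Exception):
    """Hypothetical exception carrying the HTTP status and response body."""

    def __init__(self, status, body):
        super().__init__(f"HTTP {status}: {body}")
        self.status = status
        self.body = body


def fetch_json(url, opener=urllib.request.urlopen):
    """GET url and decode a JSON body, keeping error details on failure."""
    try:
        with opener(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        # The response body of a failed request holds the error details.
        body = e.read().decode("utf-8", errors="replace")
        raise ApiError(e.code, body) from e
```

Callers can then branch on the status attribute, for example to prompt for a new access token.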

By default, responses are cached and reused for 10 minutes without re-fetching the source data. (Instagram responses are cached for 60 minutes.) You can prevent this by adding the cache=false query parameter to your request.
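For readers unfamiliar with this kind of caching, here is a minimal time-based cache sketch. It illustrates the behavior described above and is not granary's implementation; TTLCache, its method, and the use_cache flag (standing in for cache=false) are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl seconds.

    Illustration only; this is an assumption about the general
    technique, not how granary caches responses.
    """

    def __init__(self, ttl=600, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def get_or_fetch(self, key, fetch, use_cache=True):
        """Return an unexpired cached value, else call fetch() and store it."""
        now = self.clock()
        entry = self._entries.get(key)
        if use_cache and entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value
```

Passing use_cache=False skips the lookup but still stores the fresh value, so later cached requests benefit from it.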

Include the shares=false query parameter to omit shares, e.g. Twitter retweets, from the results.
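If you already have a list of activities in hand, a similar filter can be applied client side. This sketch assumes shares are marked with the ActivityStreams 1 verb "share"; omit_shares is a hypothetical helper, not part of granary.

```python
def omit_shares(activities):
    """Return only the activities whose verb is not 'share'.

    Assumes shares (e.g. retweets) carry the AS1 verb "share".
    """
    return [a for a in activities if a.get("verb") != "share"]
```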

To use the REST API in an existing ActivityStreams/ActivityPub client, you'll need to hard-code exceptions for the domains you want to use, e.g. facebook.com, and redirect HTTP requests to the corresponding endpoint above.
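One way to hard-code such exceptions is a small domain lookup. This is a sketch under assumptions: the ENDPOINTS mapping and its endpoint URL are illustrative placeholders modeled on the Twitter example URL earlier, not a verified list, and endpoint_for is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from source domain to an endpoint base URL.
# The URL is a placeholder for illustration, not a verified endpoint.
ENDPOINTS = {
    "facebook.com": "https://granary.io/facebook",
}


def endpoint_for(url, endpoints=ENDPOINTS):
    """Return the endpoint for url's domain, or None if not special-cased."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com like example.com
    return endpoints.get(host)
```

A client would call endpoint_for() before each request and, when it returns a value, send the request there instead of to the original domain.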

Facebook and Instagram are disabled in the REST API entirely, sadly.

Using the library

See the example above for a quick start guide.

Clone or download this repo into a directory named granary. Each source works the same way. Import the module for the source you want to use, then instantiate its class by passing the HTTP handler object. The handler should have a request attribute for the current HTTP request.

The useful methods are get_activities() and get_actor(), which returns the current authenticated user (if any). See the full reference docs for details. All return values are Python dicts of decoded ActivityStreams 1 JSON.

The microformats2.*_to_html() functions are also useful for rendering ActivityStreams 1 objects as nicely formatted HTML.

Troubleshooting/FAQ

Check out the oauth-dropins Troubleshooting/FAQ section. It's pretty comprehensive and applies to this project too.

Future work

We'd love to add more sites! Off the top of my head, YouTube, Tumblr, WordPress.com, Sina Weibo, Qzone, and RenRen would be good candidates. If you're looking to contribute, implementing a new site is a good place to start. It's pretty self-contained and the existing sites are good examples to follow, but it's a decent amount of work, so you'll be familiar with the whole project by the end.

Development

Pull requests are welcome! Feel free to ping me in #indieweb-dev with any questions.

First, fork and clone this repo. Then, install the Google Cloud SDK and run gcloud components install cloud-firestore-emulator to install the Firestore emulator. Once you have them, set up your environment by running these commands in the repo root directory:

gcloud config set project granary-demo
python3 -m venv local
source local/bin/activate
pip install -r requirements.txt
# needed to serve static files locally
ln -s local/lib/python3*/site-packages/oauth_dropins/static oauth_dropins_static

Now, run the tests to check that everything is set up ok:

gcloud emulators firestore start --host-port=:8089 --database-mode=datastore-mode < /dev/null >& /dev/null &
python3 -m unittest discover

Finally, run the web app locally with flask run:

GAE_ENV=localdev FLASK_ENV=development flask run -p 8080

Open localhost:8080 and you should see the granary home page!

If you want to work on oauth-dropins at the same time, install it in editable mode with pip install -e <path to oauth-dropins repo>. You'll also need to update the oauth_dropins_static symlink, which is needed for serving static file handlers locally: ln -sf <path-to-oauth-dropins-repo>/oauth_dropins/static oauth_dropins_static.

To deploy to production:

gcloud -q beta app deploy --no-cache granary-demo *.yaml

The docs are built with Sphinx, including apidoc, autodoc, and napoleon. Configuration is in docs/conf.py. To build them, first install Sphinx with pip install sphinx. (You may want to do this outside your virtualenv; if so, you'll need to reconfigure it to see system packages with virtualenv --system-site-packages local.) Then, run docs/build.sh.

Release instructions

Here's how to package, test, and ship a new release. (Note that this is largely duplicated in the oauth-dropins readme too.)

  1. Run the unit tests.
    source local/bin/activate.csh
    CLOUDSDK_CORE_PROJECT=granary-demo gcloud emulators firestore start --host-port=:8089 --database-mode=datastore-mode < /dev/null >& /dev/null &
    sleep 5
    python -m unittest discover
    kill %1
    deactivate
  2. Bump the version number in setup.py and docs/conf.py. git grep the old version number to make sure it only appears in the changelog. Change the current changelog entry in README.md for this new version from unreleased to the current date.
  3. Bump the oauth-dropins version specifier in setup.py to the most recent version.
  4. Build the docs. If you added any new modules, add them to the appropriate file(s) in docs/source/. Then run ./docs/build.sh. Check that the generated HTML looks fine by opening docs/_build/html/index.html and looking around.
  5. git commit -am 'release vX.Y'
  6. Upload to test.pypi.org for testing.
    python setup.py clean build sdist
    setenv ver X.Y
    source local/bin/activate.csh
    twine upload -r pypitest dist/granary-$ver.tar.gz
  7. Install from test.pypi.org.
    cd /tmp
    python -m venv local
    source local/bin/activate.csh
    pip uninstall granary # make sure we force Pip to use the uploaded version
    pip install --upgrade pip
    pip install mf2py==1.1.2
    pip install -i https://test.pypi.org/simple --extra-index-url https://pypi.org/simple granary==$ver
    deactivate
  8. Smoke test that the code trivially loads and runs.

    source local/bin/activate.csh
    python
    # run test code below
    deactivate

    Test code to paste into the interpreter:

    import json
    from granary import github
    github.__file__  # check that it's in the virtualenv
    
    g = github.GitHub('XXX')  # insert a GitHub personal OAuth access token
    a = g.get_activities()
    print(json.dumps(a, indent=2))
    
    from granary import atom
    print(atom.activities_to_atom(a, {}))
  9. Tag the release in git. In the tag message editor, delete the generated comments at the bottom, leave the first line blank (to omit the release "title" in GitHub), put ### Notable changes on the second line, then copy and paste this version's changelog contents below it.
    git tag -a v$ver --cleanup=verbatim
    git push && git push --tags
  10. Click here to draft a new release on GitHub. Enter vX.Y in the Tag version box. Leave Release title empty. Copy ### Notable changes and the changelog contents into the description text box.
  11. Upload to pypi.org!
    twine upload dist/granary-$ver.tar.gz
  12. Build the docs on Read the Docs: first choose latest in the drop-down, then click Build Version.
  13. On the Versions page, check that the new version is active. If it's not, activate it in the Activate a Version section.

Related work

Apache Streams is a similar project that translates between storage systems and databases as well as social schemas. It's a Java library, and its design is heavily structured. Here's the list of formats it supports. It's mainly used by People Pattern.

Gnip similarly converts social network data to ActivityStreams and supports many more source networks. Unfortunately, it's commercial, there's no free trial or self-serve signup, and plans start at $500.

DataSift looks like broadly the same thing, except they offer self-serve, pay as you go billing, and they use their own proprietary output format instead of ActivityStreams. They're also aimed more at data mining as opposed to individual user access.

Cliqset's FeedProxy used to do this kind of format translation, but unfortunately it and Cliqset died.

Facebook used to officially support ActivityStreams, but that's also dead.

There are a number of products that download your social network data, normalize it, and let you query and visualize it. SocialSafe is one, although the SSL certificate is currently out of date. ThinkUp was an open source product, but shuttered on 18 July 2016. There's also the lifelogging/lifestream aggregator vein of projects that pull data from multiple source sites. Storytlr is a good example. It doesn't include Facebook or Instagram, but does include a number of smaller source sites. There are lots of others, e.g. the Lifestream WordPress plugin. Unfortunately, these are generally aimed at end users, not developers, and don't usually expose libraries or REST APIs.

On the open source side, there are many related projects. php-mf2-shim adds microformats2 to Facebook and Twitter's raw HTML. sockethub is a similar "polyglot" approach, but more focused on writing than reading.

Changelog

7.1 - unreleased

7.0 - 2024-06-24

Breaking changes:

Non-breaking changes:

6.2 - 2024-03-15

6.1 - 2023-09-16

Highlights: Nostr, Bluesky get_activities, lots of improvements in as2 and microformats2, and more!

REST API breaking changes:

Twitter is dead, at least in the REST API.

Non-breaking changes:

6.0 - 2023-03-22

Breaking changes:

Non-breaking changes:

5.0 - 2022-12-03

Breaking changes:

Non-breaking changes:

4.0 - 2022-03-23

Breaking changes:

Non-breaking changes:

3.2 - 2021-09-15

3.1 - 2021-04-03

3.0 - 2020-04-08

Breaking changes:

Non-breaking changes:

2.2 - 2019-11-02

2.1 - 2019-09-04

2.0 - 2019-03-01

Breaking change: drop Google+ since it shuts down in March. Notably, this removes the googleplus module.

1.15 - 2019-02-28

1.14 - 2018-11-12

Add delete(). Currently includes Twitter and Flickr support.

1.13 - 2018-08-08

1.12 - 2018-03-24

This release is intentionally small and limited in scope to contain any impact of the Python 3 migration. It should be a noop for existing Python 2 users, and we've tested thoroughly, but I'm sure there are still bugs. Please file issues if you notice anything broken!

1.11 - 2018-03-09

1.10 - 2017-12-10

1.9 - 2017-10-24

1.8 - 2017-08-29

1.7 - 2017-02-27

1.6 - 2016-11-26

1.5 - 2016-08-25

1.4.1 - 2016-06-27

1.4.0 - 2016-06-27

1.3.1 - 2016-04-07

1.3.0 - 2016-04-06

1.2.0 - 2016-01-11

1.1.0 - 2015-09-06

1.0.1 - 2015-07-11

1.0 - 2015-07-10