keboola / ex-facebook-graph-api

Facebook Graph API extractor (intended for Insights and Facebook Ads)
MIT License

Sample Nested Queries

The whole Facebook feed: all page posts with their likes, comments, likes of the comments, subcomments, and likes of the subcomments:
{
  "id": 1,
  "name": "scrape",
  "type": "nested-query",
  "query": {
    "path": "feed",
    "fields": "caption,message,created_time,type,description,likes{link,name,username},comments{message,created_time,from,likes{link,name,username},comments{message,created_time,from,likes{link,name,username}}}",
    "ids": "<page_id>"
  }
}

Extract the page metrics page_impressions, page_fans and page_engaged_users for the last 5 days:
{
  "id": 2,
  "name": "page_insights",
  "type": "nested-query",
  "query": {
    "path": "",
    "fields": "insights.since(5 days ago).metric(page_impressions,page_fans,page_engaged_users)",
    "ids": "<page_id>"
  }
}

According to the latest Facebook Graph API, all insights metrics to be extracted must be explicitly listed in the query, i.e. there is no general get-all-metrics type of call.

Extract the post metrics page_posts_impressions and post_impressions for all posts:

Note that we use the since(now) specification of the insights time range to get the most recent values; otherwise the query may paginate over small periods of time and consume a lot of requests to the Facebook API.

{
  "id": 3,
  "name": "posts_insights",
  "type": "nested-query",
  "query": {
    "path": "feed",
    "fields": "insights.since(now).metric(page_posts_impressions,post_impressions)",
    "ids": "<page_id>"
  }
}
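
Since every metric has to be listed explicitly, it can help to assemble the insights field expression from a list of metric names. The following is a minimal sketch; the helper name insights_field is hypothetical and is not part of the extractor, it only reproduces the field syntax used in the samples above.

```python
def insights_field(metrics, since=None):
    """Build an insights field expression like the ones in the samples,
    e.g. insights.since(now).metric(a,b). Hypothetical helper."""
    parts = ["insights"]
    if since is not None:
        # The since modifier is passed through as-is, e.g. "now" or "5 days ago".
        parts.append(f"since({since})")
    # Every metric must be named explicitly; there is no wildcard.
    parts.append("metric(" + ",".join(metrics) + ")")
    return ".".join(parts)


posts_fields = insights_field(
    ["page_posts_impressions", "post_impressions"], since="now"
)
page_fields = insights_field(
    ["page_impressions", "page_fans", "page_engaged_users"], since="5 days ago"
)
print(posts_fields)
print(page_fields)
```

The resulting strings can be pasted into the fields property of a nested-query.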

You can try the examples above by calling the Facebook Graph API directly from an HTTP client (e.g. Postman) as follows:

GET https://graph.facebook.com/<api_version>/<path>?fields=<fields>&ids=<ids>&access_token=<access_token>

You can get an access token in the Graph API Explorer.
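
As a rough illustration of the request format above, the sketch below builds such a URL from the properties of a nested query. The function name graph_url and the placeholder values are assumptions made for this example; no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def graph_url(api_version, path, fields, ids, access_token):
    """Build a Graph API GET URL following the pattern shown above.
    Hypothetical helper; it only formats the URL."""
    base = f"https://graph.facebook.com/{api_version}/{path}"
    # urlencode percent-encodes commas and braces in the fields expression.
    query = urlencode({"fields": fields, "ids": ids, "access_token": access_token})
    return f"{base}?{query}"


url = graph_url(
    api_version="v5.0",
    path="feed",
    fields="message,story,likes,comments{from}",
    ids="PAGE_ID",               # placeholder, replace with a real page id
    access_token="ACCESS_TOKEN", # placeholder, taken from the Graph API Explorer
)

# Decode the URL again to check that the query survives the round trip.
parts = urlsplit(url)
decoded = parse_qs(parts.query)
print(parts.path)
print(decoded["fields"][0])
```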

Configuration

Facebook Graph API

This extractor extracts data from the Facebook Graph API: https://developers.facebook.com/docs/graph-api/. You can try the API live here: https://developers.facebook.com/tools/explorer

Nested Query

Think of every endpoint in the Facebook Graph API as a node in a graph. Examples of nodes are /me (info about the current user) and me/posts (posts of the current user). To get data from a particular endpoint you can make a typical REST API call, GET me/posts, or make a nested API call, which extracts the whole subtree of a node. For example, GET me?fields=posts,comments,likes extracts all posts, comments and likes of an id (in this case me, the current user). For more info see Making Nested Queries in https://developers.facebook.com/docs/graph-api/using-graph-api#reading
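
Deeply nested field expressions are easy to get wrong by hand because of the braces. The sketch below renders one from a small Python structure; the function render_fields and the list/tuple representation are assumptions made for this example and are not something the extractor provides.

```python
def render_fields(spec):
    """Render a nested field spec into Graph API nested-query syntax.

    spec is a list whose items are either plain field names (str) or
    (edge_name, sub_spec) tuples describing a nested edge.
    Hypothetical helper for illustration only.
    """
    parts = []
    for item in spec:
        if isinstance(item, str):
            parts.append(item)
        else:
            edge, sub_spec = item
            # Nested edges are written as edge{field,field,...}.
            parts.append(f"{edge}{{{render_fields(sub_spec)}}}")
    return ",".join(parts)


likes = ("likes", ["link", "name", "username"])
spec = [
    "message",
    "created_time",
    likes,
    ("comments", ["message", "from", likes]),
]
fields = render_fields(spec)
print(fields)
```

The printed string can be used as the fields value of a nested query.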

Configuring nested query

In the configuration, under parameters, there is an array of queries (see the sample configuration). Besides the obvious properties such as id, name, type (currently only the nested-query type) and disabled, each query also contains a query object with the following properties:

Async Insights Query

Sometimes extracting ads insights via a nested query returns the error "Please reduce the amount of data you're asking for". In that case it is better to use async-insights-query, which extracts ads insights asynchronously and should cope with larger amounts of data. The query specification is similar to that of a nested query, but it only contains 2 properties, parameters and ids, as in the sample below.

Sample async insights query

{
  "id": 3,
  "name": "ads_async_insights",
  "type": "async-insights-query",
  "query": {
    "parameters": "fields=ad_id,actions&level=ad&action_breakdowns=action_type&date_preset=last_month&time_increment=1",
    "ids": "<page_id>"
  }
}
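
The parameters value is an ordinary URL query string, so it can be generated from a dictionary instead of being typed by hand. This is a minimal sketch using the Python standard library; the variable names are illustrative, and the ids placeholder is copied from the sample above.

```python
from urllib.parse import urlencode

# Insights parameters from the sample above, as key/value pairs.
params = {
    "fields": "ad_id,actions",
    "level": "ad",
    "action_breakdowns": "action_type",
    "date_preset": "last_month",
    "time_increment": 1,
}

# safe="," keeps commas unescaped so the string matches the sample format.
parameters = urlencode(params, safe=",")

query = {
    "id": 3,
    "name": "ads_async_insights",
    "type": "async-insights-query",
    "query": {"parameters": parameters, "ids": "<page_id>"},
}
print(parameters)
```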

Sample configuration:

Note that you can specify the Facebook API version via the api-version parameter. The default is v5.0.

{
  "storage": {},
  "parameters": {
    "accounts": {
      "<pageId1>": {
        "id": "<pageId1>",
        "name": "my fancy page",
        "category": "entertainment"
      },
      "<pageId2>": {
        "id": "<pageId2>",
        "name": "keboola",
        "category": "software"
      }
    },
    "api-version": "v5.0",
    "queries": [
      {
        "id": 1,
        "name": "qname",
        "type": "nested-query",
        "query": {
          "path": "feed",
          "fields": "message,story,likes,comments{from}",
          "ids": "<pageId1>,<pageId2>"
        }
      },
      {
        "id": 3,
        "name": "ads_async_insights",
        "type": "async-insights-query",
        "query": {
          "parameters": "fields=ad_id,actions&level=ad&action_breakdowns=action_type&date_preset=last_month&time_increment=1",
          "ids": "<page_id>"
        }
      }
    ]
  },
  "authorization": {
    "oauth_api": {
      "id": "{OAUTH_API_ID}"
    }
  }
}
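
Before running the extractor it may be useful to sanity-check a configuration against the shape of the samples. The sketch below is only that: the expected keys are inferred from the samples in this document, not from an official schema, so the extractor itself may accept more or fewer properties. The function name check_queries is hypothetical.

```python
# Keys of the query object as they appear in the samples (an assumption,
# not an official schema).
SAMPLE_KEYS = {
    "nested-query": {"path", "fields", "ids"},
    "async-insights-query": {"parameters", "ids"},
}


def check_queries(config):
    """Return a list of human-readable notes about queries whose shape
    differs from the samples. Hypothetical helper."""
    notes = []
    for q in config["parameters"]["queries"]:
        name, qtype = q.get("name"), q.get("type")
        expected = SAMPLE_KEYS.get(qtype)
        if expected is None:
            notes.append(f"{name}: type {qtype!r} does not appear in the samples")
            continue
        missing = expected - set(q.get("query", {}))
        if missing:
            notes.append(f"{name}: query lacks {sorted(missing)}")
    return notes


config = {
    "parameters": {
        "api-version": "v5.0",
        "queries": [
            {"id": 1, "name": "qname", "type": "nested-query",
             "query": {"path": "feed", "fields": "message,story", "ids": "1,2"}},
            {"id": 3, "name": "ads_async_insights", "type": "async-insights-query",
             "query": {"ids": "1"}},
        ],
    }
}
notes = check_queries(config)
print(notes)
```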

Result tables description

For each query the extractor generates a number of tables prefixed with the query name. Each table represents one type of node, so typical tables would be queryname_post, queryname_likes, queryname_comments and queryname_insights. Nested structures of the same type end up in the same table; for example, comments and subcomments are both in the comments table. Every table has different columns, but the following columns are always the same:

Authorization

Register the component with oauth-bundle-v2 by sending a POST request to https://syrup.keboola.com/oauth-v2/manage with the manage token and Storage API token in the headers and the following body:

{
  "component_id": "keboola.ex-facebook-insights-v2",
  "friendly_name": "Facebook Insights",
  "app_key": "xxx",
  "app_secret": "xxx",
  "oauth_version": "facebook",
  "permissions": "manage_pages,public_profile,read_insights,pages_show_list",
  "graph_api_version": "v2.8"
}

More info about authorization registration is here: https://github.com/keboola/oauth-v2-bundle

Development

The app is written in Clojure (1.8) and is evaluated and built via Boot-clj, which requires the Java Development Kit. To try the app locally, check the target commands in the Makefile. For example, to build and run the app locally, run from the repo root:

make build-jar run-jar

License

MIT licensed, see LICENSE file.