{
"id": 1,
"name": "scrape",
"type": "nested-query",
"query": {
"path": "feed",
"fields": "caption,message,created_time,type,description,likes{link,name,username},comments{message,created_time,from,likes{link,name,username},comments{message,created_time,from,likes{link,name,username}}}",
"ids": "<page_id>"
}
}
page_impressions, page_fans and page_engaged_users for the last 5 days:
{
"id": 2,
"name": "page_insights",
"type": "nested-query",
"query": {
"path": "",
"fields": "insights.since(5 days ago).metric(page_impressions,page_fans,page_engaged_users)",
"ids": "<page_id>"
}
}
According to the latest Facebook Graph API, every insights metric to be extracted must be explicitly listed in the query, i.e. there is no general get-all-metrics type of call.
page_posts_impressions and post_impressions for all posts:
Note that we use the since(now) specification of the insights time to get the most recent values. Otherwise the API may paginate over small periods of time and consume a lot of requests to the Facebook API.
{
"id": 3,
"name": "posts_insights",
"type": "nested-query",
"query": {
"path": "feed",
"fields": "insights.since(now).metric(page_posts_impressions,post_impressions)",
"ids": "<page_id>"
}
}
You can try the examples above by calling the Facebook Graph API directly from an HTTP client (e.g. Postman) as follows:
GET https://graph.facebook.com/<api_version>/<path>?fields=<fields>&ids=<ids>&access_token=<access_token>
You can get an access token in the Graph API Explorer.
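The URL template above can be assembled from a query object roughly as follows. This is a minimal sketch, not the extractor's own code: the function name `build_graph_url` and the constant `GRAPH_HOST` are hypothetical, and the version and token values are placeholders you would supply yourself.

```python
from urllib.parse import urlencode

# Hypothetical constant: host taken from the GET template above.
GRAPH_HOST = "https://graph.facebook.com"


def build_graph_url(api_version, query, access_token):
    """Compose the GET URL for one query object (keys: path, fields, ids)."""
    path = query.get("path", "")
    # An empty path means "start at the root node", so drop the trailing slash.
    base = f"{GRAPH_HOST}/{api_version}/{path}".rstrip("/")
    params = {
        "fields": query["fields"],
        "ids": query["ids"],
        "access_token": access_token,
    }
    # urlencode percent-encodes braces, commas and spaces in the fields expression.
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    example = {"path": "feed", "fields": "message,comments{from}", "ids": "<page_id>"}
    print(build_graph_url("v5.0", example, "<access_token>"))
```

Pasting the printed URL into an HTTP client is equivalent to filling in the template by hand, with the encoding of the nested fields expression handled for you.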
This extractor extracts data from the Facebook Graph API: https://developers.facebook.com/docs/graph-api/ . You can try it live here: https://developers.facebook.com/tools/explorer
Let's say that in the Facebook Graph API every endpoint represents a node in a graph. Examples of a node are /me (i.e. user info) and me/posts (i.e. posts of the current user). To get data from a particular endpoint, one can make a typical REST API call such as GET me/posts, or make a nested API call that extracts the whole subtree of a node. For example, GET me?fields=posts,comments,likes extracts all posts, comments and likes of an id (in this case me, the current user).
For more info see Making Nested Queries in https://developers.facebook.com/docs/graph-api/using-graph-api#reading
In the configuration, under parameters, there is an array of queries (see the sample configuration). Besides obvious properties such as id, name, type (currently only the nested-query type) and disabled, each query also contains an object query with the following properties:
- path: endpoint URL, so the absolute URL will look like graph.facebook.com/version/path. Typically it is the endpoint feed. It can be an empty string if we want to start extracting from the "root" node, that is, the page itself.
- fields: the fields parameter of the Graph API nested query.
- ids: comma-separated list of ids (typically page ids) that will be prepended with path. It is also a parameter of the Graph API. If it is an empty string, then all ids from the accounts object will be used. It can also be removed from the query completely.
- limit: size of one page (response). Default is 25, maximum 100. Useful when the FB API returns an error saying the request is "too big"; in such a case use a smaller limit. This parameter also affects the total number of requests made to the FB API.
- since: relates to the created_time of the path parameter, i.e. if path is "posts" then it takes all posts with created_time since the date specified in the since parameter. If path is empty, it has no effect. Can be specified relatively, e.g. 10 days ago.
- until: same as since above, but specifies the date until which data with created_time are taken.
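The fallback rule for ids described above (an empty or missing ids value means "use every id from accounts") can be illustrated as follows. This is a sketch of the documented behaviour only: `resolve_ids` is a hypothetical helper, and the extractor's real implementation (written in Clojure) may differ in detail.

```python
def resolve_ids(query, accounts):
    """Return the list of ids a query applies to.

    `query` is the inner "query" object; `accounts` is the "accounts"
    object from the configuration parameters, keyed by id.
    """
    raw = query.get("ids", "")
    if raw.strip():
        # Explicit comma-separated list: split it and drop stray whitespace.
        return [part.strip() for part in raw.split(",") if part.strip()]
    # Missing or empty "ids": fall back to every id in the accounts object.
    return list(accounts.keys())


if __name__ == "__main__":
    accounts = {"111": {"name": "my fancy page"}, "222": {"name": "keboola"}}
    print(resolve_ids({"path": "feed", "ids": "111"}, accounts))
    print(resolve_ids({"path": "feed"}, accounts))
```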
The most important parameter is fields: it tells the extractor what to extract. Here are a few hints:
posts.limit(100).since(2016-12-24){message,likes,comments{comments}}
since and/or until accept unix timestamp values (in seconds), a date in the format yyyy-mm-dd, or relative values, e.g. all posts posted in the last 10 days: posts.since(10 days ago){message,likes,comments}
comments{from,message,created_time,likes}
{
"fields": "posts{message,story,created_time,likes,comments{from,message,created_time,comments,likes}}",
"path": "",
"ids": "<some_page_id>"
}
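Writing deeply nested fields expressions by hand is error-prone, so it can help to generate them from a small tree. The sketch below is an illustration only: `render_fields` and its spec format (plain names, or `(name, modifiers, children)` tuples) are hypothetical and not part of the extractor or of the Graph API.

```python
def render_fields(fields):
    """Render a list of field specs into a Graph API fields expression.

    Each spec is either a plain field name (str) or a tuple
    (name, modifiers, children), where modifiers is a dict such as
    {"limit": 100} and children is a nested list of specs.
    """
    parts = []
    for spec in fields:
        if isinstance(spec, str):
            parts.append(spec)
            continue
        name, modifiers, children = spec
        # Modifiers are appended as .key(value), in dict insertion order.
        text = name + "".join(f".{key}({value})" for key, value in modifiers.items())
        if children:
            text += "{" + render_fields(children) + "}"
        parts.append(text)
    return ",".join(parts)


if __name__ == "__main__":
    spec = [
        ("posts", {"limit": 100, "since": "2016-12-24"},
         ["message", "likes", ("comments", {}, ["comments"])]),
    ]
    print(render_fields(spec))
```

The resulting string can be used directly as the value of fields in a query object.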
Sometimes extracting ads insights via a nested query returns "Please reduce the amount of data you're asking for". In such a case it is better to use async-insights-query, which extracts ads insights asynchronously and should cope with requests for larger amounts of data. The query specification is similar to that of a nested query, but it contains only 2 parameters:
- parameters: URL query string specifying ads insights parameters as described in https://developers.facebook.com/docs/marketing-api/reference/adgroup/insights/. The parameters are separated by &, e.g. fields=ad_id,actions&level=ad&action_breakdowns=action_type&date_preset=last_month&time_increment=1
- ids: comma-separated list of ids (typically page ids) that will be prepended with path. It is also a parameter of the Graph API. If it is an empty string, then all ids from the accounts object will be used. It can also be removed from the query completely.
{
"id": 3,
"name": "ads_async_insights",
"type": "async-insights-query",
"query": {
"parameters": "fields=ad_id,actions&level=ad&action_breakdowns=action_type&date_preset=last_month&time_increment=1",
"ids": "<page_id>"
}
}
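Because parameters is a plain URL query string, a quick way to check it before running the extractor is to parse it back into key/value pairs. The helper below is a hypothetical sketch using only the Python standard library; it does not validate the values against the Marketing API.

```python
from urllib.parse import parse_qsl


def parse_insights_parameters(parameters):
    """Split the `parameters` query string into a dict.

    Values containing commas (e.g. fields=ad_id,actions) become lists;
    all other values stay plain strings.
    """
    parsed = {}
    for key, value in parse_qsl(parameters, keep_blank_values=True):
        parsed[key] = value.split(",") if "," in value else value
    return parsed


if __name__ == "__main__":
    raw = ("fields=ad_id,actions&level=ad&action_breakdowns=action_type"
           "&date_preset=last_month&time_increment=1")
    for key, value in parse_insights_parameters(raw).items():
        print(key, "=", value)
```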
Note that you can specify the Facebook API version via the api-version parameter. Default is v5.0.
{
"storage": {},
"parameters": {
"accounts": {
"<pageId1>": {
"id": "<pageId1>",
"name": "my fancy page",
"category": "entertainment"
},
"<pageId2>": {
"id": "<pageId2>",
"name": "keboola",
"category": "software"
}
},
"api-version": "v5.0",
"queries": [
{
"id": 1,
"name": "qname",
"type": "nested-query",
"query": {
"path": "feed",
"fields": "message,story,likes,comments{from}",
"ids": "<pageId1>,<pageId2>"
}
},
{
"id": 3,
"name": "ads_async_insights",
"type": "async-insights-query",
"query": {
"parameters": "fields=ad_id,actions&level=ad&action_breakdowns=action_type&date_preset=last_month&time_increment=1",
"ids": "<page_id>"
}
}
]
},
"authorization": {
"oauth_api": {
"id": "{OAUTH_API_ID}"
}
}
}
For each query the extractor generates a number of tables prefixed with the query name. Each table represents one type of node, so typical tables would be queryname_post, queryname_likes, queryname_comments and queryname_insights. The same nested structure type ends up in the same table, so for example comments and subcomments will be in the same table comments. Every table has different columns, but the following columns will always be the same:
ids property.
page_feed_comments, for subcomments (i.e. comments of comments) it will be page_feed_comments_comments, etc.
key1, key2 and value, along with the columns metric name, title, description, etc.
Register the component API to oauth-bundle-v2 by calling POST to https://syrup.keboola.com/oauth-v2/manage with the manage token and Storage API token in the header and the following body:
{
"component_id": "keboola.ex-facebook-insights-v2",
"friendly_name": "Facebook Insights",
"app_key": "xxx",
"app_secret": "xxx",
"oauth_version": "facebook",
"permissions": "manage_pages,public_profile,read_insights,pages_show_list",
"graph_api_version": "v2.8"
}
More info about authorization registration here: https://github.com/keboola/oauth-v2-bundle
The app is written in Clojure (1.8), evaluated and built via Boot-clj, which requires the Java Development Kit. To try the app locally, check the target commands in the Makefile. For example, to build and run the app locally, type from the repo root:
make build-jar run-jar
MIT licensed, see LICENSE file.