voxmedia / tap-facebook-pages

Singer tap for organic Facebook content insights built using the Meltano SDK

Adding `context` to row `id` when values contains a dict #10

Open acarter24 opened 1 year ago

acarter24 commented 1 year ago

When a response contains a subdict, e.g. insights/page_consumptions_by_consumption_type, the context (the key from the subdict) is extracted to a separate column but is not appended to the top-level `id`. Every row from the same subdict therefore shares one `id`. I am using target-mssql, which complains about duplicate primary keys in any upserted chunk.

Example:

{"context": "link clicks", "value": 252.0, "end_time": "2023-03-31 07:00:00", "name": "page_consumptions_by_consumption_type", "period": "days_28", "title": "28 Days Page Consumptions By Type", "id": "***/insights/page_consumptions_by_consumption_type/days_28", "page_id": "***"}
{"context": "other clicks", "value": 26.0, "end_time": "2023-03-31 07:00:00", "name": "page_consumptions_by_consumption_type", "period": "days_28", "title": "28 Days Page Consumptions By Type", "id": "***/insights/page_consumptions_by_consumption_type/days_28", "page_id": "***"}
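
The collision can be reproduced without a database. The sketch below is only an illustration: `find_duplicate_keys` is a hypothetical helper (not part of the tap or of target-mssql), the records are trimmed copies of the two rows above, and the key columns `id` plus `end_time` are an assumption, since the stream's actual primary-key definition is not shown here.

```python
from collections import Counter


def find_duplicate_keys(records, key_fields):
    """Return the key tuples that occur more than once in `records`."""
    counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Trimmed copies of the two example rows; only the relevant fields are kept.
records = [
    {
        "context": "link clicks",
        "value": 252.0,
        "end_time": "2023-03-31 07:00:00",
        "id": "***/insights/page_consumptions_by_consumption_type/days_28",
    },
    {
        "context": "other clicks",
        "value": 26.0,
        "end_time": "2023-03-31 07:00:00",
        "id": "***/insights/page_consumptions_by_consumption_type/days_28",
    },
]

# Assumed key columns: both rows collide on (id, end_time).
duplicates = find_duplicate_keys(records, ("id", "end_time"))
print(duplicates)
```

Both rows map to the same key tuple, so a target that upserts them in one chunk would see the same primary key twice.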

I tried setting `context` as an extra primary key, but it is null for 95% of rows, which causes a different error.

Instead, this modification seems to work:

https://github.com/voxmedia/tap-facebook-pages/blob/08800f1c8c0f7cac42a2951d43c78bec2dcc7fd8/tap_facebook_pages/streams.py#L224

def parse_response(self, response: requests.Response) -> Iterable[dict]:
...snip...
                            else:
                                item = {
                                    "context": key,
                                    "value": float(value),
                                    "end_time": end_time,
                                }
                                item.update(base_item)
                                item['id'] = "/".join([item['id'], key.replace(" ", "_")])  # add the context as part of the id
                                yield item

The JSON example above would become:

{"context": "link clicks", "value": 252.0, "end_time": "2023-03-31 07:00:00", "name": "page_consumptions_by_consumption_type", "period": "days_28", "title": "28 Days Page Consumptions By Type", "id": "***/insights/page_consumptions_by_consumption_type/days_28/link_clicks", "page_id": "***"}
{"context": "other clicks", "value": 26.0, "end_time": "2023-03-31 07:00:00", "name": "page_consumptions_by_consumption_type", "period": "days_28", "title": "28 Days Page Consumptions By Type", "id": "***/insights/page_consumptions_by_consumption_type/days_28/other_clicks", "page_id": "***"}
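
For reference, the id transformation can be tried in isolation. This is a minimal sketch, not the tap's code: `append_context_to_id` is a hypothetical name that wraps the same `"/".join(...)` expression used in the modification above, applied to the two contexts from the example.

```python
def append_context_to_id(item_id: str, context: str) -> str:
    """Append the context to the id, replacing spaces with underscores."""
    return "/".join([item_id, context.replace(" ", "_")])


base_id = "***/insights/page_consumptions_by_consumption_type/days_28"
contexts = ["link clicks", "other clicks"]

ids = [append_context_to_id(base_id, c) for c in contexts]

# Each context now yields a distinct id.
assert len(set(ids)) == len(ids)
print(ids)
```

Only spaces are replaced here, so a context that itself contained a `/` would add an extra path segment to the id; whether that can happen for these metrics is not something the example rows show.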