voxmedia / tap-facebook-pages

Singer tap for organic Facebook content insights built using the Meltano SDK

Credentials for GCS needed? #8

Open educep opened 1 year ago

educep commented 1 year ago

Hello, I'm trying tap-facebook-pages (voxmedia) with Docker. When I run `meltano config tap-facebook-pages test`, Meltano asks for GOOGLE_APPLICATION_CREDENTIALS:

    Plugin configuration is invalid
    google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

So I switched to the target-jsonl loader and ran `meltano run tap-facebook-pages target-jsonl`, but I'm still getting the same error:

    2023-01-13T19:08:46.131099Z [info ] google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started cmd_type=elb consumer=False name=tap-facebook-pages producer=True stdio=stderr string_id=tap-facebook-pages

EDIT: I set the variable GOOGLE_APPLICATION_CREDENTIALS in the .env and pointed it at the corresponding JSON key file. Now I get this error:

    google.api_core.exceptions.Forbidden: 403 Access Denied: Table g9-data-warehouse-prod:facebook_posts.most_recent: User does not have permission to query table g9-data-warehouse-prod:facebook_posts.most_recent, or perhaps it does not exist in location US.

Apparently it is querying a table that does not exist. I don't understand why it is querying the database when I'm using the target-jsonl loader. It seems this tap is unusable without GCS.
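For reference, the .env change amounts to a single line (a sketch only; the key-file path is a placeholder for wherever your service-account JSON lives):

    # .env (sketch) - placeholder path to the GCP service-account key file
    GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json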

prratek commented 1 year ago

hi @educep - apologies for the delayed response. Some streams in this tap had to be customized to suit a specific use case we had and aren't usable by anyone else without modifying the code. In particular, when fetching posts or videos for a page, Facebook seems to limit results to the top 600 posts/videos per year, and the documentation around this is vague. To make sure we get certain metrics for all posts, we run queries against tables in our data warehouse to first get a list of post and video IDs and then iterate through them. That's also why you see BigQuery credential errors even with target-jsonl: those warehouse queries happen on the tap side, not the loader side.

If you're interested in page-level insights, you should be able to select just those streams (like PageEngagementInsightsStream), since none of them requires this workaround. Let me know if you have any further questions!
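For example, a selection along these lines should keep only the page-level streams (a sketch only; adjust the names to whatever `meltano select tap-facebook-pages --list --all` reports for your install):

    select:
        - 'page_engagement_insights.*'
        - 'page_impressions_insights.*'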

acarter24 commented 1 year ago

I ended up with these selection rules to avoid the BigQuery issue:

    select:
        - '*.*'
        - '!post_insights.*'
        - '!video_insights_lifetime.*'
        - '!video_insights_daily.*'
        - '!all_posts.*'
        - '!all_videos.*'
        - '!recent_post_insights.*'
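
In case it helps, these rules live under the extractor definition in meltano.yml, roughly like so (a sketch of the standard Meltano layout; only the select list itself comes from my config):

    # meltano.yml (sketch) - standard Meltano project layout, not specific to this tap
    plugins:
      extractors:
        - name: tap-facebook-pages
          select:
            - '*.*'
            - '!post_insights.*'
            # ...plus the other exclusion rules listed above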