greenpeace / gpi-tl-hermes

News Aggregations and Sentiment Analysis app

Firebase as intermediate storage + deduplication #13

Closed krauthex closed 5 years ago

krauthex commented 5 years ago

Firebase RT DB as intermediate storage and preprocessor for BigQuery

Description

It's easier to check for duplicates in the Firebase RT database than in BigQuery, where doing it in SQL is a pain in the neck. So the workflow will look like this: news API request --> nappyTools.Content instance --> GNL API request via Content --> Firebase --> CloudFunction + Trigger metadata --> BigQuery.
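A minimal sketch of the deduplication step, assuming an article's URL is what identifies a duplicate (the issue does not say which field is used). The names dedup_key and store_if_new are hypothetical, and a plain dict stands in for the Firebase RT database so the example runs on its own. Hashing the URL gives a key that is safe to use in Firebase, where keys may not contain characters such as ".", "/", "#", "$", "[" or "]".

```python
import hashlib


def dedup_key(url: str) -> str:
    """Derive a stable, Firebase-safe key from an article URL.

    Assumption: the URL identifies an article uniquely. Only surrounding
    whitespace and a trailing slash are normalized here.
    """
    normalized = url.strip().rstrip("/")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def store_if_new(store: dict, article: dict) -> bool:
    """Write the article unless its key already exists.

    `store` is an in-memory stand-in for the RT database node that
    holds the articles. Returns True if the article was written.
    """
    key = dedup_key(article["url"])
    if key in store:
        return False  # duplicate: nothing is written, so no trigger would fire
    store[key] = article
    return True


if __name__ == "__main__":
    db = {}
    print(store_if_new(db, {"url": "https://example.org/a", "title": "A"}))
    print(store_if_new(db, {"url": "https://example.org/a/", "title": "A again"}))
```

Because a duplicate never reaches the database, the Cloud Function downstream would only see records that are new, which keeps the BigQuery side free of duplicate checks.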

Proposal

How to test the implementation

Run the workflow above with real data, using the Python script mentioned above.

Related Issues

krauthex commented 5 years ago

The cloud function somehow has a problem unpacking the event dictionary. I need to take a closer look at this. If only re-deploying cloud functions didn't take forever...
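The thread does not say what the unpacking problem was, so the following is only a sketch of defensive unpacking. It assumes a Python background function triggered by the Firebase RT database, where the event dictionary carries the previous value under "data" and the change under "delta", and either can be None. The helper names unpack_event and is_new_record are hypothetical.

```python
def unpack_event(event):
    """Return (previous, delta) from an RT database trigger event.

    Assumption: `event` is a dict with "data" (value before the write)
    and "delta" (the change). Missing or None entries become empty dicts
    so callers can index them without extra checks.
    """
    if not isinstance(event, dict):
        raise TypeError(f"expected a dict event, got {type(event).__name__}")
    previous = event.get("data") or {}
    delta = event.get("delta") or {}
    return previous, delta


def is_new_record(event) -> bool:
    """True when the write created a record (no previous value, some delta)."""
    previous, delta = unpack_event(event)
    return not previous and bool(delta)


if __name__ == "__main__":
    created = {"data": None, "delta": {"title": "A", "url": "https://example.org/a"}}
    updated = {"data": {"title": "A"}, "delta": {"title": "B"}}
    print(is_new_record(created), is_new_record(updated))
```

Keeping the unpacking in a plain function like this means it can be exercised locally with hand-written event dictionaries, without waiting for a re-deploy.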

krauthex commented 5 years ago

> The cloud function somehow has a problem unpacking the event dictionary. I need to take a closer look at this. If only re-deploying cloud functions didn't take forever...

welp, nailed it. ¯_(ツ)_/¯