nestauk / ds_digital_ads

Pilot project exploring insights from digital advertisements
MIT License

add twitter collection and enrichment flows #6

Closed india-kerle closed 6 months ago

india-kerle commented 6 months ago

Description

This PR:

I ultimately chose not to filter by geography because, according to the Twitter documentation, only 1-2% of tweets are geotagged. We are still collecting geo info, so we can filter downstream during analysis if need be.
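For reference, downstream geographic filtering could look roughly like the sketch below. It assumes each collected tweet is a dict carrying a `country_code` field; that field name, the `filter_by_country` helper and the sample records are hypothetical and for illustration only, not the schema produced by the flows in this PR.

```python
from typing import Dict, List, Optional


def filter_by_country(
    tweets: List[Dict[str, Optional[str]]], country_code: str = "GB"
) -> List[Dict[str, Optional[str]]]:
    """Keep only tweets whose (hypothetical) country_code field matches.

    Tweets without geo info (missing or None) are dropped, which is why
    filtering at collection time would discard most of the data.
    """
    return [t for t in tweets if t.get("country_code") == country_code]


# Toy records for illustration only; not real collected data.
tweets = [
    {"id": "1", "text": "ad one", "country_code": "GB"},
    {"id": "2", "text": "ad two", "country_code": None},
    {"id": "3", "text": "ad three", "country_code": "US"},
    {"id": "4", "text": "ad four"},
]

uk_tweets = filter_by_country(tweets)
print([t["id"] for t in uk_tweets])
```

Keeping the geo fields in the collected data and filtering at analysis time means untagged tweets stay available for any analysis that does not need location.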

Fixes # (issue)

This should close #5 #4

Instructions for Reviewer

You can test the tweet collection flow by running it outside production mode:

python ds_digital_ads/pipeline/collect_tweets_flow.py run --production False

Similarly, you can test the enrichment flow with:

python ds_digital_ads/pipeline/clean_tweets_flow.py run --production False

In order to test the code in this PR you need to ...

I haven't written tests for this, given it's more exploratory work for EDA, but I'm flagging it for downstream.

Please pay special attention to ...

Checklist:

cmbrennan002 commented 6 months ago

Looks good! A couple of minor comments on the code. The first script ran great! The second also ran great, though to get it working I had to do a couple of things:

[ ] Change the command line to 'enrich_tweets' (I think you might have just changed the file name at some point - also needs to be adjusted in the instructions in the .py file)
[ ] pip install fsspec
[ ] pip install s3fs
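A quick way to check for the two missing packages before running the second flow is sketched below. The `missing_packages` helper is hypothetical and not part of this repo; it only uses the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found.

```python
from importlib.util import find_spec
from typing import List


def missing_packages(names: List[str]) -> List[str]:
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# fsspec and s3fs are the two packages the reviewer had to install by hand.
missing = missing_packages(["fsspec", "s3fs"])
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("fsspec and s3fs are both available")
```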

india-kerle commented 6 months ago

great! thanks @cmbrennan002 - i've updated the requirements and README.md accordingly. Let's update the gambling list in a separate PR with the google ad ids as well - this should be minor.