india-kerle closed 6 months ago
Looks good! A couple of minor comments on the code. The first script ran great! The second also ran great, but to get it working I had to do a couple of things:
[ ] Change the command line to 'enrich_tweets' (I think you might have just changed the file name at some point; this also needs to be adjusted in the instructions in the .py file)
[ ] pip install fsspec
[ ] pip install s3fs
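For context on the two installs: pandas relies on fsspec and s3fs to read and write `s3://` paths, so a missing package only surfaces when the script first touches S3. A minimal sketch of an up-front check follows; the helper name `missing_packages` is hypothetical and not part of this repo.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the enrich script needed on top of the original requirements.
missing = missing_packages(["fsspec", "s3fs"])
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
```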
great! thanks @cmbrennan002 - I've updated the requirements and README.md accordingly. Let's update the gambling list in a separate PR, along with the Google ad IDs - this should be minor.
Description
This PR:
I ultimately chose not to filter by geography because, according to the Twitter documentation, only 1-2% of tweets are geotagged. We are still collecting geo info, so we can filter downstream in analysis if need be.
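As a rough illustration of what downstream filtering could look like, here is a minimal sketch. It assumes each collected tweet is a dict with an optional `geo` field holding a `place_id`; the sample records, place IDs and function names are hypothetical and not taken from this PR.

```python
from typing import Iterable, List, Set


def has_geo(tweet: dict) -> bool:
    """Return True if the tweet record carries any geo information."""
    # A missing key, None, or an empty dict all count as "not geotagged".
    return bool(tweet.get("geo"))


def filter_by_place(tweets: Iterable[dict], place_ids: Set[str]) -> List[dict]:
    """Keep only geotagged tweets whose place_id is in the given set."""
    return [
        t for t in tweets
        if has_geo(t) and t["geo"].get("place_id") in place_ids
    ]


# Hypothetical sample records, for illustration only.
tweets = [
    {"id": "1", "text": "a", "geo": {"place_id": "abc123"}},
    {"id": "2", "text": "b"},
    {"id": "3", "text": "c", "geo": {}},
    {"id": "4", "text": "d", "geo": {"place_id": "zzz999"}},
]

geotagged_share = sum(has_geo(t) for t in tweets) / len(tweets)
selected = filter_by_place(tweets, {"abc123"})
```

Computing `geotagged_share` on the real data first would show whether enough tweets carry geo info for a geographic filter to be useful.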
Fixes # (issue)
This should close #5 and #4.
Instructions for Reviewer
You can test the collect tweets flow by not running it in production:
Similarly, you can test the enrich flow by:
In order to test the code in this PR you need to ...
I haven't written tests for this, given it's mostly work for EDA, but I'm flagging it for downstream.
Please pay special attention to ...
Checklist:
notebooks/
pre-commit and addressed any issues not automatically fixed
dev
READMEs