nestauk / ds_digital_ads

Pilot project exploring insights from digital advertisements
MIT License

Initial Twitter EDA #13

Open india-kerle opened 6 months ago

india-kerle commented 6 months ago

Description

This PR mainly adds a twitter_eda.ipynb notebook for some initial exploration of Twitter data.

I played around with the Tesseract library (Google's OCR engine) to extract text from the tweet images, but it didn't do a very good job and I thought it was a rabbit hole not worth going down (yet!), so it's not really in this PR.
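For reference, a minimal sketch of what that kind of OCR experiment could look like. It is not the code from the experiment: it assumes the pytesseract wrapper and Pillow are installed alongside the Tesseract binary, and the function names (`clean_ocr_text`, `extract_tweet_text`) are hypothetical.

```python
import re
from typing import Callable, Optional


def _tesseract_ocr(image_path: str) -> str:
    """Run Tesseract on one image via pytesseract (an assumed dependency)."""
    # Imported lazily so the cleaning helper works without OCR installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def clean_ocr_text(raw: str) -> str:
    """Collapse whitespace runs and drop empty lines from raw OCR output."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_tweet_text(
    image_path: str, ocr: Optional[Callable[[str], str]] = None
) -> str:
    """Return cleaned text for one tweet image.

    `ocr` is a hypothetical hook that maps an image path to raw text;
    it defaults to Tesseract but can be swapped for a stub or another engine.
    """
    ocr = ocr or _tesseract_ocr
    return clean_ocr_text(ocr(image_path))
```

Keeping the OCR call behind a swappable function means the cleaning step can be checked on its own, and a different engine could be tried later without touching the rest of the notebook.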

Feel free to take a look at the notebook! I've added most of the graphs to the deck you put together.

Fixes #9

Instructions for Reviewer

In order to test the code in this PR you need to ...

Please pay special attention to ...

Checklist:

india-kerle commented 6 months ago

Heads up @cmbrennan002: it looks like you pushed your Google Ads collection PR to this branch. I've deleted your file and your changes to data_collection_utils.py so that we can keep the PRs separate for review and avoid merge conflicts downstream.

This is still in draft as I'm currently working on the branch but I will convert it to a full PR once ready.