If there is any last minute Twitter data collection we can do before access shuts off, we will do it.
This repo expands upon the previous approach, with improvements such as storing the tweet language and matching rule identifiers.
Make a copy of this repo. Clone / download your copy of the repo onto your local computer (e.g. the Desktop) then navigate there from the command-line:
cd ~/Desktop/tweet-analysis-2023
Setup a virtual environment:
conda create -n tweets-2023 python=3.10
Activate virtual environment:
conda activate tweets-2023
Install packages:
pip install -r requirements.txt
Create your own Google APIs project, obtain JSON credentials file for a service account, and download it to the root directory of this repo, naming it specifically "google-credentials.json".
Obtain Twitter API credentials from the Twitter Developer Portal (i.e. TWITTER_BEARER_TOKEN
below). Ideally research level access.
Obtain Sendgrid API credentials from the Sendgrid website (i.e. SENDGRID_API_KEY
below). Create a sender identity and verify it (i.e. SENDER_ADDRESS
below).
If you would like to save files to cloud storage, create a new bucket or gain access to an existing bucket, and set the BUCKET_NAME
environment variable accordingly (see environment variable setup below).
You can use a SQLite database, or a BigQuery database.
If you want to use a Bigquery database, in the respective Google APIs project, setup a new BigQuery dataset for each new collection effort. Consider creating two datasets, one for development and one for production.
The DATASET_ADDRESS
environment variable will be a namespaced combination of the google project name and the dataset name (i.e. "my-project.my_dataset_development").
Create a new local ".env" file and set environment variables to configure services, as desired:
# this is the ".env" file..
#
# GOOGLE APIS
#
# path to the google credentials file you downloaded
GOOGLE_APPLICATION_CREDENTIALS="/Users/path/to/tweet-analysis-2022/google-credentials.json"
#
# GOOGLE BIGQUERY
#
DATASET_ADDRESS="my-project.my_database_name"
#
# GOOGLE CLOUD STORAGE
#
BUCKET_NAME="my-bucket"
#
# TWITTER API
#
TWITTER_BEARER_TOKEN="..."
#
# SENDGRID API
#
SENDGRID_API_KEY="SG.___________"
SENDER_ADDRESS="example@gmail.com"
Demonstrate ability to fetch data from BigQuery, as desired:
python -m app.bq_service
Demonstrate ability to fetch data from Twitter API, as desired:
python -m app.twitter_service
Demonstrate ability to send email, as desired:
python -m app.email_service
Demonstrate ability to connect to cloud storage, as desired:
python -m app.cloud_storage
Search:
Streaming: