Source code of our KDD 2022 paper: Predicting Opinion Dynamics via Sociologically-Informed Neural Networks [[Link to the paper]()]
src/: The models are stored here.
voter.py: PyTorch implementation of Voter model.
degroot.py: PyTorch implementation of DeGroot model.
aslm.py: PyTorch implementation of asynchronous linear model (AsLM).
slant.py: PyTorch implementation of a nonlinear generative model for opinion dynamics (SLANT).
slant_plus.py: PyTorch implementation of SLANT+.
nn.py: PyTorch implementation of neural network (NN).
sinn.py: PyTorch implementation of the proposed method, Sociologically-Informed Neural Networks (SINN).
working/: The preprocessed datasets are stored here.
Each row has the format: {user_id},{opinion},{time}. {opinion} is a manually annotated class label.
Due to privacy concerns, we removed the tweet text.
posts_final_synthetic_consensus.tsv: Synthetic dataset generated using SBCM (stochastic opinion dynamics model) with exponent parameter $\rho=-1.0$.
posts_final_synthetic_clustering.tsv: Synthetic dataset generated using SBCM with exponent parameter $\rho=0.1$.
posts_final_synthetic_polarization.tsv: Synthetic dataset generated using SBCM with exponent parameter $\rho=1.0$.
Each row has the format: {user_id},{opinion},{time}.
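As an illustration of the row format above, the following is a minimal sketch of a loader that groups posts by user. It is not part of the repository: the function name `load_posts`, the sample rows, and the assumptions that `user_id` is an integer, `opinion` and `time` are numeric, and the file has no header row are all hypothetical. The delimiter is a parameter because the format line shows commas while the files use a `.tsv` extension.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the documented {user_id},{opinion},{time} format.
SAMPLE = """0,0.25,0.01
1,-0.50,0.02
0,0.30,0.05
"""


def load_posts(handle, delimiter=","):
    """Group (time, opinion) pairs by user_id, each history sorted by time."""
    posts = defaultdict(list)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        user_id, opinion, time = row
        # Assumption: integer user ids and numeric opinion/time values.
        posts[int(user_id)].append((float(time), float(opinion)))
    for history in posts.values():
        history.sort()
    return dict(posts)


posts = load_posts(io.StringIO(SAMPLE))
print(posts)
```

To read one of the provided files, pass an open file handle instead of the in-memory sample, and set `delimiter="\t"` if the file turns out to be tab-separated.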
We provide synthetic datasets in working/, but you can also generate them yourself.
Generate three synthetic datasets
python3 simulate.py
We provide sample real-world data in working/, but you can also collect the data via the API.
Set your Bearer Token as an environment variable by running the following command in your terminal.
export BEARER_TOKEN='<your-bearer-token>'
Get tweets by query search by running
python3 collect_twitter.py
Get Twitter profiles by username
python3 collect_twitter_user.py
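The Twitter scripts rely on the `BEARER_TOKEN` environment variable set earlier. As a rough sketch of how such a token is typically read and turned into a request header, consider the helper below. The function name `bearer_headers` is hypothetical, and this is not necessarily how `collect_twitter.py` does it.

```python
import os


def bearer_headers(env_var="BEARER_TOKEN"):
    """Build an Authorization header from the token stored in the environment."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(
            f"{env_var} is not set; export it before running the collection scripts"
        )
    return {"Authorization": f"Bearer {token}"}
```

Calling `bearer_headers()` after the `export` command returns a dictionary that can be passed as the headers of an HTTP request. If the variable is missing, it raises an error rather than continuing silently.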
Get Reddit posts
python3 collect_reddit.py
Real-world datasets preprocessing
python3 preprocess.py
Change data formats
python3 convert_data.py
Training: Run the main_sinn.py file to train and evaluate the proposed method with default settings.
python3 main_sinn.py
To train all models, run the bash shell script
./run_all_methods.sh
To tune hyperparameters in NN and our SINN, run the bash shell script
./run_grid_search.sh
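For readers who prefer to drive the search from Python, here is a minimal sketch of a grid search that builds one `main_sinn.py` command per hyperparameter combination. The grid values and the helper name `grid_commands` are hypothetical and are not taken from `run_grid_search.sh`; only the flag names come from this README.

```python
import itertools
import shlex

# Hypothetical search space; the values are illustrative only.
GRID = {
    "num_hidden_layers": [3, 5],
    "hidden_features": [8, 16],
    "alpha": [0.1, 1.0],
}


def grid_commands(method="SINN", dataset="synthetic_consensus", grid=GRID):
    """Return one shell command per combination of grid values."""
    names = list(grid)
    commands = []
    # itertools.product enumerates the Cartesian product of all value lists.
    for values in itertools.product(*(grid[name] for name in names)):
        args = ["python3", "main_sinn.py", "--method", method, "--dataset", dataset]
        for name, value in zip(names, values):
            args += [f"--{name}", str(value)]
        commands.append(shlex.join(args))
    return commands


for command in grid_commands():
    print(command)
```

The sketch only prints the commands. Each printed line could be executed with `subprocess.run` or pasted into a shell.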
To specify the parameters, run
python3 main_sinn.py \
--method SINN \
--dataset synthetic_consensus \
--save_dir output/ \
--num_hidden_layers 5 \
--hidden_features 8 \
--alpha 0.1 \
--beta 0.1 \
--num_epochs 500 \
--lr 0.001 \
--K 1 \
--type_odm SBCM
method: str (default=SINN). Options are "Voter", "DeGroot", "AsLM", "SLANT", "SLANT+", "NN", "SINN".
dataset: str (default=synthetic_consensus). Options are "synthetic_consensus", "synthetic_clustering", "synthetic_polarization", "sample_twitter_Abortion".
save_dir: specifies the path where the trained model is saved. Defaults to "./output" if not specified.
num_hidden_layers: specifies the number of layers $L$ in the neural network.
hidden_features: specifies the number of units per layer $N_u$ in the neural network.
alpha: specifies the trade-off hyperparameter $\alpha$.
beta: specifies the trade-off hyperparameter $\beta$.
lr: specifies the learning rate.
K: specifies the dimension of the latent space.
type_odm: specifies the choice of opinion dynamics model.
use_profile: bool (default=False). If True, use the profile descriptions of Twitter users as input to the neural network.
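The options above can be summarized as an `argparse` parser. The following is a sketch under stated assumptions, not the parser in `main_sinn.py`. Only the defaults for `method`, `dataset`, `save_dir`, and `use_profile` are documented here. The remaining defaults are copied from the example command and may differ from the script's real defaults. Modeling `use_profile` as a `store_true` flag is likewise an assumption.

```python
import argparse

METHODS = ["Voter", "DeGroot", "AsLM", "SLANT", "SLANT+", "NN", "SINN"]
DATASETS = [
    "synthetic_consensus",
    "synthetic_clustering",
    "synthetic_polarization",
    "sample_twitter_Abortion",
]


def build_parser():
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    # Documented defaults.
    parser.add_argument("--method", type=str, default="SINN", choices=METHODS)
    parser.add_argument("--dataset", type=str, default="synthetic_consensus", choices=DATASETS)
    parser.add_argument("--save_dir", type=str, default="./output")
    # Assumed defaults, taken from the example command rather than the script.
    parser.add_argument("--num_hidden_layers", type=int, default=5)
    parser.add_argument("--hidden_features", type=int, default=8)
    parser.add_argument("--alpha", type=float, default=0.1)
    parser.add_argument("--beta", type=float, default=0.1)
    parser.add_argument("--num_epochs", type=int, default=500)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--K", type=int, default=1)
    parser.add_argument("--type_odm", type=str, default="SBCM")
    # Assumption: a boolean flag that is False unless passed.
    parser.add_argument("--use_profile", action="store_true")
    return parser


args = build_parser().parse_args(["--method", "NN", "--alpha", "0.5"])
print(args.method, args.alpha, args.save_dir)
```

Restricting `method` and `dataset` with `choices` makes a mistyped option fail at parse time instead of partway through training.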
To visualize the evaluation results, run the plotting.py file.
python3 plotting.py
If you use this code or data, please cite the following paper:
@inproceedings{okawa2022sinn,
author = {Okawa, Maya and Iwata, Tomoharu},
title = {Predicting Opinion Dynamics via Sociologically-Informed Neural Networks},
booktitle = {{KDD} '22: The 28th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining},
pages = {},
year = {2022}
}