This repository contains the source code and the dataset for vaccine attitude detection.
The annotations are in the Datasets_Raw folder, in the form
ID,stance,aspect_span_start:aspect_span_end,opinion_span_start:opinion_span_end,aspect_category
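As an illustration of the annotation format, here is a minimal Python sketch that parses one record. It is not part of the repository: the names Annotation, parse_span and parse_annotation are hypothetical, and it assumes that the span bounds are integers and that stance and aspect category can be kept as plain strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Annotation:
    """One annotation record (field names chosen for illustration)."""
    tweet_id: str
    stance: str
    aspect_span: Tuple[int, int]
    opinion_span: Tuple[int, int]
    aspect_category: str


def parse_span(field: str) -> Tuple[int, int]:
    """Turn 'start:end' into a pair of ints (assumes integer bounds)."""
    start, end = field.split(":")
    return int(start), int(end)


def parse_annotation(line: str) -> Annotation:
    """Parse 'ID,stance,a_start:a_end,o_start:o_end,category'."""
    # maxsplit=4 keeps any comma inside the last field intact.
    tweet_id, stance, aspect, opinion, category = line.strip().split(",", 4)
    return Annotation(
        tweet_id=tweet_id,
        stance=stance,
        aspect_span=parse_span(aspect),
        opinion_span=parse_span(opinion),
        aspect_category=category,
    )
```

The tweet ID is kept as a string so that large IDs are not altered by numeric conversion. Whether the spans index characters or tokens is not stated here, so check the raw files before slicing tweet text with them.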
To obtain tweet text:
1. cd twitter_get_text_by_id_twitter4j
2. Edit ./settings/crawler.properties and set your consumerKey, consumerSecret, access token and access token secret. To obtain these credentials, please refer to https://developer.twitter.com/en/docs/developer-portal/overview. Standard v1.1 access is sufficient.
3. Run
   java -jar twitter_vac_opi_cwl_by_id.jar ./settings/crawler.properties
   or, to compile the crawler from source,
   javac -cp "./*" ./src/main/org/backingdata/twitter/crawler/rest/TwitterRESTTweetIDlistCrawler.java
The tweets are stored in ./saves
in JSON format.

cd VADMlmFineTuning
VADtransformer is first trained unsupervised. The model will be saved to ../datasets/mlm-vad.
To perform unsupervised training:
1. Fill UnannotatedTwitterID_training.csv and UnannotatedTwitterID_testing.csv with the obtained tweet text and place them in ../datasets. The format is the same as vad_train_finetune.txt.
2. cd src and run train_vad_albert_vae.py

cd VADStanceAndTextspanPrediction
In the previous step we obtained the unsupervised pre-trained VAD, namely the TopicDrivenMaskedLM. At this stage we wrap the model with classifiers and constraints, and train it.
To perform supervised training:
1. Copy the model (the pytorch_model.bin file) from the ../datasets/mlm-vad folder of VAD unsupervised training to the ./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache folder. For your convenience, a saved TopicDrivenMaskedLM is ready to use in the ./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache folder.
2. Copy the configuration (the config.json file) from the ../datasets/mlm-vad folder of VAD unsupervised training to the ./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2 folder. For your convenience, a saved config.json is ready to use in the ./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2 folder.
3. cd src and run vadtrain_eval_predict.py for training and testing.
For training, set up vadtrain_eval_predict.py accordingly and run the file. Checkpoints will be saved in ./datasets/vadcheckpoints/5-fold-211103/vadlm-albert-large-v2/
For prediction, set up vadtrain_eval_predict.py accordingly and run the file. The predictions will be written to the same directory. A saved model can be downloaded via this link. You can place the saved model in ./datasets/vadcheckpoints/5-fold-211103/vadlm-albert-large-v2/ for a quick start.
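
The two copy steps of the supervised-training instructions (pytorch_model.bin into vad-cache, config.json into vadlm-albert-large-v2) can also be scripted. The following is a minimal Python sketch, not a script shipped with the repository: the function name copy_pretrained_files is hypothetical, and it assumes both paths are given relative to the current working directory.

```python
import shutil
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def copy_pretrained_files(mlm_dir: PathLike, albert_dir: PathLike) -> List[Path]:
    """Copy the unsupervised-training outputs to where supervised training expects them.

    mlm_dir    : output folder of unsupervised training (e.g. ../datasets/mlm-vad)
    albert_dir : e.g. ./datasets/albertconfigs/vadlm-albert-large-v2
    """
    mlm_dir, albert_dir = Path(mlm_dir), Path(albert_dir)
    # File name -> destination subfolder, as described in the steps above.
    targets = {
        "pytorch_model.bin": albert_dir / "vad-cache",
        "config.json": albert_dir / "vadlm-albert-large-v2",
    }
    copied = []
    for name, dest_dir in targets.items():
        src = mlm_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"expected {src}; run unsupervised training first")
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file metadata such as modification time.
        copied.append(Path(shutil.copy2(src, dest_dir / name)))
    return copied


# Example call, assuming it is run from VADStanceAndTextspanPrediction:
# copy_pretrained_files("../datasets/mlm-vad",
#                       "./datasets/albertconfigs/vadlm-albert-large-v2")
```

The function overwrites any files already present in the destination folders, including the ready-to-use copies, so keep a backup if you want to return to them.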