Official code release of our ACL 2021 work, Syntax-augmented Multilingual BERT for Cross-lingual Transfer.
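At a high level, the models in this work consume each sentence's dependency tree alongside mBERT's inputs. As a hedged illustration only (this is not the repo's actual code, and the paper's models encode syntax differently, e.g. via graph attention over the tree), one simple way to expose tree structure to a model is to precompute pairwise token distances in the dependency tree from the head indices a parser provides:

```python
from collections import deque

def tree_distances(heads):
    """Pairwise token distances in a dependency tree.

    `heads[i]` is the 1-based head of token i+1 (0 = root), as in CoNLL-U.
    Hypothetical helper for illustration only.
    """
    n = len(heads)
    # Build an undirected adjacency list over tokens 0..n-1.
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h != 0:
            adj[i].append(h - 1)
            adj[h - 1].append(i)
    # BFS from every token to get shortest tree distances.
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        q = deque([(s, 0)])
        while q:
            u, d = q.popleft()
            dist[s][u] = d
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append((v, d + 1))
    return dist

# "The dog barks": "The" attaches to "dog", "dog" to "barks", "barks" is root.
print(tree_distances([2, 3, 0]))  # → [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

Such a distance matrix (or the raw head/relation labels) is the kind of signal the syntax-augmented attention layers can condition on.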
We set up a conda environment to run the experiments. We assume Anaconda with Python 3.6 is installed. The additional requirements (as noted in requirements.txt) can be installed by running the following script:

```bash
bash install_tools.sh
```
The next step is to download the data. To this end, first create a download folder with `mkdir -p download` in the root of this project. You then need to manually download panx_dataset (for NER) from here (note that it will download as AmazonPhotos.zip) to the download directory. Finally, run the following command to download the remaining datasets:

```bash
bash scripts/download_data.sh
```
To get the POS tags and dependency parses of the input sentences, we use UDPipe. Go to the udpipe directory and run the task-specific scripts: [xnli.sh|pawsx.sh|panx.sh|mtop.sh].
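UDPipe emits its annotations in the CoNLL-U format: one token per line, ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). As a minimal sketch, independent of this repo's actual preprocessing code, here is how the POS tag, head index, and dependency relation can be pulled out of such output (the sample sentence is made up for illustration):

```python
# A tiny hand-written CoNLL-U sentence ("The dog barks") for illustration.
SAMPLE = """\
1\tThe\tthe\tDET\tDT\tDefinite=Def\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\tVBZ\t_\t0\troot\t_\t_
"""

def read_conllu(text):
    """Parse one CoNLL-U sentence into (form, upos, head, deprel) tuples."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and sentence-level comments
        cols = line.split("\t")
        tok_id = cols[0]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        # columns: FORM, UPOS, HEAD (0 = root), DEPREL
        tokens.append((cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens

print(read_conllu(SAMPLE))
# → [('The', 'DET', 2, 'det'), ('dog', 'NOUN', 3, 'nsubj'), ('barks', 'VERB', 0, 'root')]
```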
The evaluation results (on the test set) are saved in the `${SAVE_DIR}` directory (check the bash scripts).
For cross-lingual text classification, do the following.

```bash
cd scripts
bash xlt_classify.sh GPU TASK USE_SYNTAX SEED

# for XNLI
bash xlt_classify.sh 0 xnli false 1111

# for PAWS-X
bash xlt_classify.sh 0 pawsx false 1111
```

To use syntax, set `USE_SYNTAX=true`.
For cross-lingual NER (panx), do the following.

```bash
cd scripts
bash panx.sh GPU USE_SYNTAX SEED
```

To use syntax, set `USE_SYNTAX=true`.
.cd scripts
bash mtop.sh GPU USE_SYNTAX SEED
USE_SYNTAX=true
We acknowledge the efforts of the authors of the following repositories.
```
@inproceedings{ahmad-etal-2021-syntax,
    title = "Syntax-augmented Multilingual {BERT} for Cross-lingual Transfer",
    author = "Ahmad, Wasi and
      Li, Haoran and
      Chang, Kai-Wei and
      Mehdad, Yashar",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.350",
    pages = "4538--4554",
}
```