11 Apr 2019
Zhisong Zhang, Naoki Otani, Aldrian Obaja Muis
Project done for the 11-821 Linguistics Seminar course at CMU, Spring 2019.
In this project, we aim to analyze the segmentation behavior of BPE (byte pair encoding) from a linguistic perspective.
To calculate the type counts for each language, run: bash calc_type_count.bash
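The script itself is not reproduced here. As a rough illustration of what a type count is, the sketch below counts distinct word forms (types) and total occurrences (tokens) in a list of lines. It assumes whitespace tokenization, and the function name count_types is hypothetical; the actual script may tokenize and count differently.

```python
from collections import Counter
from typing import Iterable


def count_types(lines: Iterable[str], lowercase: bool = False) -> Counter:
    """Count how often each word form occurs (hypothetical helper).

    Assumes tokens are separated by whitespace, which is a simplification
    and not necessarily what calc_type_count.bash does.
    """
    counts: Counter = Counter()
    for line in lines:
        tokens = line.split()
        if lowercase:
            tokens = [t.lower() for t in tokens]
        counts.update(tokens)
    return counts


if __name__ == "__main__":
    sample = ["the cat sat", "the dog sat down"]
    counts = count_types(sample)
    print("tokens:", sum(counts.values()))  # total word occurrences
    print("types:", len(counts))            # distinct word forms
```

The number of types is the number of keys in the returned counter, and the number of tokens is the sum of its values.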
To run BPE experiments on various vocabulary sizes, run: bash run_bpe.bash
This assumes you have installed the sentencepiece Python wrapper (pip install sentencepiece).
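For readers who want to see what training BPE models at several vocabulary sizes could look like with the sentencepiece Python wrapper, here is a minimal sketch. It is not the contents of run_bpe.bash: the function names, the output directory, and the model naming scheme are all assumptions made for illustration. The sentencepiece import is done inside the functions so the module can be loaded even where the package is missing.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def model_prefix(out_dir: str, lang: str, vocab_size: int) -> str:
    """Build an output prefix such as models/en.bpe.8000 (hypothetical naming)."""
    return str(Path(out_dir) / f"{lang}.bpe.{vocab_size}")


def train_bpe_models(corpus: str, lang: str, vocab_sizes: Iterable[int],
                     out_dir: str = "models") -> Dict[int, str]:
    """Train one BPE model per vocabulary size and return the model paths."""
    import sentencepiece as spm  # requires: pip install sentencepiece

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    model_files: Dict[int, str] = {}
    for size in vocab_sizes:
        prefix = model_prefix(out_dir, lang, size)
        # model_type="bpe" selects byte pair encoding instead of the
        # default unigram model.
        spm.SentencePieceTrainer.train(
            input=corpus,
            model_prefix=prefix,
            vocab_size=size,
            model_type="bpe",
        )
        model_files[size] = prefix + ".model"
    return model_files


def segment(model_file: str, sentence: str) -> List[str]:
    """Segment a sentence into subword pieces with a trained model."""
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=model_file)
    return sp.encode(sentence, out_type=str)
```

A call such as train_bpe_models("corpus.en.txt", "en", [1000, 4000, 16000]) would train three models, where corpus.en.txt stands in for a plain-text file with one sentence per line. Comparing the output of segment across the resulting models shows how segmentation changes with vocabulary size.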