google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Access to trained model parameters from the NMT experiments #399

Closed: JJumSSu closed this issue 4 years ago

JJumSSu commented 5 years ago

Hi, first of all, thank you for your great work and this nice library. I was inspired by your work, which tries to inform the NMT model about "the word composition".

I'm currently researching the effects of the subword regularization method (unigram language model) on NMT models.

https://github.com/google/sentencepiece/blob/master/doc/experiments.md

This page reports experimental results for various segmentation methods.

I was hoping I could get access to the trained models' parameters (e.g. .ckpt files).

If possible, could I have access to the trained model files, or a quick explanation of how to train an NMT model with the "sentencepiece" segmentation method?

Thank you

taku910 commented 4 years ago

Thank you for using sentencepiece. I'm afraid we cannot provide these model files, as all of the settings seem to have been lost.

To compare different segmentation algorithms, it is better to train a sentencepiece model yourself on the same data, especially when comparing it with other unsupervised models.
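For anyone with the same question as the original post, here is a minimal, stdlib-only sketch of what unigram language model segmentation and subword regularization do once such a model exists: Viterbi picks the single most probable segmentation, and sampling draws a segmentation with probability proportional to P(x)^alpha, so an NMT model sees a different split of the same sentence on each pass. This is an illustration, not sentencepiece's implementation. The toy vocabulary and its probabilities are hypothetical. In a real setup you would train a unigram model on your own data (for example with the sentencepiece Python package's `SentencePieceTrainer.train(input=..., model_prefix=..., vocab_size=..., model_type='unigram')`) and sample with `encode(text, out_type=str, enable_sampling=True, alpha=0.1, nbest_size=-1)`.

```python
import math
import random

# Hypothetical toy unigram vocabulary (piece -> log-probability). A real model
# would be trained with model_type='unigram' on the same data used for NMT.
VOCAB = {
    "h": math.log(0.05), "e": math.log(0.05), "l": math.log(0.05),
    "o": math.log(0.05), "he": math.log(0.10), "ll": math.log(0.10),
    "lo": math.log(0.10), "hell": math.log(0.15), "hello": math.log(0.20),
}
MAX_LEN = max(len(p) for p in VOCAB)


def viterbi(text):
    """Return the single most probable segmentation of text."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            piece = text[start:end]
            if piece in VOCAB and best[start] > -math.inf:
                score = best[start] + VOCAB[piece]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    pieces, end = [], n
    while end > 0:  # follow back-pointers from the end of the string
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def sample(text, alpha=0.1, rng=random):
    """Sample a segmentation with probability proportional to P(x)**alpha
    (forward filtering, backward sampling over all segmentations)."""
    n = len(text)
    fwd = [-math.inf] * (n + 1)  # fwd[i]: log-sum of scores of all splits of text[:i]
    fwd[0] = 0.0
    for end in range(1, n + 1):
        scores = [fwd[s] + alpha * VOCAB[text[s:end]]
                  for s in range(max(0, end - MAX_LEN), end)
                  if text[s:end] in VOCAB]
        if scores:
            fwd[end] = logsumexp(scores)
    if fwd[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    pieces, end = [], n
    while end > 0:  # pick the last piece, then recurse on the prefix
        starts = [s for s in range(max(0, end - MAX_LEN), end)
                  if text[s:end] in VOCAB and fwd[s] > -math.inf]
        weights = [math.exp(fwd[s] + alpha * VOCAB[text[s:end]] - fwd[end])
                   for s in starts]
        start = rng.choices(starts, weights=weights)[0]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]
```

In a training loop you would call `sample` on each source sentence every epoch (and `viterbi` at test time). A smaller `alpha` flattens the distribution and yields more varied segmentations; the `alpha` here plays roughly the role of the smoothing parameter of the same name in sentencepiece's sampling API.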