tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Using tensor2tensor to build Language model of another language #1165

Closed okoub closed 5 years ago

okoub commented 6 years ago

Hi, I want to use tensor2tensor to build a language model for another language, based on a corpus of over 1M sentences. I couldn't find any example showing the best way to do this. Is it possible, and if so, how?

afrozenator commented 5 years ago

Hi @okoub - Can you take a look at the LanguagemodelLm1b32k example in tensor2tensor/data_generators/lm1b.py and see if that works for you?

Also, we'd love to have your language modelling task as a PR, if you wish.

I'll close this for now; let me know if you run into issues.

okoub commented 5 years ago

@afrozenator I tried, but I couldn't work out what format the input should be in or where I should put it. My corpus is a .txt file (which can obviously be converted easily to a list of strings). What should I do to make it work with https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/lm1b.py ? Thank you very much.
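
For readers with the same question, here is a minimal sketch of one possible approach. It is not an answer from the maintainers. It assumes the corpus has one sentence per line, and it relies on the convention that a language-modelling problem in tensor2tensor (a `Text2SelfProblem` subclass in `text_problems.py`) yields dicts containing only a `"targets"` key from `generate_samples`. The helper name `generate_samples_from_txt`, the class name `LanguagemodelMyCorpus`, and the file name `corpus.txt` are hypothetical, and the commented-out wiring should be checked against the installed tensor2tensor version.

```python
import os
import tempfile


def generate_samples_from_txt(corpus_path):
    """Yield one {"targets": sentence} dict per non-empty line of a UTF-8 text file."""
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            sentence = line.strip()
            if sentence:  # skip blank lines
                yield {"targets": sentence}


# Possible wiring into tensor2tensor (an assumption to verify; left commented
# out so the sketch runs without the library installed):
#
# from tensor2tensor.data_generators import text_problems
# from tensor2tensor.utils import registry
#
# @registry.register_problem
# class LanguagemodelMyCorpus(text_problems.Text2SelfProblem):  # hypothetical name
#
#   @property
#   def approx_vocab_size(self):
#     return 2**15  # roughly 32k subword tokens
#
#   @property
#   def is_generate_per_split(self):
#     return False  # generate once and let the library split train/eval
#
#   def generate_samples(self, data_dir, tmp_dir, dataset_split):
#     # "corpus.txt" is a hypothetical file name placed in tmp_dir by the user.
#     return generate_samples_from_txt(os.path.join(tmp_dir, "corpus.txt"))
```

With a class like this defined in a user directory, data generation would typically be started with `t2t-datagen`, pointing `--t2t_usr_dir` at that directory and `--problem` at the registered problem name.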