huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Create my own language model #6561

Closed dhimasyoga16 closed 3 years ago

dhimasyoga16 commented 4 years ago

I'm new to NLP and I want to give BERT a try. I have a Wikipedia corpus (in Indonesian, in .txt format) and want to train BERT multilingual cased on it. Later on, I expect BERT to "adapt" well to Indonesian and handle a specific task: text similarity or, if possible, automated scoring based on two given texts.

Can I use run_language_modeling.py to fine-tune the model and create my own language model? If so, what are the exact steps?

Thank you in advance.

julien-c commented 4 years ago

For open-ended questions like this you should try https://discuss.huggingface.co

Did you read https://huggingface.co/blog/how-to-train?
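For readers landing here, a rough sketch of what continued masked-language-model training on a plain-text corpus could look like with the library's `Trainer`. This is not the `run_language_modeling.py` script itself, and it is not the recipe from the blog post (which trains from scratch). The function names, the `corpus_path` and `output_dir` arguments, the block size, and the training hyperparameters are all illustrative assumptions, and it assumes `transformers`, `datasets`, and PyTorch are installed. It reads the whole file into memory, so it only suits a small corpus as written.

```python
from itertools import chain


def group_into_blocks(token_id_lists, block_size):
    """Concatenate tokenized lines and cut them into equal-length blocks.

    A trailing remainder shorter than block_size is dropped.
    """
    flat = list(chain.from_iterable(token_id_lists))
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


def continue_mlm_pretraining(corpus_path, output_dir, block_size=128):
    """Hypothetical helper: keep training multilingual BERT on a .txt corpus."""
    # Heavy imports live here so the helper above works without them.
    from datasets import Dataset
    from transformers import (
        AutoModelForMaskedLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model_name = "bert-base-multilingual-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Tokenize non-empty lines without [CLS]/[SEP]; they are added per block.
    with open(corpus_path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    token_ids = tokenizer(lines, add_special_tokens=False)["input_ids"]

    # Reserve two positions per block for the special tokens.
    blocks = group_into_blocks(token_ids, block_size - 2)
    input_ids = [tokenizer.build_inputs_with_special_tokens(b) for b in blocks]
    train_dataset = Dataset.from_dict({"input_ids": input_ids})

    # The collator masks 15% of tokens on the fly and builds the labels.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15
    )

    # Assumed hyperparameters; tune them for the actual corpus and hardware.
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=16,
    )
    trainer = Trainer(
        model=model,
        args=args,
        data_collator=collator,
        train_dataset=train_dataset,
    )
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    continue_mlm_pretraining("wiki_id.txt", "./bert-id-mlm")
```

The saved checkpoint would still be a masked language model. A text similarity or scoring task would need a further fine-tuning step on labeled text pairs, which this sketch does not cover.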

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.