trongvanhpkt99 closed this 4 years ago
We only have a pretrained model for English at the moment, so it will not work with Vietnamese. Natas calls OpenNMT-py in the background, so you can use onmt_translate
with your own model, pass it -n_best 10, and filter the results with a dictionary.
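A minimal sketch of the dictionary-filtering step, assuming onmt_translate was run with -n_best 10, so the output file holds 10 consecutive character-split hypotheses per input word, best first. The helper name filter_candidates and the small word set in the example are hypothetical, for illustration only; in practice the dictionary would be a word list for your language.

```python
from typing import Iterable, List, Set


def filter_candidates(output_lines: Iterable[str], n_best: int,
                      dictionary: Set[str]) -> List[List[str]]:
    """Group n-best hypotheses per source word and keep only dictionary words.

    Assumption: every source word produced exactly `n_best` consecutive
    output lines, each a space-separated sequence of characters.
    """
    lines = [line.rstrip("\n") for line in output_lines]
    if len(lines) % n_best != 0:
        raise ValueError("number of output lines is not a multiple of n_best")
    results: List[List[str]] = []
    for start in range(0, len(lines), n_best):
        kept: List[str] = []
        for hyp in lines[start:start + n_best]:
            # Rejoin "p a s t" -> "past"; keep model rank order, skip repeats.
            word = hyp.replace(" ", "")
            if word in dictionary and word not in kept:
                kept.append(word)
        results.append(kept)
    return results


if __name__ == "__main__":
    # Hypothetical hypotheses with n_best=3 for the inputs "paft" and "friendlhip".
    hyps = ["p a f t", "p a s t", "p a r t",
            "f r i e n d l h i p", "f r i e n d s h i p", "f r i e n d s h i p"]
    print(filter_candidates(hyps, 3, {"past", "part", "friendship"}))
```

Keeping the model's ranking while dropping non-words means the first surviving candidate is the model's best guess that is also a real word; an empty list signals that none of the hypotheses passed the dictionary.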
Thank you! Can you give me the English pretrained model and tell me how to use it?
This is how to use it from Natas:
import natas
natas.ocr_correct_words(["paft", "friendlhip"])
To use it with OpenNMT, you must first download the model.
Then you will need to prepare a text file with the words you want to OCR post-correct, with one word per line and each word split into characters separated by spaces.
So if you have the sentence "cat ran avvay", you should produce the following text file, ocr_errors.txt:
c a t
r a n
a v v a y
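The preparation step can be sketched in Python as follows. This is an illustration, not part of Natas; the helper names to_char_lines and write_source_file are hypothetical, and whitespace tokenization is assumed to be good enough for your text.

```python
from pathlib import Path


def to_char_lines(text: str) -> str:
    """Return one word per line, each word split into space-separated characters."""
    # " ".join("cat") -> "c a t"
    return "".join(" ".join(word) + "\n" for word in text.split())


def write_source_file(text: str, path: str = "ocr_errors.txt") -> None:
    """Write the character-split words to the file passed to onmt_translate -src."""
    Path(path).write_text(to_char_lines(text), encoding="utf-8")


if __name__ == "__main__":
    print(to_char_lines("cat ran avvay"), end="")
```

Python iterates over strings by Unicode code point, so for text with diacritics (such as Vietnamese) it may be worth normalizing consistently, e.g. with unicodedata.normalize("NFC", text), so that the input matches however your training data was split.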
Then you can run:
onmt_translate -model ocr.pt -src ocr_errors.txt -output ocr_fixed.txt -replace_unk -verbose
This will produce a text file, ocr_fixed.txt, with the OCR corrections. OpenNMT lets you do all sorts of things in translate, so please refer to its documentation as well.
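If you prefer to drive this from Python, a minimal sketch is to call the same onmt_translate command through subprocess and then join the characters of each output line back into a word. The function names build_command, parse_output and correct are hypothetical, and the sketch assumes onmt_translate is on your PATH and writes one output line per input line (the default, without -n_best).

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(model: str = "ocr.pt", src: str = "ocr_errors.txt",
                  output: str = "ocr_fixed.txt") -> List[str]:
    # Same flags as the onmt_translate command in the reply above.
    return ["onmt_translate", "-model", model, "-src", src,
            "-output", output, "-replace_unk", "-verbose"]


def parse_output(text: str) -> List[str]:
    """Turn character-split lines ("a w a y") back into words ("away").

    Empty lines are kept so the result stays aligned with the input words.
    """
    return [line.replace(" ", "") for line in text.splitlines()]


def correct(model: str = "ocr.pt", src: str = "ocr_errors.txt",
            output: str = "ocr_fixed.txt") -> List[str]:
    # Raises CalledProcessError if onmt_translate exits with an error.
    subprocess.run(build_command(model, src, output), check=True)
    return parse_output(Path(output).read_text(encoding="utf-8"))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems with file paths that contain spaces.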
Thank you! I'll try it
I want to train a model for OCR post-correction of Vietnamese text, so first I want to know how to use a pretrained model.