mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

Can I add some Vietnamese vocabs to the VOCABS file? #879

Closed calibretaliation closed 2 years ago

calibretaliation commented 2 years ago

🚀 The feature

I have added Vietnamese vocabs so that Vietnamese developers like me can make better use of doctr, following this issue: https://github.com/mindee/doctr/pull/464#pullrequestreview-752843039 This is my pull request; please review it.

Motivation, pitch

I really appreciate your great work, and congratulations on it. This package will help me a lot in my work :) However, I am working on Vietnamese OCR, so Vietnamese vocabs are required, and I can't find any guide on adding Vietnamese vocabs locally without changing the main repository (which I am hesitant to do, for fear of breaking something I don't understand). It would be very nice if you could guide me or review my contribution of Vietnamese vocabs here: https://github.com/mindee/doctr/pull/878

Alternatives

No response

Additional context

And please, if you agree with my contribution, could you also give me some tips on how to apply the new vocab in your work? Are there any tools or guidelines on how to create a custom dataset and train on it? Thank you very much!

Xargonus commented 2 years ago

Hey, I am in a similar situation, but in my case I need a model for Czech. I found the advice given to someone who wanted to train the text recognition model on Spanish useful: https://github.com/mindee/doctr/discussions/565

felixdittrich92 commented 2 years ago

@charlesmindee Can this be closed? :)

kiko0217 commented 2 years ago

Simply put, transfer learning may be a solution: modify the final layer slightly, then train the model on your custom data.

charlesmindee commented 2 years ago

Since the entries have been added to the VOCABS, this can be closed.
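
For readers who want to experiment with a custom vocab locally without editing the repository, here is a minimal sketch in plain Python. It assumes that a vocab is simply a string of allowed characters, which is how the VOCABS entries are understood to work. The names `LOCAL_VOCABS`, `build_vietnamese_chars`, and `missing_chars` are hypothetical and not part of doctr. The character set is generated from Unicode composition rules for illustration; it is not copied from the pull request above.

```python
import string
import unicodedata

# Assumption: a vocab is just a string of allowed characters.
BASE_LATIN = string.digits + string.ascii_letters + string.punctuation

# Vietnamese letters beyond basic Latin, and the vowels that can carry a tone mark.
VIET_BASE = unicodedata.normalize("NFC", "ăâêôơưđ")
VIET_VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")
# Combining marks: grave, acute, tilde, hook above, dot below.
TONE_MARKS = ["\u0300", "\u0301", "\u0303", "\u0309", "\u0323"]


def build_vietnamese_chars() -> str:
    """Return the Vietnamese-specific characters (lower and upper case) as one string."""
    lower = set(VIET_BASE)
    for vowel in VIET_VOWELS:
        for tone in TONE_MARKS:
            # NFC composes vowel + combining mark into a single precomposed code point.
            lower.add(unicodedata.normalize("NFC", vowel + tone))
    chars = lower | {c.upper() for c in lower}
    return "".join(sorted(chars))


# Hypothetical local table (name -> characters), kept outside the library.
LOCAL_VOCABS = {"my_vietnamese": BASE_LATIN + build_vietnamese_chars()}


def missing_chars(labels, vocab):
    """List the characters used in the labels that the vocab does not cover."""
    vocab_set = set(vocab)
    found = set()
    for label in labels:
        # Normalize labels the same way as the vocab so composed forms match.
        for ch in unicodedata.normalize("NFC", label):
            if ch not in vocab_set:
                found.add(ch)
    return sorted(found)


if __name__ == "__main__":
    labels = ["Việt", "Nam", "đường", "phố€"]
    print(missing_chars(labels, LOCAL_VOCABS["my_vietnamese"]))
```

Running a check like this over a dataset's labels before training can reveal characters the vocab would not cover. How the resulting string is passed to a recognition model depends on the doctr version, so the library documentation should be consulted for the exact argument.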