Non-Google (user-contributed) training data has its own repository: https://github.com/tesseract-ocr/tessdata_contrib.
Please take a look there for inspiration on how your PR should be structured.
@Melanee-Melanee, tessdata_contrib already contains a model for cuneiform (akk.traineddata). But you write in your paper that Tesseract is not the right choice for handwritten scripts, and I agree. You did not mention opr.traineddata in your paper. Where do you describe this model, and where exactly can it be found? I also suggest fixing the typo "tessearct" (for example tessearct_old_persian) in your paper and repository (even in filenames).
Thanks a lot @stweil
Akkadian cuneiform is different from Old Persian cuneiform. The cuneiform inscription types, such as Sumerian, Akkadian, Babylonian, Assyrian, Elamite, Hittite, Urartian, and Old Persian, each represent a distinct language.
The name of my new Tesseract model on my GitHub is myLang.traineddata instead of op.traineddata. Shall I rename it?
You can find my model here:
Besides, I have corrected my spelling error "tessearct". Thank you for pointing it out.
Thank you @zdenop
So you mean I must submit my newly trained data to https://github.com/tesseract-ocr/tessdata_contrib?
Ok, I will.
So you mean I must pull my new trained data on: https://github.com/tesseract-ocr/tessdata_contrib ?
Yes, please. Also provide additional information about the model (is it a best, fast, or legacy model?); see how others did it.
@zdenop @stweil I have opened a new pull request at https://github.com/tesseract-ocr/tessdata_contrib.
Please check it. Thanks a lot for your collaboration.
Dear manager
I am an AI developer and have recently trained a new Tesseract language model for the Old Persian language. My new model (op.traineddata) works properly for Old Persian, and I have published it in my GitHub repository: https://github.com/Melanee-Melanee/Old-Persian-Cuneiform-OCR
Additionally, it would be my honor to contribute my newly trained language model to your repository so that it is available to other developers. To test my model, you can use these custom Old Persian images:
https://github.com/Melanee-Melanee/Old-Persian-Cuneiform-OCR/tree/master/other/custom%20images
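As a rough illustration of how one of these test images might be run against the model, here is a minimal Python sketch that shells out to the tesseract command-line tool. It assumes that op.traineddata has been downloaded into a local directory (hypothetically named tessdata) and that a test image has been saved as sample.png; both names are placeholders, not paths taken from the repository. The page segmentation mode of 6 is also only an assumption and may need tuning.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, tessdata_dir, lang="op", psm=6):
    """Build the tesseract CLI invocation for a custom traineddata file.

    Tesseract looks for <lang>.traineddata inside tessdata_dir, so lang="op"
    expects a file named op.traineddata there. All paths are placeholders.
    """
    return [
        "tesseract",
        str(image_path),          # input image (hypothetical file name)
        "stdout",                 # write recognized text to standard output
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),        # assumed page segmentation mode
    ]


def run_ocr(image_path, tessdata_dir, lang="op", psm=6):
    """Run tesseract and return the recognized text as a string."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("The tesseract executable was not found on PATH.")
    cmd = build_tesseract_command(image_path, tessdata_dir, lang, psm)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With those assumptions in place, calling run_ocr("sample.png", "tessdata") would return whatever text the model recognizes in the image.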
Moreover, I have published a paper about my new model:
https://www.researchgate.net/publication/382528886_Translating_Old_Persian_cuneiform_by_artificial_intelligence_AI
I hope my newly uploaded model (op.traineddata) will be merged into your repository.
Sincerely,
Melanee