Add Arabic-Indic numerals to Arabic

tesseract-ocr / langdata

Source training data for Tesseract for lots of languages

Apache License 2.0

Add Arabic-Indic numerals to Arabic #71

Closed: Shreeshrii closed this issue 7 years ago

Shreeshrii commented 7 years ago

Please see https://github.com/tesseract-ocr/tesseract/issues/858

Include both 0-9 and (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩) for Arabic.

Shreeshrii commented 7 years ago

ASCII digits (0-9): 0x30 to 0x39
Arabic-Indic digits (٠-٩): U+0660 through U+0669
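
As a quick way to check the two ranges above, here is a minimal Python sketch that lists each code point with its Unicode name and numeric value. It uses only the standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of Tesseract or langdata.

```python
import unicodedata

# The two digit ranges mentioned in this thread.
ASCII_DIGITS = [chr(cp) for cp in range(0x30, 0x3A)]             # 0x30..0x39
ARABIC_INDIC_DIGITS = [chr(cp) for cp in range(0x0660, 0x066A)]  # U+0660..U+0669


def describe(chars):
    """Return (code point, character, Unicode name, numeric value) per character."""
    return [
        (f"U+{ord(c):04X}", c, unicodedata.name(c), unicodedata.digit(c))
        for c in chars
    ]


for row in describe(ASCII_DIGITS + ARABIC_INDIC_DIGITS):
    print(row)
```

Both ranges map to the same numeric values 0 through 9, so only the glyphs differ.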

Testya commented 7 years ago

Please add the Arabic comma too, (،) U+060C.

aboelmor commented 7 years ago

Any idea when the Eastern Arabic numerals will be added to the language packs?

theraysmith commented 7 years ago

Added to my local copy for next round of training. Then I will push updated langdata as well.

Shreeshrii commented 7 years ago

@theraysmith

I hope you have seen the other comments about using only the Persian digit range for Persian and the Arabic digit range for Arabic.
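
To make the distinction concrete, here is a rough Python sketch that reports which digit ranges a string uses and maps Persian-style digits onto the Arabic-Indic range. It assumes the "Persian range" means the Unicode Extended Arabic-Indic digits, U+06F0 through U+06F9; that range is not spelled out in this thread. The function names are hypothetical and nothing here is part of Tesseract.

```python
# Arabic-Indic digits, as given earlier in this thread.
ARABIC_INDIC = range(0x0660, 0x066A)
# Assumption: the "Persian range" is Extended Arabic-Indic, U+06F0..U+06F9.
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)


def digit_ranges(text):
    """Return the set of digit ranges that occur in text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if cp in ARABIC_INDIC:
            found.add("arabic-indic")
        elif cp in EXTENDED_ARABIC_INDIC:
            found.add("extended-arabic-indic")
        elif "0" <= ch <= "9":
            found.add("ascii")
    return found


def to_arabic_indic(text):
    """Map Extended Arabic-Indic digits to Arabic-Indic; leave everything else alone."""
    table = {0x06F0 + i: 0x0660 + i for i in range(10)}
    return text.translate(table)


sample = "\u06f1\u06f2\u06f3"  # Persian-style 1, 2, 3
print(digit_ranges(sample))
print(to_arabic_indic(sample))
```

A check like `digit_ranges` could be run over training text to confirm that Arabic data contains only one of the two ranges.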

theraysmith commented 7 years ago

Yes. I hope the experts also see my question about the Arabic-script languages not mentioned in those issues (kur_ara, pus, uig).

AbdelsalamHaa commented 6 years ago

Hi guys, I'm using Tesseract 4 with ara.traineddata to extract text from an image. It works well for letters, but the results for numbers are not good at all. From the comments above, it seems there should be separate traineddata for numbers only. Can anybody tell me where to find it?

Thanks a lot.

amitdo commented 4 years ago

It seems that Ray didn't push the data to our side (the langdata_lstm and tessdata_best/tessdata_fast repos).

amitdo commented 3 years ago

This issue should be re-opened.