tesseract-ocr langdata_lstm issues

tesseract-ocr / langdata_lstm

Data used for LSTM model training

Apache License 2.0

114 stars, 151 forks

issues

NO fas.unicharset and fas.xheights file for Persian Language

#60 AinazRafiei opened 1 month ago
6
Rename frk -> deu_latf (ISO 639-3, ISO 15924)

#59 stweil closed 4 months ago
15
grc letters with dot below

#57 nisbet-hubbard opened 6 months ago
0
θ in Greek book font rendered as swash form

#56 nisbet-hubbard opened 6 months ago
2
Missing GREEK LUNATE SIGMA SYMBOL in grc and script/Greek models

#55 nisbet-hubbard opened 6 months ago
4
Slight modification in Bodhi for incorporating a few unique characters in Drenjongke

#54 bloodgroup-cplusplus opened 8 months ago
0
Adding Additional Fonts for bodhi and dzongkha

#53 bloodgroup-cplusplus opened 8 months ago
0
Adding additional language Denjongke (sikkimese bhutia) to tesseract language dataset

#52 bloodgroup-cplusplus closed 8 months ago
3
Armenian letter և missing in hye language - confirmation

#51 reneclais closed 9 months ago
1
Armenian.traineddata contains the missing character, so I suggest to try that model.

#50 reneclais closed 9 months ago
2
Missed letter in the hye.traineddata

#49 reneclais opened 9 months ago
3
English traineddata file does not contain the '±' character?

#48 Furtifk opened 1 year ago
7
Bontot janda

#47 Awiemanja closed 2 years ago
0
Add Shan language data

#46 ronaldaug opened 2 years ago
2
Training data should include bullet-like characters

#45 wollmers opened 2 years ago
0
Added unicharset file to Akkadian language

#44 wincentbalin closed 2 years ago
1
Update deu.unicharset

#43 OttoKerner closed 1 week ago
3
Missing some Thai numbers in Thai language (tha)

#42 crossknight opened 3 years ago
0
Inherited.unicharset built by copying lines from existing unicharsets

#41 Shreeshrii opened 3 years ago
1
how to train this files to get .traineddata

#40 josef821 closed 1 year ago
3
Update asm.wordlist

#39 hjkgithub opened 4 years ago
3
Alternative way to download langdata_lstm master file instead from github

#38 timjin520 closed 8 months ago
11
Missing support for Coptic script

#36 stweil opened 4 years ago
1
Update desired_characters for fin model

#35 jmokoistinen opened 4 years ago
0
Update dan/desired_characters based on the Swedish one

#34 poizan42 closed 4 years ago
1
Add the "@" character please to the list of desired characters

#30 Furtifk closed 4 years ago
2
Add support for Shan language (shn)

#33 ronaldaug closed 2 years ago
8
Danish traineddata file doesn't include the "@" character

#29 Furtifk opened 4 years ago
9
Tesseract fails to detect letters Å and å in Finnish language.

#31 jmokoistinen opened 4 years ago
4
Trailing spaces on line 27 of eng.punc

#28 juliangilbey opened 4 years ago
4
Please use more fonts for training Uyghur

#27 gheyret opened 4 years ago
0
Normalize unicode in texts

#26 stweil closed 4 years ago
0
Duplicate fonts names in okfonts

#25 amitdo closed 4 years ago
2
Support for New Reiwa Era Character ㋿ in Japanese

#32 prateek4sep opened 4 years ago
1
Please add description for repo - Suggested Text:

#24 Shreeshrii opened 4 years ago
0
Partially revert commit 02cc8f028532367dd44ba5fb3cbb6ac0bf73d6ad

#23 stweil closed 5 years ago
2
error related to script data during training

#22 Shreeshrii closed 5 years ago
9
Add Apache license file

#21 stweil closed 5 years ago
1
Fix langdata config for Chinese, Japanese and German

#20 stweil closed 5 years ago
1
Move script data to new script subdirectory

#19 stweil closed 5 years ago
2
rename kur to kur_ara

#18 Shreeshrii closed 2 years ago
4
Apparently Lao\Lao.unicharset Has Uncommitted Changes

#17 ColdWinterWind closed 5 years ago
1
tessedit_ocr_engine_mode 1 for san (Sanskrit language, Devanagari script)

#16 Shreeshrii closed 5 years ago
1
tessedit_ocr_engine_mode 1 for nep (Nepali language, Devanagari script)

#15 Shreeshrii closed 5 years ago
0
tessedit_ocr_engine_mode 1 for mar (Marathi language, Devanagari script)

#14 Shreeshrii closed 5 years ago
0
tessedit_ocr_engine_mode 1 for hin (Hindi language, Devanagari script)

#13 Shreeshrii closed 5 years ago
0
fix unicharset errors

#12 Timilehin closed 5 years ago
0
update yoruba unicharset

#11 Timilehin closed 5 years ago
0
improve yoruba training data quality

#10 Timilehin closed 5 years ago
0
Should we update swe.training_text if new characters are added to desired_characters ?

#9 aslamy opened 5 years ago
1