Open GoogleCodeExporter opened 9 years ago
I made an overview list of the language codes found in the subdirectories and
in the tesseract doc file:
Entries where the key equals the value (e.g. afr) are available, but are
not listed in the documentation.
Let me know if you want me to (try to) supply a patch for this.
Array
(
[afr] => afr
[Albanian] => sqi
[Arabic] => ara
[Azerbauijani] => aze
[bel] => bel
[ben] => ben
[Bulgarian] => bul
[Catalan] => cat
[Cherokee] => chr
[Croation] => hrv
[Czech] => ces
[Danish] => dan
[Danish (Fraktur)] => dan-frak
[deu-frak] => deu-frak
[Dutch] => nld
[English] => eng
[equ] => equ
[Esperanto] => epo
[Estonian] => est
[eus] => eus
[Finnish] => fin
[French] => fra
[frk] => frk
[Galician] => glg
[German] => deu
[grc] => grc
[Greek] => ell
[Hebrew] => heb
[Hindi] => hin
[Hungarian] => hun
[Indonesian] => ind
[isl] => isl
[Italian] => ita
[ita_old] => ita_old
[Japanese] => jpn
[kan] => kan
[Korean] => kor
[Latvian] => lav
[Lithuanian] => lit
[mal] => mal
[mkd] => mkd
[mlt] => mlt
[msa] => msa
[Norwegian] => nor
[Old English] => enm
[Old French] => frm
[osd] => osd
[Polish] => pol
[Portuguese] => por
[Romanian] => ron
[Russian] => rus
[Serbian] => srp
[Simplified Chinese] => chi_sim
[slk-frak] => slk-frak
[Slovakian] => slk
[Slovenian] => slv
[Spanish] => spa
[spa_old] => spa_old
[swa] => swa
[Swedish] => swe
[Tagalog] => tgl
[Tamil] => tam
[Telugu] => tel
[Thai] => tha
[Traditional Chinese] => chi_tra
[Turkish] => tur
[Ukrainian] => ukr
[Vietnamese] => vie
)
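For anyone who wants to reproduce such an overview: the listing above looks like PHP print_r output, and how it was actually generated is not stated here. The following is only a minimal Python sketch of the same idea, assuming you already have the set of available codes and the name-to-code table from the documentation. The function names and the small sample data are hypothetical.

```python
def build_overview(available_codes, documented):
    """Return {key: code}; key is the documented name, or the code itself
    when the documentation has no entry for that code."""
    # Invert the documentation table (name -> code) into code -> name.
    name_by_code = {code: name for name, code in documented.items()}
    overview = {}
    for code in available_codes:
        overview[name_by_code.get(code, code)] = code
    # Sort case-insensitively by key, which appears to match the listing above.
    return dict(sorted(overview.items(), key=lambda kv: kv[0].lower()))


def undocumented(overview):
    """Codes whose key equals the value, i.e. missing from the documentation."""
    return [code for key, code in overview.items() if key == code]


if __name__ == "__main__":
    # Hypothetical sample data, not the full tables from this issue.
    documented = {"Albanian": "sqi", "Arabic": "ara", "English": "eng"}
    available = ["afr", "ara", "bel", "eng", "sqi"]
    overview = build_overview(available, documented)
    for key, code in overview.items():
        print(f"[{key}] => {code}")
    print("Not documented:", ", ".join(undocumented(overview)))
```

The second function picks out exactly the entries described above, where key and value are identical.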
Original comment by syr...@gmail.com
on 7 Aug 2014 at 9:24
I also want to patch tesseract, so that the command line option
--list-languages-with-description gives a list with code and language name. (I
mentioned this already)
Original comment by syr...@gmail.com
on 7 Aug 2014 at 9:27
1. Do not mix 2 different topics in one issue.
2. Updating doc for 3.03 with releasing 3.03 language files is strange.
3. I am against "--list-languages-with-description". First of all, there are
several plans (e.g. removing language files from the tesseract engine
repository, separate community training files, other distribution of language
files...), so "--list-languages-with-description" will never provide accurate
output.
Next: tesseract follows the ISO 639-3 standard for language file names. If
somebody wants to know what a code means, (s)he should use the relevant doc[1].
And there is a legal issue: can you implement ISO 639-3 standard information
under the Apache 2 licence?
[1] http://www-01.sil.org/iso639-3/
Original comment by zde...@gmail.com
on 8 Aug 2014 at 11:51
@Z: I got your points and your view. My list was mainly meant to show you what
differs between the manual and the checked-out version (see my list above).
What about adding a link to http://www-01.sil.org/iso639-3/codes.asp in both
the manual and the --list-languages output?
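To illustrate what I mean for the output, here is a rough Python sketch (not tesseract code, and not a patch). It only formats a given list of codes and appends the reference link. The function name, the header line and the footer wording are my assumptions.

```python
ISO_639_3_URL = "http://www-01.sil.org/iso639-3/codes.asp"


def format_language_list(codes, reference_url=ISO_639_3_URL):
    """Format language codes one per line, followed by a reference link."""
    # Hypothetical header; the real tesseract output may be worded differently.
    lines = [f"Available languages ({len(codes)}):"]
    lines.extend(sorted(codes))
    # Point to the external code table instead of embedding language names.
    lines.append(f"For the meaning of ISO 639-3 codes see {reference_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_language_list(["eng", "deu", "afr"]))
```

This way no ISO 639-3 names would have to be shipped with tesseract; the output only refers to the external table.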
Original comment by syr...@gmail.com
on 8 Aug 2014 at 7:20
That should not be a problem. But I would suggest keeping this issue open until
the announced changes[1] take place.
[1] https://groups.google.com/forum/#!msg/tesseract-dev/kJEYuvEZuDs/uYBBwwOJE_IJ
Original comment by zde...@gmail.com
on 9 Aug 2014 at 6:08
Original issue reported on code.google.com by
syr...@gmail.com
on 6 Aug 2014 at 7:31