Closed by jonil3400 5 months ago.
The language data used by Tesseract.js by default is stored in this repo. Default language data is not something we actively manage/edit, but rather we inherit the default language data from the main Tesseract project.
Looking in this repo, it looks like no `tgl` (Tagalog) data exists for the LSTM model (the default). Therefore, your options for recognizing it are the following.
1. Use the Legacy model (`oem` value `0`), which does support this language.
   - `await createWorker(["eng", "tgl"], 0)`
2. Find an existing `.traineddata` file, or train one yourself, and then use that.
   - This can be done with the `langPath` argument.
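The two options above could be wired up roughly as follows. This is a minimal sketch, not an official recipe: the helper names are hypothetical, `createWorker` is passed in as a parameter so the helpers can be tried without loading Tesseract.js, and the `langPath` value in the usage note is a placeholder. It also assumes the custom file is stored uncompressed as `tgl.traineddata`, which is why `gzip` is set to `false`.

```javascript
// Hypothetical helpers; in real use, pass the `createWorker` function
// exported by the `tesseract.js` package as the first argument.

// Option 1: request the Legacy model by passing oem value 0.
function createLegacyTagalogWorker(createWorker) {
  const LEGACY_OEM = 0; // oem value 0 selects the Legacy model
  return createWorker(["eng", "tgl"], LEGACY_OEM);
}

// Option 2: load a custom tgl.traineddata from `langPath`.
// Only "tgl" is requested here because every language passed to
// createWorker is looked up under the same langPath.
// `oem` must match the kind of model the custom file contains.
function createCustomTagalogWorker(createWorker, langPath, oem) {
  return createWorker("tgl", oem, {
    langPath,    // directory or URL prefix that holds tgl.traineddata
    gzip: false, // assumption: the file is not stored as .traineddata.gz
  });
}
```

With the real library this might be called as `await createCustomTagalogWorker(createWorker, "https://example.com/lang-data", 1)`, where the URL is a placeholder for wherever the file is hosted.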
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
"tesseract.js": "^5.1.0"

Describe the bug
Running createWorker with the tgl language results in an error.

Uncaught Error: Error: Network error while fetching https://cdn.jsdelivr.net/npm/@tesseract.js-data/TGL/4.0.0_best_int/TGL.traineddata.gz. Response code: 404
    at createWorker.js:247:1
    at worker.onmessage (onMessage.js:3:1)

To Reproduce
await createWorker(["eng", "TGL"]);

Expected behavior
The TGL language can be used.

Device Version: Windows 11, Chrome, Node 18.15
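One way to confirm that the 404 comes from missing language data, rather than from the application code, is to probe the data URL before creating a worker. The sketch below is an assumption-heavy illustration: the URL pattern is inferred only from the error message in this report, not from Tesseract.js documentation, and both function names are hypothetical.

```javascript
// Assumption: URL layout copied from the failing request in the error above.
function defaultLangDataUrl(lang) {
  return `https://cdn.jsdelivr.net/npm/@tesseract.js-data/${lang}/4.0.0_best_int/${lang}.traineddata.gz`;
}

// Resolves to true when the server answers the HEAD request with a 2xx status.
// `fetchFn` defaults to the global fetch (available in browsers and Node 18+).
async function langDataExists(lang, fetchFn = fetch) {
  const response = await fetchFn(defaultLangDataUrl(lang), { method: "HEAD" });
  return response.ok;
}
```

A call such as `await langDataExists("TGL")` would then report whether the file requested in this issue can be fetched at all.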