Closed adiamante closed 1 year ago
Hello,
The package should already include everything needed to use a whitelist in your recognition. It is not exposed through the injectable ITesseract interface, but you can use it from the TesseractOcrMaui.TessEngine class.
There is a constructor with the signature
public TessEngine(string languages, string traineddataPath, EngineMode mode, IDictionary<string, object> initialOptions, ILogger? logger = null)
You can pass optional configuration parameters through the IDictionary "initialOptions".
Add an entry to your options with the key "tessedit_char_whitelist" and a string of the whitelisted characters as the value.
See Stack Overflow.
After that, you can use Tesseract in the normal way. There is an example here, around line 200.
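As a rough sketch of the steps above (not yet verified against the package): the options dictionary is handed to the TessEngine constructor quoted earlier, and recognition then proceeds as usual. The traineddata folder, the image path, the `TesseractOcrMaui.Enums` namespace and the `EngineMode.Default` member are assumptions for illustration.

```csharp
using System.Collections.Generic;
using TesseractOcrMaui;
using TesseractOcrMaui.Enums; // assumed namespace of EngineMode

// Hypothetical folder that already contains eng.traineddata.
string traineddataPath = "/path/to/tessdata";

// Initial options are name/value pairs; the key is the Tesseract variable name.
var initialOptions = new Dictionary<string, object>
{
    ["tessedit_char_whitelist"] = "0123456789"
};

// EngineMode.Default is an assumed enum member; substitute the mode you need.
using var engine = new TessEngine("eng", traineddataPath, EngineMode.Default, initialOptions);

using var image = Pix.LoadFromFile("/path/to/image.png"); // hypothetical image path
using var result = engine.ProcessImage(image);
string text = result.GetText();
```

With this whitelist the engine should only emit digits, but that behavior depends on the engine mode and the underlying Tesseract version.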
I am not able to verify that this works right away. I will try to test it as soon as I can.
Regards, Henri Vainio
Hi @henrivain,
No rush. For my use case, I'm looking for something like
tesseract.setVariable("tessedit_char_whitelist","ABCDEFGHIJKLMNOPQRSTUVWXYZ");
since I need to update the whitelist during runtime.
Hi @adiamante
You can find this method
TesseractOcrMaui.TessEngine.SetVariable(string name, string value)
I think this is what you are looking for.
using var engine = new TessEngine("eng");
bool success = engine.SetVariable("tessedit_char_whitelist", "mychars");
using var image = Pix.LoadFromFile(@"c:\to\file.png");
using var result = engine.ProcessImage(image);
string text = result.GetText();
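Since the whitelist has to change at runtime, the snippet above can be wrapped in a small helper that sets the variable before each recognition. This is only a sketch: `OcrHelper` and `RecognizeWithWhitelist` are hypothetical names, the paths are placeholders, and the two-argument TessEngine constructor is assumed to behave as in the rest of this thread.

```csharp
using System;
using TesseractOcrMaui;

// Usage: the same engine is reused; only the whitelist changes between calls.
using var engine = new TessEngine("eng", "/path/to/tessdata"); // hypothetical traineddata folder
string digits = OcrHelper.RecognizeWithWhitelist(engine, "/path/to/numbers.png", "0123456789");
string letters = OcrHelper.RecognizeWithWhitelist(engine, "/path/to/letters.png", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");

static class OcrHelper // hypothetical helper class
{
    public static string RecognizeWithWhitelist(TessEngine engine, string imagePath, string whitelist)
    {
        // SetVariable returns false when the variable could not be set.
        if (!engine.SetVariable("tessedit_char_whitelist", whitelist))
        {
            throw new InvalidOperationException("Could not set tessedit_char_whitelist.");
        }

        // Dispose the image and the result before the next call reuses the engine.
        using var image = Pix.LoadFromFile(imagePath);
        using var result = engine.ProcessImage(image);
        return result.GetText();
    }
}
```

Checking the returned bool makes a rejected variable fail loudly instead of silently running with the previous whitelist.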
Hey @henrivain,
With that sample code, is there a way for me to initialize the engine with a MAUI raw asset traineddata? I'm getting the following upon initializing TessEngine:
{TesseractOcrMaui.Exceptions.TesseractInitException: Cannot initialize Tesseract Api
---> System.InvalidOperationException: No traineddata files found from path. Do you have correct path and file names?
--- End of inner exception stack trace ---
at TesseractOcrMaui.TessEngine.Initialize(String languages, String traineddataPath, EngineMode mode, IDictionary`2 initialOptions)
at TesseractOcrMaui.TessEngine..ctor(String languages, String traineddataPath, EngineMode mode, IDictionary`2 initialOptions, ILogger logger)
at TesseractOcrMaui.TessEngine..ctor(String languages, String traineddataPath, ILogger logger)
at YeetMacro2.Platforms.Android.Services.AndroidWindowManagerService..ctor(ILogger`1 logger, MediaProjectionService mediaProjectionService, IToastService toastService) in C:\Users\Desktop\Desktop\kappagacha\yeetmacro2\YeetMacro2\Platforms\Android\Services\AndroidWindowManagerService.cs:line 81}
I've also attempted to give it the paths Raw and Resources/Raw, to no avail. For context, I am attempting this on Android.
Got it to work with the following after looking at some code from the repository.
public static Stream GetAssetStream(string path)
{
return FileSystem.OpenAppPackageFileAsync(path).Result;
}
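var traineddataPath = Path.Combine(FileSystem.Current.CacheDirectory, "eng.traineddata");
if (!File.Exists(traineddataPath)) {
    // Copy the bundled raw asset into the cache directory so Tesseract can read it from a real file path.
    using var traineddata = ServiceHelper.GetAssetStream("eng.traineddata");
    // Dispose the file stream so the data is flushed and the handle released before the engine reads it.
    using FileStream fileStream = File.Create(traineddataPath);
    traineddata.CopyTo(fileStream);
}
_tessEngine = new TessEngine("eng", FileSystem.Current.CacheDirectory);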
I was now able to test it myself, and I also got it to work.
The reason you got the exception was that the traineddata was not loaded. Automatic traineddata loading is only available in the async methods of the highest-level ITesseract API. If the TessEngine API is used directly, the tessdata must also be loaded manually. But you figured it out, so it is all okay.
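For anyone else using TessEngine directly, the manual loading step could be sketched as an async helper that avoids blocking on `.Result`. `TessEngineFactory` and `CreateAsync` are hypothetical names, and the sketch assumes the traineddata file is bundled as a MAUI raw asset named `<language>.traineddata`.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Maui.Storage;
using TesseractOcrMaui;

static class TessEngineFactory // hypothetical helper class
{
    public static async Task<TessEngine> CreateAsync(string language)
    {
        // Assumption: the raw asset is named "<language>.traineddata".
        string fileName = $"{language}.traineddata";
        string targetPath = Path.Combine(FileSystem.Current.CacheDirectory, fileName);

        if (!File.Exists(targetPath))
        {
            // Copy the packaged asset to a real file, because the engine needs a directory path.
            await using Stream asset = await FileSystem.OpenAppPackageFileAsync(fileName);
            await using FileStream target = File.Create(targetPath);
            await asset.CopyToAsync(target);
        }

        // Point the engine at the directory that now contains the traineddata file.
        return new TessEngine(language, FileSystem.Current.CacheDirectory);
    }
}
```

It would be called as `var engine = await TessEngineFactory.CreateAsync("eng");` from an async method, with the caller responsible for disposing the engine.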
I will try to add an easier way to configure the engine at runtime through the ITesseract interface; see issue #16.
I will close this issue with this comment. If you have any questions or other problems with the package, don't hesitate to contact me. I hope this package can help you in your projects!
-HV
Hey @henrivain
The following works on an Android emulator but fails on a physical device:
var imageData = {{my byte array}};
var page = Pix.LoadFromMemory(imageData);
I'm getting the following exception:
System.IO.IOException: 'Failed to load image from memory.'
Any ideas on how I can troubleshoot this?
I can reproduce this. I think the problem comes from the native libraries, so I have to look into them more closely. I am moving this new problem to its own issue, because it is no longer related to whitelisting. If you have anything to add about it, please post it in the new issue #17. Can you specify the image type/extension you are using?
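If the image type is not obvious from how the byte array was produced, one way to find out is to look at the first few bytes of the data. This is a generic sketch and not part of TesseractOcrMaui; `ImageFormatProbe` is a hypothetical helper that only recognizes a handful of common signatures.

```csharp
static class ImageFormatProbe // hypothetical helper, not part of the package
{
    public static string Describe(byte[] data)
    {
        if (data is null || data.Length < 4) return "unknown (too short)";

        // PNG starts with 0x89 'P' 'N' 'G'.
        if (data[0] == 0x89 && data[1] == 0x50 && data[2] == 0x4E && data[3] == 0x47) return "png";
        // JPEG starts with FF D8 FF.
        if (data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF) return "jpeg";
        // GIF starts with "GIF8".
        if (data[0] == 0x47 && data[1] == 0x49 && data[2] == 0x46 && data[3] == 0x38) return "gif";
        // BMP starts with "BM".
        if (data[0] == 0x42 && data[1] == 0x4D) return "bmp";
        // TIFF starts with "II*\0" (little endian) or "MM\0*" (big endian).
        if ((data[0] == 0x49 && data[1] == 0x49 && data[2] == 0x2A && data[3] == 0x00) ||
            (data[0] == 0x4D && data[1] == 0x4D && data[2] == 0x00 && data[3] == 0x2A)) return "tiff";

        return "unknown";
    }
}
```

Logging `ImageFormatProbe.Describe(imageData)` on both the emulator and the physical device would show whether the two are actually handing the same kind of data to `Pix.LoadFromMemory`.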
Thanks @henrivain
Hi,
Thank you for making this. Would you be able to add text whitelist capability?