nut-tree / nut.js

Native UI testing / controlling with node
https://nutjs.dev

Help with OCR Integration #290

Closed Reiss-Cashmore closed 3 years ago

Reiss-Cashmore commented 3 years ago

Short overview: I was wondering whether I could help with the OCR feature for this library. Ref: https://github.com/nut-tree/nut.js/issues/67

I think it would be a great addition, and it would be useful for me personally. I don't mind helping with the integration if I can understand your approach and architecture. Tesseract looks like a good option.

I can also help with the feature for getting the colour at a pixel location. It looks like it just needs integrating from Robot.js.
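
For illustration, reading the colour at a pixel location from an already captured frame could look roughly like the sketch below. It does not use the nut.js or Robot.js APIs; the `RawImage` shape, the R, G, B, A byte order, and the `colorAt` name are all assumptions made for this example.

```typescript
// Hypothetical raw screenshot: tightly packed rows, `channels` bytes per pixel.
interface RawImage {
  width: number;
  height: number;
  channels: number; // 3 = RGB, 4 = RGBA (byte order is an assumption)
  data: Uint8Array;
}

interface Rgba {
  r: number;
  g: number;
  b: number;
  a: number;
}

// Returns the colour of the pixel at (x, y), with (0, 0) in the top-left corner.
function colorAt(image: RawImage, x: number, y: number): Rgba {
  const inBounds =
    Number.isInteger(x) && Number.isInteger(y) &&
    x >= 0 && y >= 0 && x < image.width && y < image.height;
  if (!inBounds) {
    throw new RangeError(`Pixel (${x}, ${y}) is outside ${image.width}x${image.height}`);
  }
  // Row-major layout: skip y full rows, then x pixels within the row.
  const offset = (y * image.width + x) * image.channels;
  return {
    r: image.data[offset],
    g: image.data[offset + 1],
    b: image.data[offset + 2],
    // Treat images without an alpha channel as fully opaque.
    a: image.channels >= 4 ? image.data[offset + 3] : 255,
  };
}
```

A real screen capture may use a different channel order (for example BGR) or padded rows, so the offset calculation would need adjusting to match.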

Reiss-Cashmore commented 3 years ago

Alternatively, it could be nice to integrate something more general like TensorFlow.js, which would let users run any TF model within the context of nut.js. It might even be worth having an abstraction/interface so that libraries like Tesseract, TF and others can be swapped easily and produce consistent outputs.
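
As a rough sketch of what such an abstraction might look like: every name below (`TextRecognizer`, `TextMatch`, `PixelData`, `FixedTextRecognizer`, `readText`) is hypothetical and not part of nut.js, Tesseract or TensorFlow.js. The stub backend only returns a fixed string, standing in for a real engine.

```typescript
// Hypothetical input type: raw pixels of a captured screen region.
interface PixelData {
  width: number;
  height: number;
  data: Uint8Array;
}

// Hypothetical common result type, so every backend produces the same output shape.
interface TextMatch {
  text: string;
  confidence: number; // 0..1
  box: { left: number; top: number; width: number; height: number };
}

// The swappable part: a Tesseract-based or TF-based backend would each implement this.
interface TextRecognizer {
  readonly name: string;
  recognize(image: PixelData): Promise<TextMatch[]>;
}

// Stub backend for illustration only; it ignores the pixels and reports a fixed string.
class FixedTextRecognizer implements TextRecognizer {
  readonly name = "fixed";
  private readonly text: string;

  constructor(text: string) {
    this.text = text;
  }

  async recognize(image: PixelData): Promise<TextMatch[]> {
    return [
      {
        text: this.text,
        confidence: 1,
        box: { left: 0, top: 0, width: image.width, height: image.height },
      },
    ];
  }
}

// Calling code depends only on the interface, never on a concrete engine.
async function readText(recognizer: TextRecognizer, image: PixelData): Promise<string> {
  const matches = await recognizer.recognize(image);
  return matches.map((match) => match.text).join(" ");
}
```

With this shape, swapping engines would mean passing a different `TextRecognizer` to `readText`, while the result type stays the same.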

A general implementation could mean support for lots of things, such as: text recognition / OCR, face detection, barcode scanning, image labeling, object detection & tracking, language identification / translation, and smart reply / conversation-aware responses.

There are a lot of great free and open-source optimised models for such tasks built with TensorFlow and TensorFlow Lite.

Some good links:
https://www.tensorflow.org/js
https://www.npmjs.com/package/@tensorflow/tfjs-node
https://www.npmjs.com/package/@tensorflow/tfjs-node-gpu
https://blog.tensorflow.org/2021/09/blog.tensorflow.org202109optical-character-recognition.html
https://blog.tensorflow.org/2020/01/run-tensorflow-savedmodel-in-nodejs-directly-without-conversion.html

Some pre-canned TensorFlow Lite models that have been converted and optimised from bigger, more established libraries:
https://github.com/tulasiram58827/ocr_tflite
https://github.com/tensorflow/tfjs-models/tree/master/tasks
https://tfhub.dev/sayakpaul/lite-model/east-text-detector/fp16/1

Alternatively, a couple of other possible libraries for consideration:
https://github.com/dbashford/textract
https://github.com/antimatter15/ocrad.js

s1hofmann commented 3 years ago

I closed #67 as I have a working OCR module in place. It’s not yet released and I’m currently making some decisions on how I want to proceed.

s1hofmann commented 3 years ago

At the moment I’m rethinking a few architectural things as well, so I added #259 to the 2.0 milestone, as some things will most likely change.

Reiss-Cashmore commented 3 years ago

Ahh okay, awesome that you have an implementation, @s1hofmann!

No worries, if you need any help on anything just drop me a line. I'm pretty comfortable with the Node/JS side of things.

Something worth mentioning architecture-wise: I found myself wiring up TemplateMatchingFinder, VisionAdapter and MatchRequest myself rather than using the screen.xxxxx methods, despite the lack of friendly docs on these objects. That was mainly because I could organise the "profiles" I was searching for much better in my code, but also because I wasn't sure whether screen.find does a full screen grab before applying the region, which would make it much slower than using VisionAdapter.grabScreenRegion. I didn't test whether there is an actual performance difference, though.
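
To illustrate the idea of organising search "profiles" in code, here is a minimal sketch. It does not call nut.js at all; `SearchProfile`, `ProfileRegistry`, the template paths and the region values are all hypothetical, and how a profile would be turned into a MatchRequest is left open.

```typescript
// Hypothetical screen area to restrict a search to.
interface Region {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical bundle of everything needed to look for one UI element.
interface SearchProfile {
  name: string;
  templatePath: string; // path to the template image (made up for this example)
  region: Region;       // only this part of the screen would be grabbed and searched
  confidence: number;   // minimum match confidence, 0..1
}

// Keeps profiles in one place so calling code can refer to them by name.
class ProfileRegistry {
  private readonly profiles = new Map<string, SearchProfile>();

  add(profile: SearchProfile): this {
    if (profile.confidence < 0 || profile.confidence > 1) {
      throw new RangeError(`Confidence must be within 0..1, got ${profile.confidence}`);
    }
    this.profiles.set(profile.name, profile);
    return this;
  }

  get(name: string): SearchProfile {
    const profile = this.profiles.get(name);
    if (profile === undefined) {
      throw new Error(`Unknown search profile: ${name}`);
    }
    return profile;
  }
}

const searchProfiles = new ProfileRegistry()
  .add({
    name: "login-button",
    templatePath: "templates/login-button.png",
    region: { left: 0, top: 0, width: 200, height: 100 },
    confidence: 0.95,
  })
  .add({
    name: "status-icon",
    templatePath: "templates/status-icon.png",
    region: { left: 1700, top: 0, width: 220, height: 60 },
    confidence: 0.9,
  });
```

Keeping the region inside each profile makes it explicit which part of the screen a search is meant to touch, which is the part that matters for the full-grab question above.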

Also, this might impact architectural things: I have a bug I need to document and raise. One of my automation projects dies after a few hours, when screen.find fails on its internal .captureScreen call. I believe it's related to memory/GC, perhaps some weak references and singleton funny stuff going on. I will raise that bug and try to put a demo project together so you can reproduce it. I have the stack trace for now but haven't been able to investigate fully. Right now I just reinitialise the entire app to work around it.
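
A narrower variant of that workaround could be sketched as a retry wrapper that reinitialises before trying again. This is only an illustration: `withReinitOnFailure`, `action` and `reinitialise` are hypothetical names, and what "reinitialise" would actually have to reset in nut.js is not known from this thread.

```typescript
// Runs `action`; if it throws, calls `reinitialise` and tries again,
// up to `maxAttempts` attempts in total. The last error is rethrown.
async function withReinitOnFailure<T>(
  action: () => Promise<T>,
  reinitialise: () => Promise<void>,
  maxAttempts: number = 2,
): Promise<T> {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Only reinitialise when another attempt will follow.
      if (attempt < maxAttempts) {
        await reinitialise();
      }
    }
  }
  throw lastError;
}
```

Here `action` would wrap the failing call (for example a screen.find invocation) and `reinitialise` would hold whatever reset logic turns out to be needed. It hides the symptom rather than fixing the underlying bug.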