Closed Reiss-Cashmore closed 3 years ago
Alternatively, it could be nice to integrate something more general like TensorFlow.js, which would leave users free to run any TF model within the context of nut.js. It might even be worth having an abstraction/interface so that libraries like Tesseract, TF and others can be swapped easily and produce consistent outputs.
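To make the swappable-backend idea a bit more concrete, here is a rough TypeScript sketch. Every name in it (`TextRecognizer`, `RecognizerRegistry`, `RawImage`, `TextMatch`, `FakeRecognizer`) is hypothetical and not part of nut.js, Tesseract or TensorFlow.js; a real backend would wrap one of those libraries behind the same interface.

```typescript
// Hypothetical types for illustration only; none of these exist in nut.js.
interface Region {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface RawImage {
  width: number;
  height: number;
  channels: number;
  data: Uint8Array;
}

// One result shape for every backend, so callers see consistent outputs.
interface TextMatch {
  text: string;
  confidence: number; // assumed to be normalised to the range 0..1
  region: Region;
}

interface TextRecognizer {
  readonly name: string;
  recognize(image: RawImage): Promise<TextMatch[]>;
}

// Keeps backends by name so they can be swapped without touching call sites.
class RecognizerRegistry {
  private readonly backends = new Map<string, TextRecognizer>();

  register(backend: TextRecognizer): void {
    this.backends.set(backend.name, backend);
  }

  get(name: string): TextRecognizer {
    const backend = this.backends.get(name);
    if (!backend) {
      throw new Error(`No recognizer registered under "${name}"`);
    }
    return backend;
  }
}

// Stand-in backend; a real one would call into Tesseract or a TF model here.
class FakeRecognizer implements TextRecognizer {
  readonly name = "fake";

  async recognize(image: RawImage): Promise<TextMatch[]> {
    const whole: Region = { left: 0, top: 0, width: image.width, height: image.height };
    return [{ text: "hello", confidence: 1, region: whole }];
  }
}
```

With something like this, switching from one OCR library to another would come down to registering a different backend, as long as each backend maps its own results into `TextMatch`.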
A general implementation could mean lots of support for things like:
- Text recognition / OCR
- Face detection
- Barcode scanning
- Image labeling
- Object detection & tracking
- Language identification / Translation
- Smart Reply / Conversation aware responses
There are a lot of great free and open-source optimised models for such tasks built with TensorFlow and TensorFlow Lite.
Some good links:
- https://www.tensorflow.org/js
- https://www.npmjs.com/package/@tensorflow/tfjs-node
- https://www.npmjs.com/package/@tensorflow/tfjs-node-gpu
- https://blog.tensorflow.org/2021/09/blog.tensorflow.org202109optical-character-recognition.html
- https://blog.tensorflow.org/2020/01/run-tensorflow-savedmodel-in-nodejs-directly-without-conversion.html

Some pre-canned TensorFlow Lite models that have been converted and optimised from bigger, more established libraries:
- https://github.com/tulasiram58827/ocr_tflite
- https://github.com/tensorflow/tfjs-models/tree/master/tasks
- https://tfhub.dev/sayakpaul/lite-model/east-text-detector/fp16/1

Alternatively, a couple of other possible libraries for consideration:
- https://github.com/dbashford/textract
- https://github.com/antimatter15/ocrad.js
I closed #67 as I have a working OCR module in place. It’s not yet released and I’m currently making some decisions on how I want to proceed.
At the moment I’m rethinking a few architectural things as well, so I added #259 to the 2.0 milestone, since some things will most likely change.
Ahh okay awesome that you have an implementation @s1hofmann !
No worries, if you need any help on anything just drop me a line. I'm pretty comfortable with the Node/JS side of things.
Something worth mentioning architecture-wise, perhaps: I found myself putting together the setup of TemplateMatchingFinder, VisionAdapter and MatchRequest myself rather than using the screen.xxxxx methods, despite the lack of friendly docs on these objects. That was mainly because I could organise the "profiles" I was searching for much better in my code, but also because I wasn't sure whether screen.find does a full screen grab before applying the region, in which case the performance would be much worse than using VisionAdapter.grabScreenRegion. I didn't test whether there is a performance difference, though.
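Since the performance difference was not measured, a small timing helper along these lines could settle it. This is only a sketch: `averageMs` and `compareCaptures` are made-up names, and the two capture callbacks are placeholders for whichever full-screen and region-only calls are being compared; nothing here calls into nut.js directly.

```typescript
// Generic timing helper; it knows nothing about nut.js itself.
async function averageMs(
  capture: () => Promise<unknown>,
  runs: number = 10
): Promise<number> {
  if (!(runs > 0)) {
    throw new Error("runs must be a positive number");
  }
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    await capture(); // run sequentially so captures do not overlap
  }
  return (Date.now() - start) / runs;
}

// Times both strategies with the same number of runs and reports the averages.
async function compareCaptures(
  fullThenCrop: () => Promise<unknown>, // placeholder: full grab, then apply region
  regionOnly: () => Promise<unknown>, // placeholder: grab just the region
  runs: number = 10
): Promise<{ fullThenCropMs: number; regionOnlyMs: number }> {
  return {
    fullThenCropMs: await averageMs(fullThenCrop, runs),
    regionOnlyMs: await averageMs(regionOnly, runs),
  };
}
```

`Date.now()` only has millisecond resolution, so a reasonably large `runs` value is needed before the averages mean much.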
Also, and this might impact architectural things, I have a bug I need to document and raise. One of my automation projects dies after a few hours, when screen.find fails on its internal .captureScreen call. I believe it's something related to memory/GC, perhaps some weak references and singleton funny stuff going on. I will get that bug raised and try to put a demo project together so you can reproduce it. I have the stack trace for now but haven't been able to fully investigate. Right now I just reinitialise the entire app to work around it.
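For anyone hitting the same failure, the reinitialise workaround could be narrowed to the failing call with a wrapper roughly like this. It is a sketch, not a fix for the underlying bug: `withReinit` is a made-up helper, and what the `reinit` hook has to do for nut.js (if anything short of a full restart helps) is unknown at this point.

```typescript
// Retries an action after running a caller-supplied reinitialisation hook.
async function withReinit<T>(
  action: () => Promise<T>,
  reinit: () => Promise<void>, // hypothetical hook, e.g. rebuild whatever state went stale
  maxAttempts: number = 2
): Promise<T> {
  if (!(maxAttempts >= 1)) {
    throw new Error("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Only reinitialise if another attempt is still coming.
      if (attempt < maxAttempts) {
        await reinit();
      }
    }
  }
  // Every attempt failed; surface the most recent error to the caller.
  throw lastError;
}
```

Logging the error before each retry would keep the stack traces around for the bug report.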
Short overview

I was wondering if I might be able to help with the OCR feature for this lib.

Ref: https://github.com/nut-tree/nut.js/issues/67
I think it would be a great addition and useful for me personally. I don't mind helping with the integration if I can understand your approach and architecture. Tesseract looks like a good option.
I can also help with the feature for getting the colour at a pixel location. It looks like it just needs integrating from Robot.js.
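As a rough idea of what the pixel colour lookup involves once a screenshot buffer is available, here is a TypeScript sketch. `RawImage`, `Rgba` and `colorAt` are hypothetical names, and the buffer layout (row-major, interleaved channels, R, G, B, then optional A) is an assumption; a real capture backend may use a different channel order such as BGR and would need the indices swapped.

```typescript
// Hypothetical image shape; assumes row-major data with interleaved channels.
interface RawImage {
  width: number;
  height: number;
  channels: number; // 3 (RGB) or 4 (RGBA) in this sketch
  data: Uint8Array;
}

interface Rgba {
  r: number;
  g: number;
  b: number;
  a: number;
}

function colorAt(image: RawImage, x: number, y: number): Rgba {
  if (Math.floor(x) !== x || Math.floor(y) !== y) {
    throw new Error("x and y must be integers");
  }
  if (x < 0 || y < 0 || x >= image.width || y >= image.height) {
    throw new Error(`Pixel (${x}, ${y}) is outside the image`);
  }
  if (image.channels < 3) {
    throw new Error("Expected at least 3 channels");
  }
  // Each pixel occupies `channels` consecutive bytes.
  const offset = (y * image.width + x) * image.channels;
  return {
    r: image.data[offset],
    g: image.data[offset + 1],
    b: image.data[offset + 2],
    // Treat images without an alpha channel as fully opaque.
    a: image.channels >= 4 ? image.data[offset + 3] : 255,
  };
}
```

For example, on a 2x1 RGBA image the second pixel starts at byte offset 4, so `colorAt(image, 1, 0)` reads bytes 4 to 7.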