Open boylucky opened 6 months ago
Well, I don't think I'm going to go down that road. The ESP32 simply does not have the capability to work with something like that in a practical way. I have more faith in building a system of different hardware modules that work together.
I am currently playing around with LLMs, specifically Ollama. I am running Ollama on a lightweight Linux, for example Lubuntu with 32GB RAM and X disabled, so all resources are dedicated to running Ollama. Connecting the ESP32 to the Ollama server via its web API and using the llava model is, I think, the best path to go at the moment.
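As a rough sketch of that path: Ollama's `/api/generate` endpoint accepts base64-encoded images for multimodal models like llava, so the client only needs to build a small JSON payload. The server address below is an assumption, and this assumes the llava model is already pulled on the server:

```python
import base64
import json
import urllib.request

# Hypothetical address of the Ollama server on the LAN
OLLAMA_URL = "http://192.168.1.50:11434/api/generate"

def build_payload(jpeg_bytes, prompt="What is in this picture?"):
    """Build the JSON body Ollama expects: model name, prompt,
    and the image as a base64 string."""
    return json.dumps({
        "model": "llava",
        "prompt": prompt,
        "images": [base64.b64encode(jpeg_bytes).decode()],
        "stream": False,
    }).encode()

def ask_llava(jpeg_bytes):
    """Send one captured JPEG to the server and return the model's answer.
    Requires a running Ollama instance at OLLAMA_URL."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(jpeg_bytes),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

On the ESP32 side the same request can be made with MicroPython's `urequests`, since it is just an HTTP POST with a JSON body.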
I will make a repo about my findings when I have something to share.
It would be very nice if you could share your experiences with Teachable Machine and tflite. A repo with tutorials about it would be very interesting.
Thanks for your comment. You are right about the ESP32-CAM's capabilities. It is just interesting that there is a sample project in the Arduino IDE for the ESP32-CAM which runs real-time face detection. But I think this capability is somehow built directly into the ESP board (sorry if I am wrong, but that is my current understanding). Anyway, for better reliability I was thinking along similar lines as you described: I would take a picture on the ESP32-CAM and pass it to a Raspberry Pi via a specified port on which the RPi would listen for incoming pictures. The RPi would then run the tflite model classification and pass the result back to the ESP. It would of course not be instant streaming, but it would be enough to run detection on pictures taken every couple of seconds. But then I lose the advantage of the ESP a bit, because I have to power two devices, which means more power consumption. In that situation I would rather use the RPi directly for all the work. But that is what I already have :o) . I just wanted to make it a bit more interesting by moving everything to the ESP. If I get further I will let you know, but I do not think I will.
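The "listen on a port for incoming pictures" part can be sketched with a simple length-prefixed protocol over a plain TCP socket. The port number is hypothetical, and the `classify` callback stands in for whatever tflite model runs on the Pi:

```python
import socket
import struct

PORT = 8765  # hypothetical port the RPi listens on

def recv_exact(conn, n):
    """Read exactly n bytes from the connection (recv may return less)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def serve_once(listener, classify):
    """Accept one connection, read a 4-byte big-endian length followed by
    that many JPEG bytes, and reply with one label line."""
    conn, _addr = listener.accept()
    with conn:
        (size,) = struct.unpack(">I", recv_exact(conn, 4))
        jpeg = recv_exact(conn, size)
        conn.sendall(classify(jpeg).encode() + b"\n")
```

The ESP32 side would then do the mirror image: `sendall(struct.pack(">I", len(jpeg)) + jpeg)` and read one line back as the classification result.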
Your project also sounds very interesting, but it is definitely on a much higher level than what I am trying to do. As for sharing my experiences with Teachable Machine and tflite, I plan to make a repo and would also like to make a site to share it, not just this project but also other projects like small printing bots, a small working model of a farming bot based on the real one, some experiences with a home-made CNC wood milling machine, some IoT stuff, 3D printing, gardening automation, etc. But I still did not get to it :o( Hopefully I will be able to soon. By the way, my dream project is to make a real, operational small farming bot on wheels (with no-dig garden beds like Richard Perkins and others use) with computer vision, taking care of vegetables and so on, for a fraction of the price of commercial solutions :o) With the current boom of AI capabilities I am just curious why something like that does not already exist. At least I did not find anything like it except farm.bot, and I do not like that idea much (but I made a similar small model for kids to play with :o). But now I am writing quite a bit off topic :o)
There is a company in Germany https://zauberzeug.com that makes robotics for farming. They are also the makers of NiceGUI https://nicegui.io
You might get some inspiration from them.
I like their motto - fail often and early ... well, they practice agile and lean.
I wish you all the best.
Thanks for the great tips, especially nicegui.io. That looks really great and it is what I was looking for some time ago, as I mostly use PyQt6, which does not have a web interface capability. zauberzeug.com also looks interesting and professional. I am thinking more of a machine in the style of the first open-source 3D printers.
Can I ask you two more questions? Would you have any idea how to convert a JPEG picture taken by the camera to a raw format like RGB565 or others? Of course at really small dimensions like 240x240 pixels, to be able to accommodate it on the module. I know there will be some loss of colour precision. But it looks to me as if the module is more capable of taking higher-quality pictures in JPEG format than in other formats, I mean especially regarding the correct light, brightness, etc. Maybe I am wrong, but that is my current experience. I also take the picture twice and use the second shot, as it is always much better, I guess because of the automatic exposure parameters. Do you think there are techniques that could extract some part of a JPEG picture into other formats like RGB565? I do not think so, as it is a compressed file format which probably needs to be decoded completely before the parts can be read.
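On the last point: JPEG is indeed entropy-coded, so there is no shortcut that extracts pixels without decoding (at most you can decode at a reduced scale, which some decoders support). Once a decoder has produced raw RGB888 pixels, though, packing them into RGB565 is pure bit masking; a 240x240 frame comes out to 240*240*2 = 115200 bytes. A minimal sketch of the packing step, assuming the JPEG has already been decoded to 3-bytes-per-pixel RGB:

```python
def rgb888_to_rgb565(rgb: bytes) -> bytes:
    """Pack 24-bit RGB pixels into 16-bit RGB565 (big-endian byte order,
    as many SPI displays expect). 3 input bytes -> 2 output bytes per pixel."""
    out = bytearray(len(rgb) // 3 * 2)
    for px in range(len(rgb) // 3):
        r, g, b = rgb[3 * px], rgb[3 * px + 1], rgb[3 * px + 2]
        # keep the top 5 / 6 / 5 bits of each channel
        v = ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
        out[2 * px] = v >> 8
        out[2 * px + 1] = v & 0xFF
    return out
```

Note also that the ESP32-CAM driver can be configured to deliver the framebuffer in RGB565 directly, which skips the JPEG round-trip entirely, at the cost of the nicer auto-exposure results you observed in JPEG mode.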
My next question is about .bin file conversion. Are there techniques that could convert all the files of a project implemented on the ESP32-CAM with MicroPython into a binary file which would then run when the module is powered on? I know about main.py, which runs automatically when the module is powered on. But is it possible to convert it to main.bin or something like that, and would it contain all the other .py files (not the .txt or other files used in the project)? Do you know if it would run faster, if it is even possible? I use a similar approach in a couple of Python projects on a Windows PC to prepare an .exe file for other users. That makes it much easier for them to run, as they do not need to install anything like the other required modules. Everything is packed into the .exe file; I only attach the .ini, .txt, .jpg and other files used in the project.
Thanks for your help and time.
Hello, I would like to ask if you ever thought about adding computer vision with object detection and classification to MicroPython. That would be a great feature for many other projects, I think. But I am not sure if it is even possible with the ESP32-CAM hardware. I know there are examples in Arduino IDE style for the ESP32-CAM where you can do face detection and classification, but I did not find any possibility to have such a thing directly in MicroPython. The possibility to add small tflite models generated with Google's Teachable Machine would be really great. But as I said, I do not know if this hardware is capable of deploying tflite models. I have some experience with tflite models generated with Teachable Machine used with Python on a Raspberry Pi, which makes it a great option; but for smaller and simpler projects which could even run on batteries, the ESP32-CAM could be an interesting solution, I think. It could operate not on a video stream but on photos taken every couple of seconds, for example. What do you think about it?