jeffbass / yin-yang-ranch

Distributed Computer Vision Software & Raspberry Pis to help manage a farm
MIT License

Interfacing the imagehub to the librarian #4

Open sbkirby opened 3 years ago

sbkirby commented 3 years ago

Jeff, I enjoyed your PyCon 2020 Talk about the Yin-Yang-Ranch. Thank you for all the hard work developing imagenode, imagehub and imagezmq. I've been spending some time building a Distributed Computer Vision system at home using your tools, and I'm at the point where I would like to learn more about how you integrated (or plan to integrate) the librarian into the system. You put a huge two-way arrow between the imagehub and the librarian on your illustration. Would you explain the operation you had in mind for that portion of your overall design? Thanks

jeffbass commented 3 years ago

Hi @sbkirby,

I’m glad you enjoyed my PyCon 2020 talk. My vision for the librarian has evolved over time. In 2018, I designed and wrote a first draft. I ran that first version for over a year (the text message exchange in the README.rst is a screenshot from the first draft of the librarian). I ended up throwing the first draft away and starting over.

In my first draft, the librarian was sending return messages to the imagehub and then to the imagenodes using imageZMQ REP messages. It didn’t work well. I learned (and continue to learn) from the redesign and rewrite. In my current librarian design, the “two-way arrow” is a one-way arrow from the imagehubs to the librarian. All the communication from imagehubs to the librarian is done through the event log and image files written by each imagehub. The librarian "receives" this communication by reading those files and processing the images and event messages. As each imagehub updates these files, the librarian watches for and grabs any new images or event messages.

The way the librarian communicates with the imagenodes is also different than the diagram. All the images and event messages go one way, from imagenodes to imagehubs. The imagehubs do nothing except rapidly store received images and event messages. The librarian communicates "imagenode commands" directly to imagenodes. When the librarian gets a request like “send 3 images from the water meter right now”, it just changes the imagenode.yaml file residing on the appropriate water meter imagenode and restarts the imagenode. The imagenode restarts and does what the new imagenode.yaml specifies (e.g., send 3 immediate images, or turn off a camera detector, etc.). The librarian now communicates with the imagenodes only through their yaml files, which made the implementation of communications much simpler. I should re-draw the design diagram to reflect this, but I haven’t yet.

For starting and restarting the imagenodes, I use the Linux systemd / systemctl system to launch and restart the imagenode.py program. An example of the imagenode.service file is in the imagenode GitHub repository. This is working well for me; trying to add complex two-way communication protocols to the imagenode and imagehub programs was difficult and fragile.

In my current librarian design, each time I need to implement a new imagenode “control task”, I think of a way to implement that feature by adding a new option to the imagenode.yaml option list, then implement the new option in the imagenode code. The librarian communicates commands to the imagenodes as a sequence of “ssh exec imagenode restart” calls sent directly to each specific imagenode. These ssh calls copy the specific imagenode.some_command.yaml to imagenode.yaml, then restart the imagenode. To change it back to the previous imagenode.yaml, it performs a second copy and restarts again.
All the various imagenode.some_command.yaml files reside in the home directory on each imagenode, so “sending a command to an imagenode” looks like this (each command is issued via ssh pi@rpi to the specific imagenode that needs to execute a command):

cp imagenode.no-motion.yaml imagenode.yaml  # the no-motion yaml file is one WITHOUT a motion detector option
sudo systemctl restart imagenode.service
# the above commands set the imagenode to “stop motion detection”

# the below commands set the imagenode to “start motion detection”
cp imagenode.motion.yaml imagenode.yaml  # the motion yaml file is one WITH a motion detector
sudo systemctl restart imagenode.service

These command sequences are in yaml files in the librarian_data directory; they are read by the librarian and sent as ssh commands to the specific imagenode via Python’s subprocess.run() function. It is necessary to configure password-less SSH on each RPi using ssh-keygen. Each restart of the imagenode program on a RPi takes about 3 seconds; that is fast enough for my applications.
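As a rough sketch, sending one of those ssh command sequences from Python might look like this (the host name, yaml file name, and function names here are illustrative, not the actual librarian code):

```python
import subprocess

def imagenode_command(host, yaml_name):
    """Build the ssh command that swaps in a yaml file and restarts imagenode.

    host and yaml_name are hypothetical examples; the real file names come
    from the command yaml files in the librarian_data directory.
    """
    remote = (f"cp {yaml_name} imagenode.yaml && "
              "sudo systemctl restart imagenode.service")
    return ["ssh", host, remote]

def send_imagenode_command(host, yaml_name):
    # check=True raises CalledProcessError if the ssh call fails;
    # requires password-less SSH (ssh-keygen) to be set up on the RPi
    return subprocess.run(imagenode_command(host, yaml_name), check=True)

# example (not executed here):
#   send_imagenode_command("pi@rpi", "imagenode.no-motion.yaml")
```

The same pattern covers any imagenode.some_command.yaml swap; only the yaml file name changes.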

Here's what my current draft of librarian does in pseudo code:

 # stuff done one time at program startup
  Read a librarian YAML file into Settings (imagehub file locations, gmail credentials, backup schedule, etc.)
  Instantiate a Librarian using Settings:
    Instantiate all CommChannel's (includes CLI, Gmail, etc.)
    Do a system self check and log the result
    Start threads and subprocesses:
      Start a watching thread for each comm channel (e.g., gmail_watcher.py)
      Start a watching thread that processes new data as it is saved by imagehubs
      Start a scheduling thread that runs scheduled tasks like backups
      Start a HealthMonitor subprocess that watches the librarian, imagehubs and imagenodes for issues

  # The main Librarian forever loop:
  Response Loop forever:
    Receive a QueryRequest (from in_q which is loaded by an inbound CommChannel thread)
    Optionally, take an action (e.g., set barn_cam to no_motion)
    Compose a Reply (using data that was processed from the imagehub files)
    Send Reply (put into out_q which will be sent in an outbound CommChannel thread)
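The Response Loop above can be sketched in a few lines of Python, assuming thread-safe queues shared with the CommChannel threads (the queue names and the handle_query hook are illustrative, not the actual Librarian API):

```python
import queue

in_q = queue.Queue()   # filled by inbound CommChannel watcher threads
out_q = queue.Queue()  # drained by outbound CommChannel sender threads

def response_loop(handle_query):
    """Librarian main loop: pull a QueryRequest, compose a Reply, enqueue it.

    handle_query stands in for the ChatBot: it takes the query text and
    returns the reply text.
    """
    while True:
        query = in_q.get()   # blocks until a CommChannel enqueues a query
        if query is None:    # sentinel value used here to shut the loop down
            break
        reply = handle_query(query)
        out_q.put(reply)     # an outbound CommChannel thread sends it
```

The blocking `Queue.get()` keeps the main loop idle until a comm channel actually delivers a query, so the loop itself stays tiny.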

The librarian communications, image skills, data skills and other modules are located in a "helpers" directory. Here are the main Librarian Classes:

  Class Settings (filled once by reading from the YAML file)
  Class HealthMonitor (methods that monitor system and network health; does backups)
  Class Librarian (instantiated once using Settings)
    Class CommChannel (instantiated once per comm channel like Gmail or CLI)
      Class Gmail (instantiated once; fetches and sends messages via Gmail API)
    Class Chatbot (instantiated once; receives queries and composes replies)
    Class HubData (instantiated once; provides HubData, e.g. events, to ChatBot)
    Class Schedule (instantiated once; starts scheduled tasks and methods)
    Class Imagehub (instantiated once per imagehub; reads imagehub saved data)
    Class Imagenode (instantiated once for each imagenode)
    Class Location (will instantiate as many locations as imagenodes)
    Class ImagenodeCommand (instantiated each time a specific command is sent)
    Class ObjectDetector (instantiated for each image and imagenode type)
    Class WaterMeterReader (instantiated once; reads water meter digits from water meter image)
    Class Person (can instantiate one or more Persons to answer queries from)

Separate helper programs interact with the librarian; there are about a dozen of these helper programs so far.

My goal is to have my current version of the librarian in a GitHub repository by December or January so that I can propose a talk about it for PyCon 2021.

Thanks for your interest. Good luck with your own Distributed Computer Vision system! I look forward to seeing anything you might choose to share on GitHub. Jeff

P.S. I pushed a new version of imagenode to GitHub recently; it added an option for threaded image sending capability. It speeds up image sending by about 30%.

sbkirby commented 3 years ago

Hey @jeffbass,

Thanks for the thorough description of your librarian plans. You mentioned the following imagehub and librarian communications:

All the communication from imagehubs to the librarian is done through the event log and image files written by each imagehub.

How are you accomplishing the "image files" monitoring? What does the thread below do?

Start a watching thread that processes new data as it is saved by imagehubs

I discovered the tools.imaging.ImageNode.process_hub_reply code you added to imaging.py and implemented in imagenode.py. Based on the following statement, I'm not sure if this function will disappear, or become another tool for controlling the imagenodes "directly".

The librarian communicates "imagenode commands" directly to imagenodes. When the librarian gets a request like “send 3 images from the water meter right now”, it just changes the imagenode.yaml file residing on the appropriate water meter imagenode and restarts the imagenode.

I'm at the point in my project where the librarian is the next step, and I'm exploring the boundaries of your overall plan in order to determine how I can create a version which will fall within these boundaries. Communication between the imagehub and the librarian was my major concern.

Thanks for the heads up on the newest version of imagenode. I installed it yesterday, and I also turned on notifications for any updates.

Thanks, and stay safe.

jeffbass commented 3 years ago

Hi @sbkirby, In imagenode, the process_hub_reply function doesn't do anything (just pass). It is unlikely to change, since I am now controlling imagenodes by changing the imagenode.yaml file and restarting the imagenode program using systemctl restart imagenode. I may take it out completely in a future version, but for now, I'm leaving the stub function there.

The way the librarian watches for changes in the event log and image files is by using the Linux tail command every few seconds on the imagehub.log file. As events occur, such as motion detection, 2 things happen: 1) an event message is sent and then 2) one or more images are sent. The imagehub stores these as they arrive. The important fact is that anytime an image is added, or anytime a temperature changes, etc. in any imagenode, it causes new lines to be written to imagehub.log. The librarian HubData class uses the Linux tail command to look at the last few lines of imagehub.log; if it has changed, the librarian updates its own data accordingly. The imagehub.log is the only file that is watched in this way, as that is enough information to know if there were new images written, etc. There are multiple imagehub.log files; it is a "rotating log": imagehub.log is the newest, then imagehub.log.1 is the next newest, etc. See the imagehub code. My current imagehub log settings allow for 99 logs with about 2,000 lines each.

I use the Linux tail program via subprocess.run(). It is a very fast utility for grabbing the last lines of a text file; I couldn't write anything that fast in Python.
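A minimal sketch of calling tail from Python via subprocess.run (the function name is mine, not the actual librarian code):

```python
import subprocess

def tail_lines(logfile, n=10):
    """Return the last n lines of logfile using the Linux tail utility.

    capture_output=True and text=True collect tail's stdout as a string;
    check=True raises if tail fails (e.g., the file does not exist).
    """
    result = subprocess.run(["tail", "-n", str(n), logfile],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling something like `tail_lines("imagehub.log", 10)` every few seconds and comparing against the previously seen lines is enough to detect new events.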

I have posted the librarian HubData class as a GitHub Gist here. It illustrates how the librarian continuously watches the imagehub.log for changes. Any recently changed log lines are stored in the librarian in a dictionary that is queried by the librarian ChatBot to answer questions like "is the water running?". The events dictionary is filled with the most recent 1,000 events when the librarian starts up. The code in the Gist is a bit messy since it is a draft; I apologize if it is a bit hard to read; my librarian code is not cleaned up or fully documented yet. But this gives you a draft view of how the librarian watches for new events. This HubData class is currently running in a librarian on the same computer that an imagehub is running on. This version of the code doesn't yet have logic for watching the imagehub.log on a different computer. I use thread locking to keep the fetch_event_data method used by the ChatBot thread from accessing the event_data dictionary while the load_log_event method is putting new log events into it.
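That thread-locking pattern can be sketched like this (a simplified, illustrative version, not the actual HubData code from the Gist):

```python
import threading

class EventStore:
    """Minimal sketch: a loader thread appends new log events while the
    ChatBot thread fetches them; a Lock keeps the two threads from
    touching the dictionary at the same time."""

    def __init__(self):
        self.event_data = {}          # e.g. {node_name: [event, ...]}
        self.lock = threading.Lock()

    def load_log_event(self, node, event):
        with self.lock:               # writer holds the lock while updating
            self.event_data.setdefault(node, []).append(event)

    def fetch_event_data(self, node):
        with self.lock:               # reader gets a consistent snapshot
            return list(self.event_data.get(node, []))
```

Returning a copy (`list(...)`) from the fetch method means the ChatBot can process events after the lock is released, keeping the critical section short.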

Hope this helps, Jeff

sbkirby commented 3 years ago

Hey Jeff, I was happy to see the process_hub_reply stub will probably remain for the time being.

Meanwhile, I was excited to see your technique for monitoring events and image files on the imagehub. I have several years of experience programming in a Windows environment, and thanks to RPis I am enjoying the switch to Linux. I wasn't familiar with tail, so I was pleased to learn how simple this technique is.

After reading your recent response, I began adapting my new IOT Server to implement this technique. Unfortunately, I've been struggling for the last day and a half trying to get this thing to work. It has absolutely nothing to do with your software. The server I built uses Docker containers, a first for me, and one particular feature/problem with these containers is that the process within the container doesn't have access to any folders/directories on the local host unless you specifically mount them as a "volume". Up until now this hasn't been a problem. I took an OpenCV container, upgraded the image to implement OpenALPR, and customized it without any major problems. This is the container I use to run imagehub. But when it comes to Node-Red, which I've used previously in projects (not in a Docker container), it doesn't seem to support this feature in its entirety. The only outside volume mapping they seem to recognize is a mapping to their "/data" folder. Anyway, so much for the rant. I was hoping to implement a tail interface, but I've been frustratingly delayed. As a result, I haven't had a chance to look at your librarian HubData class yet. Thank you for sharing. As soon as I get a chance, I will look at it and give you some feedback.

Thanks again Stephen

sbkirby commented 3 years ago

Howdy Jeff, I finally integrated my imagehub server into my Node-Red Docker container. As I mentioned previously, it was necessary to move the imagehub_data directory to Node-Red's /data directory. It's an ugly fix, but it seems to be the only solution available at this time. In any case, I'm using tail to catch all of the imagehub events, and it's working great. Thanks. After monitoring the log file for a while and examining the HubData class, I'm curious what information you are storing and how you are storing it (e.g. MySQL, InfluxDB, etc.). Additionally, how long are you keeping some of this data, especially the images?

jeffbass commented 3 years ago

Hi Stephen,

I’m glad you got the tail routine working.

The main information I am storing is the imagehub_data itself: event logs as simple text files and jpg images (organized by the imagehub into nested directories named with dates). This is all of my primary data. When doing analysis of things like water usage or detecting coyotes in images, I have data analysis programs that read these primary data sources as needed. I don’t currently use any database, but rather filter and use the primary data “on the fly” as needed. I do store some simple summary text files resulting from the data analysis, but just as simple text files without using any database. An example of this is a simple text file where each line contains a datetime and water usage in gallons. When searching for a specific image or event, I use the event logs to find a date and then retrieve the relevant image directly from the imagehub_data (typically I do this on a copy of the imagehub_data that is kept and updated on an analysis computer). That’s a short and simple answer. I give a lot more detail in the paragraphs that follow (but feel free to skip reading them if this short answer is enough for you ;-)

So far, 3 years into my project, I am keeping all of my primary data. It is less than 150GB in size. I have images going back about 30 months and logs going back about 3 years. I currently have 3 imagehubs, each of which is a cheap laptop that I have set up to run Debian Linux. The laptop SSDs are 128GB or larger. My largest imagehub_data directory is about 50GB, so I haven’t had to throw away any images yet. I back up the imagehub laptop SSDs to an analysis computer using the Linux utility “rsync”, which is very smart about copying only the data that has changed. I keep all the data in its original form and location on the imagehubs. My librarian prototype leaves all the data untouched as it is (text files for event logs are in imagehub_data/logs and the images in imagehub_data/images). I don’t currently use any database. I use the HubData tail technique for the librarian to answer queries about current and recent events using a librarian prototype running on each imagehub computer. My current setup is a bit different than the design diagram; currently there is one librarian.py program running on each imagehub computer. Only one of the librarian programs answers queries via SMS texting or a CLI interface. It can fetch data from the other imagehub / librarian computers via CLI queries and the Linux scp utility.

I use a different computer (with an rsync copy of the data from an imagehub) to do analysis / computation / object detection, etc. I have a simple script that makes an imagehub_data rsync copy from the imagehub laptops to the analysis computer at regular intervals. The analysis computer uses these imagehub_data copies so that it is not keeping the imagehub laptop CPUs busy with analysis tasks. I have found that each imagehub computer needs to be doing nothing but running the imagehub and the librarian query response programs in order to keep up with 8-10 imagenodes per imagehub. My data analysis includes computing historical summaries (how much water was used per hour on a certain day, etc.). I have not yet built the librarian classes to answer queries about event history, so I have a collection of various short Python programs that I run to query history. Right now I am storing analysis files as simple tables or as Pandas data frames. Both are stored as csv delimited text files. I haven’t decided on a database since all my secondary analysis files are simple csv text files that I load and save as needed. My object detection and other computer vision prototype programs are all run on the analysis computer. I use TensorFlow and Keras to build and test models for object detection on my image files. Keras can use the HDF5 Hierarchical Data Format, which can be written and read by Pandas. However, since TensorFlow 2.0 was released, I’ve just been using the recommended TensorFlow SavedModel format. I keep a spreadsheet with an annotated list of my models for object detection, etc. I may put this into a database someday, but that has not been a priority; the spreadsheet is working just fine for me.
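As an illustration of that kind of csv-based analysis, here is a sketch of an hourly water-usage summary with Pandas. The file layout (one datetime and one gallons value per line) follows the simple summary text files described above, but the function and column names are my own invention:

```python
import pandas as pd

def hourly_water_usage(csv_path):
    """Sum water usage per hour from a headerless csv of datetime,gallons lines.

    The assumed layout is one timestamp and one gallons value per line,
    like the simple summary text files described above.
    """
    df = pd.read_csv(csv_path, names=["datetime", "gallons"],
                     parse_dates=["datetime"])
    # resample the gallons column onto hourly bins and total each bin
    return df.set_index("datetime")["gallons"].resample("1h").sum()
```

Because the primary data is plain text, this kind of summary can be recomputed from scratch whenever the analysis question changes, with no database schema to maintain.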

My current prototype librarian is a collection of small Python programs that do specific small tasks. I tend to chain these together via standard input / standard output pipes per the “Unix Philosophy” (article about that here). I have been an avid Unix user since 1979, so it is a toolset I am very familiar and comfortable with. After 40 years of Unix and C coding, I started using Python about 3 years ago and I love it. I use many “quick and dirty” short Python programs for different kinds of analysis of the imagehub_data log and image files. These Python programs aren’t documented yet, so I don’t feel they are ready to post on GitHub as part of a Librarian repository. Here is an example of one of those short Python programs. It runs in the imagehub_data/images directory and prints the number of images by date. It reads the imagehub directories using the Linux ls utility. Here is the program's output when I ran it a few minutes ago.

(py3cv3) jeffbass@jeff-thinkpad:~/imagehub_data/images$ python list_num_images_by_date.py 
2020-07-28 1561
2020-07-29 2128
2020-07-30 1168
2020-07-31 989  # I ran this at noon-ish; there will likely be about 2,000 images in this directory by midnight
(py3cv3) jeffbass@jeff-thinkpad:~/imagehub_data/images$ 
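A rough equivalent of a script like that, using Python's pathlib instead of shelling out to ls (this is my sketch, not the actual list_num_images_by_date.py; the layout assumption is one date-named subdirectory of jpg files per day):

```python
from pathlib import Path

def images_per_date(images_dir):
    """Count jpg files in each date-named subdirectory of an images directory.

    Assumes the imagehub layout described above: one subdirectory per date,
    each holding that day's jpg image files.
    """
    counts = {}
    for day_dir in sorted(Path(images_dir).iterdir()):
        if day_dir.is_dir():
            counts[day_dir.name] = sum(1 for _ in day_dir.glob("*.jpg"))
    return counts

# usage: for date, n in images_per_date(".").items(): print(date, n)
```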

The primary way I save disk space is by minimizing the number of images that are sent and stored. The design of my overall system sends a minimum number of images from the imagenodes to the imagehub. For example the motion detector on the water meter sends a few frames when the water meter starts spinning and a few frames when it stops spinning. That is typically only a few dozen images a day (and a few dozen event messages a day in the imagehub_data/logs). When the Librarian is sent a query about water, it uses the HubData fetch_event_data() method to grab the last few event messages in the log, do a bit of simple processing and send the result as a reply to the query. That’s how the “text messaging” in the README of this GitHub repository was created. Here is a snapshot of my text message exchange with “Susan”, my SMS chat librarian, from a few minutes ago. Yes, it is a very hot day in Southern California today!

[Screenshot: Chat_text_exchange — the SMS text message exchange with “Susan”]

Another way I save disk space is to use the smallest image size that can accomplish a given task. For example, the PiCamera aimed at my water meter dial takes 320x240 images which are only about 15KB per image when sent by imageZMQ as a jpg compressed image and when stored by the imagehub as a jpg image file. But these images are big enough to use computer vision routines to read the digits of the water meter and store those readings in a simple text file that looks a lot like the imagehub log files (one datetime and water meter reading integer per line). I haven’t felt the need to use a database because these simple text files are working just fine for my analysis using Pandas. I have my wildlife cams send larger images, but even for the wildlife cams, 640x480 is large enough to run my animal detection computer vision algorithms using TensorFlow.

This way of doing things without a database is a working draft and an ongoing experiment. It makes the Librarian main program very short and simple; it is about 60 lines: a “forever loop” that watches for incoming queries. It sends them to the ChatBot (about 350 lines of Python). The ChatBot uses HubData (about 350 lines of Python) via method fetch_event_data() to get just enough text data to answer a query. The Librarian main program is in this Gist. The HubData class is in this Gist. An example method of the ChatBot class that reports temperatures is in this Gist. It shows how the HubData fetch_event_data() method grabs the current and previous event messages for a specific imagenode’s temperature sensor.

I hope this helps you understand what information I am storing and how I am storing it.

Feel free to continue adding additional questions in this issue thread. Or if you have questions about a different topic, feel free to open a new issue. Answering your question made me realize how much documentation I still need to write to get the Librarian ready to be pushed to a GitHub repository. I am considering posting a smaller Librarian subset of classes that are already documented just to have something publicly available and visible. Answering your question was a great way for me to revisit how much of the Librarian is actually ready (and, conversely, how much is not ;-)

Thanks, Jeff