jeffbass / imagezmq

A set of Python classes that transport OpenCV images from one computer to another using PyZMQ messaging.
MIT License

How to re-establish connection? #49

Open: jaromrax opened this issue 3 years ago

jaromrax commented 3 years ago

Hello, I came across this project a long time ago, but only now have I tried it.

However, I found that if the imagehub (receiver) is restarted, the sender stalls, probably waiting for the 'OK' reply. Is there a way to make the sender time out?

Pointing the client at an IP address and shooting off an image is very convenient, but any interruption means killing and restarting the client...

Thank you, Jaro

jeffbass commented 3 years ago

Hi jaro @jaromrax,

Thanks for your question. One disadvantage of the ZMQ REQ/REP messaging pattern is that the sender needs to restart if the imagehub (receiver) is restarted. This is expected ZMQ behavior and is mentioned in the ZMQ documentation. I use the REQ/REP pattern in my own production systems and REQ/REP does require a timeout watch method in the sender program. I mention one sender timeout technique using the Linux SIGALRM signal in this imageZMQ FAQ here. I mention a couple of others below.
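The SIGALRM technique can be sketched as a small context manager that raises an exception if the REQ/REP round trip takes too long. This is a minimal sketch, not the Patience class from imagenode; `time_limit` and `SendTimeout` are hypothetical names. It assumes a Unix system and that it runs in the main thread, because SIGALRM does not exist on Windows and Python only installs signal handlers from the main thread.

```python
import signal
import time
from contextlib import contextmanager


class SendTimeout(Exception):
    """Raised when a REQ/REP round trip takes longer than allowed."""


@contextmanager
def time_limit(seconds):
    """Raise SendTimeout if the enclosed block runs longer than `seconds`.

    Unix only, main thread only: relies on SIGALRM via setitimer().
    """
    def _on_alarm(signum, frame):
        raise SendTimeout(f"no reply within {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # deliver SIGALRM once
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# Hypothetical usage with imageZMQ (sender, name and image assumed to exist):
# try:
#     with time_limit(5):
#         reply = sender.send_image(name, image)
# except SendTimeout:
#     pass  # close and recreate the sender, or exit and restart the program
```

The alarm is cancelled and the previous handler restored in `finally`, so a normal, fast send leaves no pending SIGALRM behind.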

There are other ways a REP timeout watcher could be implemented; some “more robust REQ/REP patterns” are discussed in the ZMQ documentation here.
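The first of the "more robust REQ/REP patterns" in the ZMQ guide is the Lazy Pirate pattern: poll for the reply with a timeout and, if none arrives, close the REQ socket and open a fresh one before retrying. Below is a minimal pyzmq sketch using a raw REQ socket rather than imageZMQ's ImageSender; `request_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import zmq


def request_with_retries(endpoint, payload, timeout_ms=2500, retries=3,
                         context=None):
    """Send `payload` on a REQ socket and return the reply bytes.

    Returns None if no reply arrives after `retries` attempts. Each attempt
    uses a brand-new socket, because a REQ socket that sent a request but
    never got a reply cannot send again.
    """
    context = context or zmq.Context.instance()
    for _ in range(retries):
        socket = context.socket(zmq.REQ)
        socket.setsockopt(zmq.LINGER, 0)  # drop unsent data on close
        socket.connect(endpoint)
        try:
            socket.send(payload)
            # poll() returns a nonzero event mask once a reply is readable
            if socket.poll(timeout_ms, zmq.POLLIN):
                return socket.recv()
        finally:
            socket.close()  # always discard the (possibly stuck) socket
    return None
```

The same idea applies to an ImageSender: rather than waiting forever on a stalled one, discard it and create a new one.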

In my own imagenode programs, I restart the sender program when the imagehub program is restarted. I use 2 different timeout techniques, and both are in my imagenode GitHub repository:

  1. A try/except block around each REQ that sets a timeout using signal.SIGALRM; this code uses a Patience class and is in the main imagenode branch here, lines 38-45.
  2. Appending the precise time of each REQ sent and each REP received to a deque, and using a separate timeout-watching method running in a thread. This is in a test imagenode stall_watcher branch here, lines 262-293 & lines 317-350.

I am developing the alternative to my Patience class for 2 reasons: 1) signal.SIGALRM does not work in Python threads, and 2) signal.SIGALRM does not exist on Windows, which a Windows user pointed out in this imagenode issue. My own production systems have been running for over 2 years, and I have found that watching for REP timeouts is needed more often because of network and power glitches than because of imagehub restarts. But a timeout watcher is definitely needed.
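Technique 2) can be sketched with only the standard library. This is not the imagenode stall_watcher code; `ReplyWatcher`, `record_send`, `record_reply`, `patience` and `on_stall` are hypothetical names. The sending loop timestamps each REQ and REP into deques, and a daemon thread flags a stall when the newest REQ has waited longer than `patience` seconds. Because the sending thread may be stuck inside the blocking receive, recovery has to be triggered from the watcher thread, which is what the optional `on_stall` callback is for.

```python
import threading
import time
from collections import deque


class ReplyWatcher:
    """Watch REQ/REP timing from a background thread."""

    def __init__(self, patience=5.0, check_every=0.5, on_stall=None):
        self.patience = patience          # max seconds to wait for a REP
        self.check_every = check_every    # how often the thread checks
        self.on_stall = on_stall          # optional callback (watcher thread)
        self.sent = deque(maxlen=2)       # recent REQ send times
        self.received = deque(maxlen=2)   # recent REP receive times
        self.stalled = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def record_send(self):
        self.sent.append(time.monotonic())

    def record_reply(self):
        self.received.append(time.monotonic())

    def _watch(self):
        while not self.stalled.is_set():
            time.sleep(self.check_every)
            if not self.sent:
                continue  # nothing sent yet
            last_sent = self.sent[-1]
            last_reply = self.received[-1] if self.received else float("-inf")
            waiting = last_reply < last_sent  # newest REQ has no REP yet
            if waiting and time.monotonic() - last_sent > self.patience:
                self.stalled.set()
                if self.on_stall is not None:
                    self.on_stall()  # runs in the watcher thread
```

In a send loop this would look like `watcher.record_send()`, then `reply = sender.send_image(name, image)`, then `watcher.record_reply()`. One possible `on_stall` action, assumed here rather than taken from imagenode, is `os._exit(1)`, with a process supervisor such as systemd restarting the sender program.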

One of the imageZMQ users (Pat Ryan @youngsoul) developed a different technique for a REP timeout watcher and ImageSender restart. I mention it in the Useful Forks section of the imageZMQ README.rst. The direct link to Pat’s code is here. Pat’s code uses signal.SIGALRM.

Note that you can close and restart the ImageSender (as Pat Ryan does) or you can restart the image sender program (as my own imagenode program does).

Thanks for your question. It is a good one, so I will put together a few examples of the above and add them to the imageZMQ examples folder in the next week or so. Please feel free to ask follow up questions as comments in this issue if you need more details before then.

Jeff

jaromrax commented 3 years ago

Thank you very much for the very elaborate and kind response. I am in the middle of trying @youngsoul's solution right now; I found it before the references you pointed to. Do you think there are any functional differences between the two approaches? Thanks, Jaro

jeffbass commented 3 years ago

Hi Jaro, I haven't used Pat's solution yet, so I don't have specific feedback about it. I would love to hear your thoughts after you try it. I am currently leaning toward the method 2) that I mentioned above. I am a couple of weeks into testing it and it has minimal effects on latency and throughput. It puts the "watching for a REP after REQ" task in a separate Python thread, which fits well with my own imagenode project. Jeff

jaromrax commented 3 years ago

Dear Jeff, after some tweaking, I realized that I had to remove the release of SIG in def timeout. After that, I can start and stop the receiver and sender arbitrarily. However, I have twice seen a crash when starting one of the programs. Jaro

jeffbass commented 3 years ago

Hi Jaro,

One thing you might want to try is setting the zmq.LINGER option after each start or restart of the sender:

        import imagezmq
        import zmq
        sender = imagezmq.ImageSender(connect_to=hub_address)
        sender.zmq_socket.setsockopt(zmq.LINGER, 0)  # prevents ZMQ error on exit

That helped eliminate some restart errors for me. Jeff
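Why LINGER matters on exit or restart: by default a ZMQ socket keeps undelivered messages after close(), and terminating the context can wait indefinitely for them when the hub is gone. With LINGER set to 0, pending messages are discarded and shutdown returns promptly. A minimal plain-pyzmq illustration; the address is a placeholder, and no hub needs to be listening on it.

```python
import time

import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.setsockopt(zmq.LINGER, 0)  # discard unsent messages when closing
socket.connect("tcp://127.0.0.1:5555")  # placeholder hub address
socket.send(b"frame")  # queued locally, since no hub answers

start = time.monotonic()
socket.close()
context.term()  # returns promptly: nothing is left to linger over
elapsed = time.monotonic() - start
print(f"shutdown took {elapsed:.3f} s")
```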

jaromrax commented 3 years ago

Dear Jeff, I was trying to combine an older but great Flask webcam example (by Adrian Rosebrock?) with the SIGALRM version of imageZMQ, and I hit the barrier you mentioned: while the imageZMQ REQ/REP now works reliably, it works only in the main thread... The option 2) you mentioned may solve this...
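The main-thread restriction is easy to confirm: CPython only lets the main thread install signal handlers, so a SIGALRM-based timeout cannot be set up from a worker thread, such as one serving web requests. A small standard-library check, assuming a Unix system where `signal.SIGALRM` exists; `install_alarm_handler` is a hypothetical name.

```python
import signal
import threading

result = {}


def install_alarm_handler():
    """Try to install a SIGALRM handler from a worker thread."""
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
        result["error"] = None
    except ValueError as exc:  # raised outside the main thread
        result["error"] = str(exc)


worker = threading.Thread(target=install_alarm_handler)
worker.start()
worker.join()
print(result["error"])  # exact wording varies between Python versions
```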

jeffbass commented 3 years ago

Hi Jaro, I am testing my option 2 now and have been for about a month. It is working well on a dozen Raspberry Pis. Give it a try and let me know how it works for you. Note that the option 2 code is in the stall_watcher branch of imagenode, not the master branch. When I have completed my testing, I will merge it into the master branch. Good luck! Jeff

jaromrax commented 3 years ago

Dear Jeff, sorry for not getting back to you for so long. That sounds great. I just tried to look at the branches (it has been a long time since October), but I see no branches in imagenode other than master, nor in imagezmq. Thank you

jeffbass commented 3 years ago

Hi Jaro @jaromrax, I've merged the branches that I referred to in previous comments. Here are 2 code examples that may help you:

  1. imageZMQ timeout & restart example program: timeout_req_ImageSender.py
  2. My "option 2" discussed above is now merged as an option in imagenode. The REP_watcher() runs in a separate thread and watches for an excessively long period of "REQ sent and no REP received". The optional REP_watcher send method send_jpg_frame_REP_watcher() appends send and receive times to a deque; see lines 271-367 in imagenode's imaging.py. You can learn more about the REP_watcher option in the imagenode YAML settings docs here.

I hope this helps. Let me know if you have other questions. Jeff