levylabpitt / Instrument-Framework

An object-oriented framework for LabVIEW based on JKI SMOs.
BSD 3-Clause "New" or "Revised" License

ZMQ Socket Operation on Non-Socket #80

Closed: ciozi137 closed this issue 1 year ago

ciozi137 commented 1 year ago

[image]

It is unclear whether this is an Instrument Framework issue or specific to the Lockin (running in simulation mode).

[image]

Restarting LabVIEW fixes the issue (until it happens again). Neither PPMS Monitor and Control nor Multichannel Lock-In needs to be restarted.

ciozi137 commented 1 year ago

Clues: https://sourceforge.net/p/labview-zmq/discussion/general/thread/f88f2224/

"Race conditions can cause crashes because it violates a precondition of the underlying zeromq library. ZMQ is not thread-safe and LabVIEW is inherently multithreaded. That's why it's important not to split context or socket wires wherever possible. My library includes a system for tracking which objects are in use and attempts to clean them up - when the project is closed that's when this system fires and cleans everything up. Basically once an assertion failure occurs it is impossible to handle the error and the whole system is shut down by the OS. I haven't encountered this myself so it would depend on your implementation; "socket operation on non-socket" certainly implies this is the case."

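One way to make the "do not split socket wires" advice enforceable rather than a convention is to record which thread created a socket and refuse use from any other thread. libzmq documents its classic sockets as not thread-safe (contexts are) and leaves this discipline to the application. The sketch below is a minimal Python model of such a guard; `OwnedHandle` and `try_from_other_thread` are hypothetical names, and the wrapped object is a placeholder, not a real ZMQ socket.

```python
import threading


class OwnedHandle:
    """Hypothetical wrapper around a non-thread-safe handle (e.g. a ZMQ socket)
    that only allows use from the thread that created it."""

    def __init__(self, handle):
        self._handle = handle
        self._owner = threading.get_ident()  # thread that created the handle

    def use(self):
        # Fail loudly instead of letting two threads touch the same socket.
        if threading.get_ident() != self._owner:
            raise RuntimeError("socket used from a thread that does not own it")
        return self._handle


def try_from_other_thread(owned):
    """Use the handle from a second thread; return the error it raised, if any."""
    result = {}

    def worker():
        try:
            owned.use()
        except RuntimeError as exc:
            result["error"] = exc

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result.get("error")


if __name__ == "__main__":
    sock = OwnedHandle("placeholder-socket")
    print(sock.use())                   # same thread: allowed
    print(try_from_other_thread(sock))  # other thread: rejected
```

In LabVIEW terms, a branched socket wire feeding two parallel nodes is the situation this guard would flag.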
ciozi137 commented 1 year ago

This error might be caused by calling Remote Client.vim without passing class out to class in of the next loop iteration (each call to Remote Client.vim would then create a new socket connection). Testing now.
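In LabVIEW, passing class out to class in of the next iteration is normally done with a shift register. The effect of omitting it can be modelled in a few lines of Python. `FakeConnector.call` is a hypothetical stand-in for Remote Client.vim that opens a socket only when the incoming state has none; nothing here touches a real socket.

```python
class FakeConnector:
    """Hypothetical stand-in for Remote Client.vim: opens a socket only
    when the incoming state (class in) has none."""

    def __init__(self):
        self.sockets_opened = 0

    def call(self, state):
        if state.get("socket") is None:
            self.sockets_opened += 1
            state = {"socket": f"socket-{self.sockets_opened}"}
        return state  # class out


def run_loop(iterations, thread_state):
    conn = FakeConnector()
    state = {"socket": None}
    for _ in range(iterations):
        out = conn.call(state)
        if thread_state:
            state = out  # class out wired to class in of the next iteration
        # otherwise the original, socket-less state is reused every time
    return conn.sockets_opened


print(run_loop(1000, thread_state=True))   # 1
print(run_loop(1000, thread_state=False))  # 1000
```

With the state carried forward, one socket serves all iterations; without it, every iteration opens a new one.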

ciozi137 commented 1 year ago

nope...

ciozi137 commented 1 year ago

[image]

ciozi137 commented 1 year ago
Date/Time: 2022-08-31 17:01:33.523 | Type: Err | Code: 1097 | Calling Class: RemoteControl.ZMQ.lvclass:RemoteControl Server
Message: Error 1097 occurred at zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002
Possible reason(s): LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW.
Call Chain: zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002 -> RemoteControl.ZMQ.lvclass:Send Message.vi:2810001 -> RemoteControl.lvclass:Process.vi:5310001 -> SMO.lvclass:LaunchProcess.vi:3770001 -> SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C

Date/Time: 2022-08-31 17:01:34.112 | Type: Err | Code: 1097 | Calling Class: instrument.PPMS.lvclass:PPMS
Message: Error 1097 occurred at zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002
Possible reason(s): LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW.
Call Chain: zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002 -> RemoteControl.ZMQ.lvclass:Send Message.vi:2810001 -> RemoteControl.lvclass:Process.vi:5310001 -> SMO.lvclass:LaunchProcess.vi:3770001 -> SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C

Date/Time: 2022-08-31 17:01:35.455 | Type: Err | Code: 1097 | Calling Class: RemoteControl.ZMQ.lvclass:RemoteControl Server
Message: Error 1097 occurred at zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003
Possible reason(s): LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW.
Call Chain: zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003 -> RemoteControl.ZMQ.lvclass:Open Server Connection.vi -> RemoteControl.lvclass:Process.vi:5310001 -> SMO.lvclass:LaunchProcess.vi:3770001 -> SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C

Date/Time: 2022-08-31 17:01:35.457 | Type: Err | Code: 1097 | Calling Class: instrument.PPMS.lvclass:PPMS
Message: Error 1097 occurred at zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003
Possible reason(s): LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW.
Call Chain: zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003 -> RemoteControl.ZMQ.lvclass:Open Server Connection.vi -> RemoteControl.lvclass:Process.vi:5310001 -> SMO.lvclass:LaunchProcess.vi:3770001 -> SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C
ciozi137 commented 1 year ago

Inside RemoteControl:Process, I notice that the Read Message and Write Message states do not maintain continuity of the class wire. If these methods need to reconnect, there could be a runaway creation of sockets.
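The reconnect case can be modelled the same way. In this hypothetical sketch, `write_message` stands in for the Write Message state: when asked to reconnect it closes the old socket, opens a new one, and returns the new state. Whether the caller keeps that returned state decides whether the number of live sockets stays at one or grows by one per call. `SocketPool` only counts integers; it is not a ZMQ API.

```python
class SocketPool:
    """Hypothetical counter of live fake sockets, so leaks are visible."""

    def __init__(self):
        self.live = set()
        self.next_id = 0
        self.peak = 0

    def open(self):
        self.next_id += 1
        self.live.add(self.next_id)
        self.peak = max(self.peak, len(self.live))
        return self.next_id

    def close(self, sock):
        self.live.discard(sock)  # closing an already-closed id is a silent no-op here


def write_message(pool, state, needs_reconnect):
    """Hypothetical Write Message state: reconnect if asked and return the
    (possibly new) state so the caller can keep it."""
    if needs_reconnect:
        pool.close(state["socket"])      # release the old socket first
        state = {"socket": pool.open()}  # then open its replacement
    return state


def run(iterations, keep_class_wire):
    pool = SocketPool()
    state = {"socket": pool.open()}
    for _ in range(iterations):
        out = write_message(pool, state, needs_reconnect=True)
        if keep_class_wire:
            state = out
    return len(pool.live), pool.peak


print(run(100, keep_class_wire=True))   # (1, 1)
print(run(100, keep_class_wire=False))  # (100, 100)
```

When the class wire is dropped, the same already-closed handle is also closed again on every iteration. The sketch tolerates that, but against the real library, operating on a closed handle is the kind of misuse that may be reported as "socket operation on non-socket".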

ciozi137 commented 1 year ago

[image]

ciozi137 commented 1 year ago

[image]

91:??? in Instrument UI.PPMS.lvclass:Process.vi:5310001 ->SMO.lvclass:LaunchProcess.vi:3770005 ->SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.E280003A

And then: [image]

I'm starting to think that there is a mismatch between the data written by Get All and the data received by the PPMS UI.
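LabVIEW reports error 91 when Variant To Data receives a variant whose type is not compatible with the type wired to it, which fits this suspicion. A general way to turn such a mismatch into an explicit, self-describing error is to tag each message with its type and a schema version and to check the expected fields on receipt. The sketch below uses JSON purely for illustration; `SCHEMA_VERSION`, the field names, and both functions are hypothetical, and the Instrument Framework's actual Get All encoding is not shown in this thread.

```python
import json

SCHEMA_VERSION = 2  # hypothetical: bump whenever the Get All payload changes


def encode_get_all(values):
    """Writer side: tag the payload with a message type and schema version."""
    return json.dumps({"type": "GetAll", "version": SCHEMA_VERSION,
                       "data": values})


def decode_get_all(raw, expected_fields):
    """Reader side: check type, version, and field names before using the
    data, and name the mismatch if there is one."""
    msg = json.loads(raw)
    if msg.get("type") != "GetAll":
        raise ValueError(f"unexpected message type {msg.get('type')!r}")
    if msg.get("version") != SCHEMA_VERSION:
        raise ValueError(
            f"schema version {msg.get('version')!r}, expected {SCHEMA_VERSION}")
    missing = set(expected_fields) - set(msg.get("data", {}))
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return msg["data"]


# Hypothetical field names, for illustration only.
UI_FIELDS = ["temperature_K", "field_Oe"]
payload = encode_get_all({"temperature_K": 300.0, "field_Oe": 0.0})
print(decode_get_all(payload, UI_FIELDS))
```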

ciozi137 commented 1 year ago

Reproduced the error inside the LabVIEW development environment (previous errors mostly occurred in the built EXE).

code: 91: source: SubVI in Instrument UI.PPMS.lvclass:Process.vi:5310001->SMO.lvclass:LaunchProcess.vi:3770005->SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.317000F0 state: 16:47:46.3 Action: RC Send

ciozi137 commented 1 year ago

When I inspect the events I get:

code: 91: source: SubVI in Instrument UI.PPMS.lvclass:Process.vi:5310001->SMO.lvclass:LaunchProcess.vi:3770005->SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.4DC000F1 state: 21:20:04.0 Action: RC Send

ciozi137 commented 1 year ago

code: 156384721

zeromq.lvlib:zmq_socket.lvclass:zmq_close.vi

Complete call chain: zeromq.lvlib:zmq_socket.lvclass:zmq_close.vi RemoteControl.ZMQ.lvclass:Read Message.vi:2060002 RemoteControl.lvclass:Send and Receive Message.vi:1980001 RemoteControl.lvclass:Send and Receive.vi:4620001 Instrument UI.PPMS.lvclass:Process.vi:5310001 SMO.lvclass:LaunchProcess.vi:3770005 SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.4DC000F1
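This call chain shows zmq_close being reached from Read Message. If any other branch still holds the socket reference from before the close, its next call operates on a socket that no longer exists. The hypothetical `HandleTable` below models that sequence in Python and raises `ENOTSOCK`, the errno libzmq documents for an invalid socket. With the real C library, use after close is not guaranteed to be caught this cleanly: the memory behind the handle may already be freed, which could instead surface as a crash inside the Call Library Function Node (error 1097).

```python
import errno
import itertools


class HandleTable:
    """Hypothetical registry of live socket handles, so that using a
    closed handle is detected instead of touching freed memory."""

    def __init__(self):
        self._live = set()
        self._ids = itertools.count(1)

    def open(self):
        h = next(self._ids)
        self._live.add(h)
        return h

    def close(self, h):
        self._check(h)
        self._live.remove(h)

    def send(self, h, payload):
        self._check(h)
        return len(payload)

    def _check(self, h):
        if h not in self._live:
            # libzmq reports an invalid socket as ENOTSOCK; mimic that here.
            raise OSError(errno.ENOTSOCK, "Socket operation on non-socket")


table = HandleTable()
sock = table.open()
copy = sock            # a branched wire: two places now hold the same handle
table.close(sock)      # e.g. Read Message closes it after a timeout...
try:
    table.send(copy, b"Get All")  # ...while another branch still sends on it
except OSError as exc:
    print(exc.errno == errno.ENOTSOCK, exc.strerror)
```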

ciozi137 commented 1 year ago

Again! Running the latest PPMS Monitor and Control: [image]

Code: [image]

ciozi137 commented 1 year ago

There appears to be some issue with the instrument client, remote control client, remote control server, etc.: using the ZMQ example Lazy Pirate client and server, I can send and receive more than 5,000,000 messages with no issues. I propose the following tests:

ciozi137 commented 1 year ago

Hmm, I was able to reproduce the issue with the shipping Lazy Pirate example. Perhaps it's an issue with my virtual machine, or the LabVIEW installation is corrupt?
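For reference, the Lazy Pirate pattern from the ZeroMQ Guide (reliable request-reply) sends a request on a REQ socket, waits for a reply with a timeout, and on timeout closes that socket and retries on a fresh one, giving up after a fixed number of attempts. Because every timeout means another close-and-reopen, the pattern exercises the kind of socket churn under suspicion here. The sketch below simulates only the control flow in plain Python; `FlakyServer` and the socket counter are hypothetical stand-ins, not ZMQ calls.

```python
class FlakyServer:
    """Hypothetical stand-in for the Lazy Pirate echo server that loses the
    first `drop_first` replies. A real test would use REQ/REP sockets."""

    def __init__(self, drop_first=0):
        self.drop_first = drop_first
        self.seen = 0

    def handle(self, request):
        self.seen += 1
        if self.seen <= self.drop_first:
            return None   # reply lost: the client's poll times out
        return request    # echo the request back


def lazy_pirate_request(server, request, retries=3):
    """Control flow of a Lazy Pirate client. Each attempt uses a freshly
    opened socket. Returns (reply or None, number of sockets opened)."""
    sockets_opened = 0
    for _ in range(retries):
        sockets_opened += 1             # open a new REQ socket for this attempt
        reply = server.handle(request)  # send, then poll with a timeout
        if reply is not None:
            return reply, sockets_opened
        # Timeout: a REQ socket cannot send again until it has received a
        # reply, so the pattern closes it (linger 0) and retries on a new one.
    return None, sockets_opened         # abandon after `retries` attempts


print(lazy_pirate_request(FlakyServer(drop_first=2), b"ping"))  # (b'ping', 3)
print(lazy_pirate_request(FlakyServer(drop_first=5), b"ping"))  # (None, 3)
```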

ciozi137 commented 1 year ago

[image]

I am repeatedly opening and closing the context and the socket: [image]

After restarting LabVIEW (and setting the wait to 0 ms): [image]
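An open/close stress test like this one can be written as a small harness that stops at the first failure and reports how many cycles completed, which makes runs easier to compare (for example before and after restarting LabVIEW, or with different waits). The four callables are hypothetical hooks for whichever binding is under test; nothing here is a labview-zmq API.

```python
import time


def churn(open_ctx, open_sock, close_sock, close_ctx, iterations, wait_s=0.0):
    """Repeatedly open and close a context/socket pair.

    Returns (completed_cycles, first_error or None). The four callables are
    hypothetical hooks for the binding under test."""
    for i in range(iterations):
        try:
            ctx = open_ctx()
            try:
                sock = open_sock(ctx)
                close_sock(sock)
            finally:
                close_ctx(ctx)     # release the context even on failure
        except Exception as exc:   # stop at the first failure
            return i, exc
        if wait_s > 0:
            time.sleep(wait_s)     # the "wait" setting between cycles
    return iterations, None


# Example with fake hooks: the fifth socket open fails.
opens = {"n": 0}


def fake_open_sock(ctx):
    opens["n"] += 1
    if opens["n"] == 5:
        raise OSError("simulated failure")
    return object()


# Completes 4 cycles, then reports the OSError.
print(churn(object, fake_open_sock, lambda s: None, lambda c: None, 100))
```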

ciozi137 commented 1 year ago

Running Lockin_time on the Astartes VM: [image]

ciozi137 commented 1 year ago

[image] [image]

After running fine for months, this error started to appear on Kholek (MNK DAQ). I was unable to find the source and resorted to uninstalling all NI-related software. After reinstalling, the PXI chassis is now recognized (that was a separate error). Eventually I will find out whether the ZeroMQ error was due to a corrupt LabVIEW installation or whether some issue is still lurking inside the Remote Control SMO.

ciozi137 commented 1 year ago

Explore these fixes posted on https://gpackage.io/packages/zmq-socket-library

[image]

Here is the repository (hosted GitLab): http://bluecogsoftware.com:8443/mac671/zmq-binding

ciozi137 commented 1 year ago

And https://sourceforge.net/p/labview-zmq/code/merge-requests/1/

[image]

ciozi137 commented 1 year ago

Issues to check

ciozi137 commented 1 year ago
* [x]  split_enpoint.vi parse port number

This is fixed as far as I can tell.

* [ ]  separate compiled code

Some are not separated.

* [ ]  zmq binding libraries published with debug enabled, occasionally causing assertion errors

This is true; some still have debug enabled.

* [ ]  change priority of zmq_errno, _geterr, _libpath to inline

_errno is good. _libpath needs to be inlined. I can't find _geterr.
ciozi137 commented 1 year ago
* [x]  split_enpoint.vi parse port number

This is fixed as far as I can tell.

* [ ]  separate compiled code

Some are not separated.

Separated all of them.

* [ ]  zmq binding libraries published with debug enabled, occasionally causing assertion errors

This is true; some still have debug enabled.

Disabled all debugging and automatic error handling.

* [ ]  change priority of zmq_errno, _geterr, _libpath to inline

_errno is good. _libpath needs to be inlined. I can't find _geterr.

_libpath cannot be inlined: it has "this VI" references.
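Since the port-number parsing in split_endpoint.vi was one of the fixes, a small reference implementation with known inputs could serve as a regression check. The sketch below is a hypothetical Python equivalent; the real split_endpoint.vi may accept or reject different inputs.

```python
def split_endpoint(endpoint):
    """Split a ZMQ TCP endpoint like 'tcp://localhost:5555' into
    (transport, host, port).

    Hypothetical reference implementation for regression tests; the real
    split_endpoint.vi may behave differently."""
    transport, sep, address = endpoint.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {endpoint!r}")
    host, sep, port_text = address.rpartition(":")  # last colon starts the port
    if not sep or not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"missing or non-numeric port in {endpoint!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in {endpoint!r}")
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]                           # bracketed IPv6, e.g. [::1]
    return transport, host, port


for ep in ["tcp://localhost:5555", "tcp://*:29170", "tcp://[::1]:5555"]:
    print(ep, "->", split_endpoint(ep))
```

This sketch rejects the wildcard port `*` that libzmq accepts when binding to a system-assigned ephemeral port, so a parser used on bind endpoints may need to allow it.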

ciozi137 commented 1 year ago

Built zmq v3.6.3.113 with the above changes.