Closed ciozi137 closed 1 year ago
Clues: https://sourceforge.net/p/labview-zmq/discussion/general/thread/f88f2224/
"Race conditions can cause crashes because it violates a precondition of the underlying zeromq library. ZMQ is not thread-safe and LabVIEW is inherently multithreaded. That's why it's important not to split context or socket wires wherever possible. My library includes a system for tracking which objects are in use and attempts to clean them up - when the project is closed that's when this system fires and cleans everything up. Basically once an assertion failure occurs it is impossible to handle the error and the whole system is shut down by the OS. I haven't encountered this myself so it would depend on your implementation; "socket operation on non-socket" certainly implies this is the case."
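The "don't split socket wires" advice above amounts to giving each socket a single owner. A minimal sketch of that idea in Python rather than LabVIEW: `FakeSocket` is a hypothetical stand-in (not part of any ZMQ binding) that raises when it is used from a thread other than the one that created it, and `socket_owner` and `run_demo` are illustrative names. Other threads never touch the socket; they hand messages to the owner through a queue.

```python
import queue
import threading


class FakeSocket:
    """Hypothetical stand-in for a socket that must only be used from one thread."""

    def __init__(self):
        self.owner = threading.get_ident()
        self.sent = []

    def send(self, msg):
        # A real non-thread-safe socket could crash here; the stand-in raises instead.
        if threading.get_ident() != self.owner:
            raise RuntimeError("socket used from a thread that does not own it")
        self.sent.append(msg)


def socket_owner(outbox, results):
    """Create the socket in this thread and be the only code that touches it."""
    sock = FakeSocket()
    while True:
        msg = outbox.get()
        if msg is None:  # sentinel: stop the owner loop
            break
        sock.send(msg)
    results.extend(sock.sent)


def run_demo(n_producers=4, per_producer=100):
    """Several producer threads send through one owner thread; returns what was sent."""
    outbox = queue.Queue()
    results = []
    owner = threading.Thread(target=socket_owner, args=(outbox, results))
    owner.start()

    def produce(pid):
        for i in range(per_producer):
            outbox.put((pid, i))  # hand the message over instead of sharing the socket

    producers = [threading.Thread(target=produce, args=(p,)) for p in range(n_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    outbox.put(None)
    owner.join()
    return results
```

In LabVIEW terms, the queue plays the role of a message queue feeding the one loop that holds the socket wire; whether that fits the RemoteControl SMO design is an open question.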
This error might be due to calling Remote Client.vim without passing class out to class in of the next loop iteration (each time Remote Client.vim was called, it would create a new socket connection). Testing now.
nope...
Date Time Type Name Code Message Possible reason(s) Call Chain Calling Class
2022-08-31 17:01:33.523 Err 1097 Error 1097 occurred at zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002 LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW. zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002
RemoteControl.ZMQ.lvclass:Send Message.vi:2810001
RemoteControl.lvclass:Process.vi:5310001
SMO.lvclass:LaunchProcess.vi:3770001
SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C RemoteControl.ZMQ.lvclass:RemoteControl Server
2022-08-31 17:01:34.112 Err 1097 Error 1097 occurred at zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002 LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW. zeromq.lvlib:zmq_socket.lvclass:zmq_send.vi:2620002
RemoteControl.ZMQ.lvclass:Send Message.vi:2810001
RemoteControl.lvclass:Process.vi:5310001
SMO.lvclass:LaunchProcess.vi:3770001
SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C instrument.PPMS.lvclass:PPMS
2022-08-31 17:01:35.455 Err 1097 Error 1097 occurred at zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003 LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW. zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003
RemoteControl.ZMQ.lvclass:Open Server Connection.vi
RemoteControl.lvclass:Process.vi:5310001
SMO.lvclass:LaunchProcess.vi:3770001
SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C RemoteControl.ZMQ.lvclass:RemoteControl Server
2022-08-31 17:01:35.457 Err 1097 Error 1097 occurred at zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003 LabVIEW: (Hex 0x449) An exception occurred within the external code called by a Call Library Function Node. The exception might have corrupted the LabVIEW memory. Save any work to a new location and restart LabVIEW. zeromq.lvlib:zmq_context.lvclass:zmq_ctx_create.vi:5610003
RemoteControl.ZMQ.lvclass:Open Server Connection.vi
RemoteControl.lvclass:Process.vi:5310001
SMO.lvclass:LaunchProcess.vi:3770001
SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.9A40009C instrument.PPMS.lvclass:PPMS
Inside RemoteControl:Process I notice that the Read and Write Message states do not have continuity of the class wire. If these methods need to reconnect, then there could be a runaway creation of sockets.
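To illustrate the suspected failure mode (this is a Python analogy, not the actual LabVIEW code): `Client` is a hypothetical by-value-style object that lazily opens a "socket" on first send and records every socket it opens. If each loop iteration starts from the original, unconnected value, a new socket is opened every time; if the updated object is carried into the next iteration, only one is opened.

```python
import copy


class Client:
    """Hypothetical client that opens its socket lazily on the first send."""

    def __init__(self, opened_log):
        self.opened_log = opened_log  # shared list recording every socket opened
        self.socket = None

    def send(self, msg):
        if self.socket is None:
            # No connection yet: open one (stand-in for creating a real socket).
            self.socket = object()
            self.opened_log.append(self.socket)
        return self  # updated state, analogous to "class out"


def loop_without_continuity(n):
    """Each iteration starts from the stale, unconnected value."""
    log = []
    template = Client(log)
    for i in range(n):
        client = copy.copy(template)  # the updated state is never fed back
        client.send(i)
    return len(log)


def loop_with_continuity(n):
    """The updated client is carried into the next iteration (like a shift register)."""
    log = []
    client = Client(log)
    for i in range(n):
        client = client.send(i)
    return len(log)


print(loop_without_continuity(1000))  # one socket per iteration
print(loop_with_continuity(1000))     # a single socket reused throughout
```

If the real Read and Write Message states behave like `loop_without_continuity`, the socket count grows with every reconnect, which would be consistent with the hypothesis above but does not prove it.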
91:??? in Instrument UI.PPMS.lvclass:Process.vi:5310001 ->SMO.lvclass:LaunchProcess.vi:3770005 ->SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.E280003A
And then:
I'm starting to think that there is a mismatch between the data written by get all and the data received by the PPMS UI.
Reproduced the error inside the LV dev environment (previous errors mostly occurred in the exe)
code: 91: source: SubVI in Instrument UI.PPMS.lvclass:Process.vi:5310001->SMO.lvclass:LaunchProcess.vi:3770005->SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.317000F0 state: 16:47:46.3 Action: RC Send
When I inspect the events I get:
code: 91: source: SubVI in Instrument UI.PPMS.lvclass:Process.vi:5310001->SMO.lvclass:LaunchProcess.vi:3770005->SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.4DC000F1 state: 21:20:04.0 Action: RC Send
code: 156384721
zeromq.lvlib:zmq_socket.lvclass:zmq_close.vi
Complete call chain: zeromq.lvlib:zmq_socket.lvclass:zmq_close.vi RemoteControl.ZMQ.lvclass:Read Message.vi:2060002 RemoteControl.lvclass:Send and Receive Message.vi:1980001 RemoteControl.lvclass:Send and Receive.vi:4620001 Instrument UI.PPMS.lvclass:Process.vi:5310001 SMO.lvclass:LaunchProcess.vi:3770005 SMO.lvclass:LaunchProcess.vi.ACBRProxyCaller.4DC000F1
Again!.... Running latest PPMS Monitor and Control
Code:
There appears to be some issue with the instrument client, remote control client, remote control server, etc. When using the zmq example lazy pirate client and server, I can send and receive more than 5,000,000 messages with no issues. I propose the following tests:
...hmm, I was able to reproduce the issue with the shipping lazy pirate example. Perhaps it's an issue with my virtual machine, or the LabVIEW installation is corrupt?
I am repeatedly opening and closing the context and the socket:
Restart LabVIEW (and set wait to 0 ms):
Running Lockin_time on Astartes VM:
After running fine for months, this error started to appear on Kholek (MNK DAQ). I was unable to find the source and resorted to uninstalling all NI-related software. After reinstalling, the PXI chassis is now recognized (a separate error). Eventually I will find out whether the ZeroMQ error was due to a corrupt LabVIEW installation or whether some issue is still lurking inside the remote control SMO.
Explore these fixes posted on https://gpackage.io/packages/zmq-socket-library
Here is the repository (hosted GitLab): http://bluecogsoftware.com:8443/mac671/zmq-binding
Issues to check
* [x] split_enpoint.vi parse port number
  * this is fixed as far as I can tell
* [ ] separate compiled code
  * some are not separated.
  * separated all
* [ ] zmq binding libraries published with debug enabled, occasionally causing assertion errors
  * this is true. some still have debug enabled
  * disabled all debugging and automatic error handling
* [ ] change priority of zmq_errno, _geterr, _libpath to inline
  * _errno is good. _libpath needs to be inlined. I can't find _geterr
  * _libpath cannot be inlined: it has "this VI" references
build zmq v3.6.3.113 with above changes
Unclear if this is an Instrument Framework issue or specific to the Lockin (running in simulation mode...)
Restarting LabVIEW fixes the issue (until it happens again)... Neither PPMS Monitor and Control nor Multichannel Lock-In needs to be restarted.