Closed hinxx closed 5 years ago
0x92 in latin1 is a ’ symbol. Is it accidentally in your source code and the file is not encoded in utf-8? Can you share your database definition?
I surely can, I have several of DBs riding on the single server instance, utilizing multiple drivers, each with its own port name and a set of PVs to handle.
I've copy/pasted each of the DB definitions below:
pv_db = {
# proton time frame array sizes PV
'PROTON:FRAME:NELM': {
'scan': 0,
'count': 1,
'type': 'int',
'value': pv_array_size
},
# proton count PVs with different time resolution
'PROTON:TOTAL': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:SECOND': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:MINUTE': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:HOUR': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:DAY': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
# proton count array PVs with different time resolution
'PROTON:SECONDS': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
'PROTON:MINUTES': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
'PROTON:HOURS': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
'PROTON:DAYS': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
# proton rate PVs with different time resolution
'PROTON:SECOND:RATE': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:MINUTE:RATE': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:HOUR:RATE': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
'PROTON:DAY:RATE': {
'scan': 1.0,
'count': 1,
'type': 'float',
},
# proton rate array PVs with different time resolution
'PROTON:SECONDS:RATE': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
'PROTON:MINUTES:RATE': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
'PROTON:HOURS:RATE': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
'PROTON:DAYS:RATE': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float',
},
}
pv_db = {
'SRV:TIME': {
'scan': 1.0,
'count': 40,
'type': 'char'
}
}
pv_db = {
# image
'COUNTER': {
'scan': 1.0,
'count': 1,
'type': 'int',
'value': 0
},
'IMAGE': {
'scan': 1.0,
'count': 40000,
'type': 'short',
},
'XSIZE': {
'scan': 1.0,
'count': 1,
'type': 'int',
'value': 0
},
'YSIZE': {
'scan': 1.0,
'count': 1,
'type': 'int',
'value': 0
},
# profile
'PROFILE': {
'scan': 1.0,
'count': 500,
'type': 'float',
},
'PROFILESZ': {
'scan': 1.0,
'count': 1,
'type': 'int',
'value': 0
},
# position and sigma
'POSITION': {
'scan': 1.0,
'count': 1,
'type': 'float',
'value': 0
},
'SIGMA': {
'scan': 1.0,
'count': 1,
'type': 'float',
'value': 0
},
}
pv_db = {
# image
'PULSE': {
'scan': 1.0,
'count': pv_array_size,
'type': 'float'
},
'PULSESZ': {
'scan': 0.0,
'count': 1,
'type': 'int',
'value': pv_array_size
},
}
Your database looks all fine to me. I thought it could be from some ill character in "units" field.
Looking closely at the error message, it rises from the UDP message handling. Could be that some CA client is searching for a channel name with a 0x92 char? Can you run the example/casnooper.py and check the log?
Will try out with the example/casnooper.py.
Meanwhile, I got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 65: invalid start byte
[New Thread 0x7fff845a5700 (LWP 21357)]
filename="../../../../../../src/ca/legacy/pcas/generic/st/casDGIntfOS.cc" line number=464
Internal failure unexpected problem with UDP input from "iscr-lebt-cagw-01-d2.tn.esss.lu.se:33514"
Exception in thread CAServerThread:
RuntimeError: SWIG director method error. Exception Calling Python Code
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/hinkokocevar/webpvsoftioc/webpvioc/server.py", line 79, in target
self.server.process(0.1)
File "/usr/lib64/python3.5/site-packages/pcaspy/driver.py", line 825, in process
cas.process(delay)
SystemError: <built-in function process> returned a result with an error set
I tried to simulate a client searching for PV with illigal chars in the name. The program does not result in an exception like this. Also in your case the message is at least 65 in length, which is bit too long for a PV name.
Can you use server.setDebugLevel(10)
, so that it prints out detailed input requests.
OK, looking at my collected log from casnooper.py, running over night, I can immediately spot the PV name that is too long:
hinkokocevar@TPhinxx ~/webpv/webpvioc $ echo CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo- | wc
1 1 65
Another funny thing is that there are a lot of lines in the output that have '
2018-12-05 22:40:16,300: isrc-lebt-ipc.tn.esss.lu.se:46469 <none>
2018-12-05 22:40:16,300: isrc-lebt-ipc.tn.esss.lu.se:46469 <none>
2018-12-05 22:40:16,300: isrc-lebt-ipc.tn.esss.lu.se:46469 <none>
2018-12-05 22:40:16,301: isrc-lebt-ipc.tn.esss.lu.se:46469 <none>
Will enable the debug output as suggested and report back.
How is the log from casnooper.py look like?
The empty PV name may come form some java clients, as explained here https://epics.anl.gov/tech-talk/2017/msg00723.php
Here is the culprit. Your pointer about the too long PV name seems to be the right direction:
"gpn-cagw-02-ca.tn.esss.lu.se:51485" is searching for "LEBT-010:PBI-NPM-001:HCAM-SyncOut1Level"
CAS Request: ? on gpn-cagw-02-ca.tn.esss.lu.se:51485: cmd=6 cid=17777 typ=5 cnt=13 psz=48 avail=4571
"gpn-cagw-02-ca.tn.esss.lu.se:51485" is searching for "LEBT-010:PBI-NPM-001:HCAM-SyncOut1Level_RBV"
CAS Request: ? on gpn-cagw-02-ca.tn.esss.lu.se:51485: cmd=6 cid=17778 typ=5 cnt=13 psz=40 avail=4572
"gpn-cagw-02-ca.tn.esss.lu.se:51485" is searching for "LEBT-010:PBI-NPM-001:HCAM-SyncOut1Mode"
CAS Request: ? on gpn-cagw-02-ca.tn.esss.lu.se:51485: cmd=6 cid=17779 typ=5 cnt=13 psz=48 avail=4573
"gpn-cagw-02-ca.tn.esss.lu.se:51485" is searching for "LEBT-010:PBI-NPM-001:HCAM-SyncOut1Mode_RBV"
CAS Incoming: 128 byte msg from gpn-cagw-02-ca.tn.esss.lu.se:51485
CAS Response: cmd=0 id=0 typ=0 cnt=13 psz=0 avail=0 outBuf ptr=0x2783b98
CAS Request: ? on iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754: cmd=0 cid=34536 typ=1 cnt=13 psz=0 avail=0
CAS Request: ? on iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754: cmd=6 cid=107703 typ=5 cnt=13 psz=72 avail=1a4b7
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-�"
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 64: unexpected end of data
CAS Response: cmd=11 id=ffffffff typ=0 cnt=0 psz=64 avail=8e outBuf ptr=0x2783ba8
filename="../../../../../../src/ca/legacy/pcas/generic/st/casDGIntfOS.cc" line number=464
Internal failure unexpected problem with UDP input from "iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754"
ERR NPM:174 target - PV POSITION data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV POSITION data error: 'NoneType' object is not subscriptable
Exception in thread CAServerThread:
RuntimeError: SWIG director method error. Exception Calling Python Code
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/hinkokocevar/webpvsoftioc/webpvioc/server.py", line 80, in target
self.server.process(0.1)
File "/usr/lib64/python3.5/site-packages/pcaspy/driver.py", line 825, in process
cas.process(delay)
SystemError: <built-in function process> returned a result with an error set
ERR NPM:174 target - PV XSIZE data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSESZ data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV XSIZE data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV COUNTER data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV COUNTER data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSESZ data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV PROFILESZ data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV PROFILESZ data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV SIGMA data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV SIGMA data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV YSIZE data error: 'NoneType' object is not subscriptable
ERR NPM:174 target - PV YSIZE data error: 'NoneType' object is not subscriptable
ERR BCM:141 target - PV PULSE data error: 'NoneType' object is not subscriptable
Actually it is the same PV from the message earlier, whose name seems to be too long.
In the casnopper.py log I see these lines that explain the numerous different UDP packet issues as posted before (last char of the PV name that is too long) can vary depending on what it is in the memory at that location:
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-P"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-"
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-�"
This PV name seems to be concatenated from several names CrS-TICP:Cryo
CrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsg
Nbr CrS-TICP:Cryo-
. You could now check the gateway log to see which client has requested this PV.
Ah, you are right.. I have not noticed that (not my IOC..). Thanks for the extensive help @xiaoqiangwang, I'll have our integrators and network people look into it.
Could pcaspy be more cautious and check for the PV name length before or similar in order not to suffer from erroneous PV requests flying by?
I created a branch "non-utf-pv" to use latin1 codec if pv names are not utf8 encoded. It still prints the error message but should not result in exceptions. Can you give it a try?
Casnooper.py from the non-utf-pv branch is running..
No errors / exception for the whole weekend. I still see the long PV name:
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-}"
2018-12-10 11:01:59,211: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 CrS-TICP:CryoCrS-TICP:Cryo-Virt-MSG1:Cmd_AckMsgNbrCrS-TICP:Cryo-}
CAS Request: ? on iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754: cmd=6 cid=1777824 typ=5 cnt=13 psz=32 avail=1b20a0
And a test with the locally generated PV with too long name, also passes without errors/exceptions:
CA CLIENT:
(epics) -bash-4.2$ camonitor LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the-epics-network
SNOOPER:
2018-12-10 11:04:19,868: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the"
2018-12-10 11:04:19,918: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the"
2018-12-10 11:04:19,999: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the"
2018-12-10 11:04:20,147: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the"
2018-12-10 11:04:20,552: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the
"iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754" is searching for "LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the"
2018-12-10 11:04:28,793: iscr-lebt-cagw-01-d2.tn.esss.lu.se:45754 LRR:this-is-a-pv-with-the-long-name-that-is-quite-illegal-on-the
A PV name no matter how long, if utf8 encoded, is not a problem. There is no limit imposed on the length of a PV name. I tried with PV name with 109 characters.
It is a problem with this specific client. It seems to limit PV name to 65 characters, but somehow the last character is filled with garbage.
In any case it is a good idea to guard against PV searches with illegal characters. I have merged the "non-utf-pv" branch to master.
Fixed in release 0.7.2.
I'm getting quite a lot of messages like these on the local network. The PVs are behind a PV gateway.
I'm not sure how critical the messages are, any idea? Thanks!