tango-controls / JTango

TANGO kernel Java implementation. JTango moved to https://gitlab.com/tango-controls/JTango
http://tango-controls.org
8 stars 14 forks source link

Heartbeat errors when using network alias for TANGO_HOST #90

Closed reszelaz closed 4 years ago

reszelaz commented 5 years ago

Hi JTango experts,

First of all sorry if its a duplicate of #89 but with my knowledge I was not able to assess it..

At ALBA we have observed that events does not work when we use network alias for our TANGO_HOST. Initially we observed it with Astor, but I was able to reproduce it with a simple client (I'm not familiar at all with Java so it perfectly may be an example of how NOT to program in Java:P).

We use JTango-9.1.2.jar.

Note that this (at least using Astor) was not happening with Tango 8.

Thanks in advance for your help!

Pascal-Verdier commented 5 years ago

Hi JTango-9.1.2.jar is a very old release. I think it has been fixed. The latest release is 9.5.17 https://bintray.com/tango-controls/jtango/JTango/9.5.17

reszelaz commented 5 years ago

Thanks @Pascal-Verdier. Indeed this bug is already solved! We have just tried it on the TangoBox, which uses JTango 9.5.13. We will proceed with the upgrade then. Sorry about bothering you with this.

reszelaz commented 5 years ago

BTW do you know if someone had packaged this for Debian?:)

Pascal-Verdier commented 5 years ago

Hi I don't know. Sorry

reszelaz commented 5 years ago

I made a mistake when testing this on the TangoBox, which uses JTango 9.5.13. The problem is still there. Now I've also tested it with JTango 9.5.17 and again there is a problem. So, I'm reopening this issue.

reszelaz commented 4 years ago

Hi again! First of all many thanks for fixing this bug! How can we try it? Is the JTango jar available somewhere? Thanks a lot!

reszelaz commented 4 years ago

Thanks to the help of @Pascal-Verdier we got the jar: http://ftp.esrf.fr/pub/cs/tango/jtango/JTango-9.5.20-SNAPSHOT.jar. However there is still a problem with the events (see simple-client example from the issue description). Now the error message is different:

sicilia@controls05:~/workspace/test-tango-java/src$ CLASSPATH=.:$CLASSPATH java SimpleClient sys/tg_test/1
16:15:53,129 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
16:15:53,129 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
16:15:53,129 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.xml]
16:15:53,148 |-INFO in org.tango.logging.DefaultTangoLoggingConfiguration@629f0666 - Setting up default configuration from 
16:15:53,228 |-INFO in ch.qos.logback.classic.joran.action.JMXConfiguratorAction - begin

========== ZMQ (3.22) event system is available ============
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:328)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at SimpleClient.subscribeChangeStateEvent(SimpleClient.java:39)
        at SimpleClient.main(SimpleClient.java:116)
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
^C======== Shutting down ZMQ event system ==========
bourtemb commented 4 years ago

@reszelaz , there is still an issue with the hearbeat channel names indeed. @gwen-soleil is aware of that problem because we exchanged e-mails about that issue. I reopen this issue because the problem you described in your previous comment is still present in the jtango-9-lts branch

reszelaz commented 4 years ago

Many thanks for the update! Do you have some rough estimation on when we could count on a solution. This is not to put pressure, just to consider it in our project plan:) Thanks again!

Pascal-Verdier commented 4 years ago

Hi About the NullPointerException, could make a test. With Jive, on Starter admin device (dserver/starter/hostname) start a TestDevice. Send the ZmqEventSubscriptionChange comand with parameters: "Starter device name",State,subscribe,change,5 And send me the result of this command.

reszelaz commented 4 years ago

Thanks for looking into this!

I attach the result of the test that you ask me:

Command: dserver/Starter/controls05/ZmqEventSubscriptionChange
Duration: 315 msec
Argin: tango/admin/controls05, State, subscribe, change, 5
Output argument(s) :
array length: 6
lvalue:
[0]  934
[1]  5
[2]  1000
[3]  81920
[4]  20000
[5]  421
array length: 4
svalue:
[0]  tcp://84.89.246.67:35211
[1]  tcp://84.89.246.67:45375
[2]  tango://ct05.cells.es:10000/tango/admin/controls05/state.idl5_change
[3]  tango://ct05.cells.es:10000/dserver/starter/controls05

Just let me know if you need any other test.

Pascal-Verdier commented 4 years ago

Everything is OK. I cannot understand why it works fine on my tests and fails on yours. Sorry

reszelaz commented 4 years ago

I paste here some information that I passed to you via email few weeks ago, maybe this will give a hint to someone else:

Also, I think I have not mentioned it before, but there are no such problems when using C++ or Python. Also, when we use astor with Tango7 and Tango8 and rest of the infrastrucutre with Tango 9 (Tango DB, Starter, etc.) there are no such problems.

reszelaz commented 4 years ago

We have some clue. It looks like it is related with the reverse DNS lookup. Our reverse DNS lookup configuration returns the host name (controls05) and not the alias (ct05):

sicilia@controls05:~$ host 84.89.246.67
67.246.89.84.in-addr.arpa domain name pointer controls05.cells.es.

When we add the following line to the /etc/hosts:

84.89.246.67   ct05.cells.es ct05

there are no heartbeat problems.

However then there are problems with Starter:

sicilia@controls05:~$ /usr/lib/tango/Starter controls05
Starter::Starter() init device tango/admin/controls05                                                                                                                                                             
UseEvents  = False                                                                                                                                                                                                
interStartupLevelWait  = 1                                                                                                                                                                                        
serverStartupTimeout   = 1                                                                                                                                                                                        
fireFromDbase  = 1                                                                                                                                                                                                
logFileHome    = /var/tmp                                                                                                                                                                                         
StartServersAtStartup = 1                                                                                                                                                                                         
AutoRestartDuration   = 0                                                                                                                                                                                         
Tango exception                                                                                                                                                                                                   
Severity = ERROR                                                                                                                                                                                                  
Error reason = BAD_PARAM                                                                                                                                                                                          
Desc : This server must run on controls05 and not on ct05                                                                                                                                                         
Origin : Starter::check_host()                                                                                                                                                                                    

Received a CORBA_Exception                                                                                                                                                                                        
Exiting

How do you have it configured at your site?

Pascal-Verdier commented 4 years ago

Hi

On host ebssimul (alias ebs-simu) the file /etc/hosts contains:

160.103.10.80 ebssimul.esrf.fr ebssimul

The Starter server is defined as Starter/ebssimul

Sorry for the delay, but we are now working from home for an unknown duration... Pascal

On 12/03/2020 12:05, reszelaz wrote:

We have some clue. It looks like it is related with the reverse DNS lookup. Our reverse DNS lookup configuration returns the host name (controls05) and not the alias (ct05):

sicilia@controls05:~$ host 84.89.246.67 67.246.89.84.in-addr.arpa domain name pointer controls05.cells.es.

When we add the following line to the /etc/hosts:

84.89.246.67 ct05.cells.es ct05

there are no heartbeat problems.

However then there are problems with Starter:

sicilia@controls05:~$ /usr/lib/tango/Starter controls05 Starter::Starter() init device tango/admin/controls05

UseEvents = False

interStartupLevelWait = 1

serverStartupTimeout = 1

fireFromDbase = 1

logFileHome = /var/tmp

StartServersAtStartup = 1

AutoRestartDuration = 0

Tango exception

Severity = ERROR

Error reason = BAD_PARAM

Desc : This server must run on controls05 and not on ct05

Origin : Starter::check_host()

Received a CORBA_Exception

Exiting

How do you have it configured at your site?

-- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub [1], or unsubscribe [2].

Links:

[1] https://github.com/tango-controls/JTango/issues/90#issuecomment-598128404 [2] https://github.com/notifications/unsubscribe-auth/AEWBICDXAMLUMVG4VQCFEQDRHC6WPANCNFSM4JKKGJSA

reszelaz commented 4 years ago

Hi Pascal, Thanks for checking your /etc/hosts configuration. So, you don't use DNS server for the reverse lookup but the /etc/hosts. I have added a similar line on our side: 84.89.246.67 controls05.cells.es controls05 but the result is the same - still no hearbeats:

subscribeChangeServerEvent() for tango/admin/controls05/Servers OK!
controls05
Tango exception
Severity -> ERROR
Desc -> No heartbeat from dserver/starter/controls05
Reason -> API_NoHeartbeat
Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped()
Mon Mar 16 18:48:27 CET 2020
tango/admin/controls05  has received a DevFailed :      No heartbeat from dserver/starter/controls05
HostStateThread.StateEventListener on tango/admin/controls05

Gwenaëlle, I see that you work on this issue. Many thanks! If you need that we test some new jar, just let me know!

Pascal-Verdier commented 4 years ago

Yes we use the DNS server, but I am not a network specialist.

You have same result. That means the problem is elsewhere...

On 16/03/2020 18:59, reszelaz wrote:

Hi Pascal, Thanks for checking your /etc/hosts configuration. So, you don't use DNS server for the reverse lookup but the /etc/hosts. I have added a similar line on our side: 84.89.246.67 controls05.cells.es controls05 but the result is the same - still no hearbeats:

subscribeChangeServerEvent() for tango/admin/controls05/Servers OK! controls05 Tango exception Severity -> ERROR Desc -> No heartbeat from dserver/starter/controls05 Reason -> API_NoHeartbeat Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped() Mon Mar 16 18:48:27 CET 2020 tango/admin/controls05 has received a DevFailed : No heartbeat from dserver/starter/controls05 HostStateThread.StateEventListener on tango/admin/controls05

Gwenaëlle, I see that you work on this issue. Many thanks! If you need that we test some new jar, just let me know!

-- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub [1], or unsubscribe [2].

Links:

[1] https://github.com/tango-controls/JTango/issues/90#issuecomment-599680442 [2] https://github.com/notifications/unsubscribe-auth/AEWBICCTVRIB7X2HDTVMFCTRHZSJRANCNFSM4JKKGJSA

reszelaz commented 4 years ago

Hi Pascal, I talked with our System Administrator and to better understand why we get this problem and you do not he would like to know the result of the following commands:

host ebssimul
host ebs-simu
host 160.103.10.80

Could you post them here? Many thanks!

Pascal-Verdier commented 4 years ago

Hi here is the result:

reszelaz commented 4 years ago

I have just tried with the latest JTango 9.6 and looks like the problem is still there. Here is how I tested it:

sicilia@controls05:/usr/share/java$ sudo wget https://github.com/tango-controls/JTango/releases/download/9.6.0/JTango-9.6.0.jar
sicilia@controls05:/usr/share/java$ sudo ln -s JTango-9.6.0.jar JTango.jar

sicilia@controls05:/usr/share/java$ ls -lrt JTango*    
-rw-r--r-- 1 root root 23883055 Nov 15  2019 JTango-9.5.17.jar
-rw-r--r-- 1 root root 23900753 Feb 21 10:03 JTango-9.5.20-SNAPSHOT.jar
-rw-r--r-- 1 root root 25250437 May 28 11:57 JTango-9.6.0.jar
lrwxrwxrwx 1 root root       16 Jun 10 10:41 JTango.jar -> JTango-9.6.0.jar

sicilia@controls05:~/workspace/test-tango-java$ source devenv

sicilia@controls05:~/workspace/test-tango-java$ cd src/

sicilia@controls05:~/workspace/test-tango-java/src$ javac SimpleClient.java
Note: CAL10NAnnotationProcessor 0.8.1 initialized
warning: Supported source version 'RELEASE_5' from annotation processor 'ch.qos.cal10n.verifier.processor.CAL10NAnnotationProcessor' less than -source '1.8'
1 warning

sicilia@controls05:~/workspace/test-tango-java/src$ CLASSPATH=.:$CLASSPATH java SimpleClient sys/tg_test/1
10:50:38,334 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
10:50:38,335 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
10:50:38,335 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.xml]
10:50:38,342 |-INFO in org.tango.logging.DefaultTangoLoggingConfiguration@629f0666 - Setting up default configuration from 
10:50:38,404 |-INFO in ch.qos.logback.classic.joran.action.JMXConfiguratorAction - begin

========== ZMQ (4.17) event system is available ============
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:345)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at SimpleClient.subscribeChangeStateEvent(SimpleClient.java:39)
        at SimpleClient.main(SimpleClient.java:116)
tango://ct05.cells.es:10000/sys/tg_test/1/state.idl5_change ?  NOT FOUND
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
^C======== Shutting down ZMQ event system ==========

The same with Astor, but here I have not downloaded the latest Astor, so I don't know if Astor 7.2.7 should work with it?:

sicilia@controls05:~/workspace/test-tango-java/src$ astor
Display is localhost:16.0
Index: 5, Size: 5
========== ZMQ (4.17) event system is available ============
Build  GUI :1449 ms
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:345)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at admin.astor.HostStateThread.subscribeChangeStateEvent(HostStateThread.java:286)
        at admin.astor.AstorTree$subscribeThread.run(AstorTree.java:1251)
Total time to subscribe on 4 hosts : 2082 ms
Total time to start Astor 3369 ms
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found

How can we help in fixing this issue?

gwen-soleil commented 4 years ago

hello @reszelaz,

I think we found the bug. Could you perform another test with this version https://drive2.demo.renater.fr/index.php/s/8FdAnA48GLNjcpb please?

reszelaz commented 4 years ago

Hi @gwen-soleil,

First of all sorry for the late reply. I was on a leave for almost 3 months. Now I'm back at work and just found your message. I downloaded the JTango-9.6.1-SNAPSHOT.jar and made the JTango.jar link (/usr/share/java) point to it: JTango.jar -> JTango-9.6.1-SNAPSHOT.jar. Then I start astor 7.2.7 with the TANGO_HOST pointing to the DNS alias and I still get the Heartbeat errors:

zreszela@pc255:/usr/share/java> astor
Display is :0
Index: 5, Size: 5
Build  GUI :6715 ms
========== ZMQ (4.17) event system is available ============
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:345)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at admin.astor.HostStateThread.subscribeChangeStateEvent(HostStateThread.java:286)
        at admin.astor.AstorTree$subscribeThread.run(AstorTree.java:1251)
tango://ct05.cells.es:10000/tango/admin/controls05/state.idl5_change ?  NOT FOUND
tango://ct05.cells.es:10000/tango/admin/controls05/state.idl5_change ?  NOT FOUND
Total time to subscribe on 5 hosts : 2729 ms
Total time to start Astor 9137 ms
Tue Aug 11 10:50:27 CEST 2020
tango/admin/pc255  has received a DevFailed :   No heartbeat from dserver/starter/pc255
HostStateThread.StateEventListener on tango/admin/pc255
Tue Aug 11 10:50:37 CEST 2020
tango/admin/pc255  has received a DevFailed :   No heartbeat from dserver/starter/pc255
HostStateThread.StateEventListener on tango/admin/pc255
Tue Aug 11 10:50:47 CEST 2020
tango/admin/pc255  has received a DevFailed :   No heartbeat from dserver/starter/pc255
HostStateThread.StateEventListener on tango/admin/pc255
Astor exiting....

Note that when I don't use the DNS alias in the TANGO_HOST with the same version of JTango I don't get the Heartbeat errors:

zreszela@pc255:/usr/share/java> astor
Display is :0
Index: 5, Size: 5
Build  GUI :1046 ms
========== ZMQ (4.17) event system is available ============
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:345)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at admin.astor.HostStateThread.subscribeChangeStateEvent(HostStateThread.java:286)
        at admin.astor.AstorTree$subscribeThread.run(AstorTree.java:1251)
Total time to subscribe on 5 hosts : 1957 ms
Total time to start Astor 2932 ms
Astor exiting....

Also with the SimpleClient example from the issue description I observe the same results. First I try it with the TANGO_HOST pointing to the DNS alias and I get:

sicilia@controls05:/usr/share/java$ cd ~/workspace/test-tango-java/
sicilia@controls05:~/workspace/test-tango-java$ source devenv 
sicilia@controls05:~/workspace/test-tango-java$ cd src/
sicilia@controls05:~/workspace/test-tango-java/src$ ls
SimpleClient.class  SimpleClient.java  SimpleClient$StateEventListener.class
sicilia@controls05:~/workspace/test-tango-java/src$ rm *.class
sicilia@controls05:~/workspace/test-tango-java/src$ javac SimpleClient.java
Note: CAL10NAnnotationProcessor 0.8.1 initialized
warning: Supported source version 'RELEASE_5' from annotation processor 'ch.qos.cal10n.verifier.processor.CAL10NAnnotationProcessor' less than -source '1.8'
1 warning
sicilia@controls05:~/workspace/test-tango-java/src$ CLASSPATH=.:$CLASSPATH java SimpleClient sys/tg_test/1
11:14:47,775 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
11:14:47,775 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
11:14:47,775 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/usr/share/java/JTango-9.6.1-SNAPSHOT.jar!/logback.xml]
11:14:47,826 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@2d554825 - URL [jar:file:/usr/share/java/JTango-9.6.1-SNAPSHOT.jar!/logback.xml] is not of type file
11:14:48,241 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
11:14:48,241 |-INFO in ch.qos.logback.classic.joran.action.JMXConfiguratorAction - begin
11:14:48,842 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
11:14:48,850 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [CONSOLE]
11:14:49,061 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - This appender no longer admits a layout as a sub-component, set an encoder instead.
11:14:49,061 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder.
11:14:49,061 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.jacorb] to OFF
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango] to INFO
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.orb] to INFO
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.cache.StateStatusCache] to INFO
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.utils] to OFF
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.events] to DEBUG
11:14:49,063 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.admin] to DEBUG
11:14:49,065 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.client.database.cache] to DEBUG
11:14:49,065 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.quartz] to OFF
11:14:49,065 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sf.ehcache] to OFF
11:14:49,065 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ch.qos.logback] to OFF
11:14:49,066 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [TangoClientRequests] to OFF
11:14:49,066 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [TangoClientRequests] to true
11:14:49,066 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ERROR
11:14:49,066 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [CONSOLE] to Logger[ROOT]
11:14:49,066 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
11:14:49,068 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@68837a77 - Registering current configuration as safe fallback point

========== ZMQ (4.17) event system is available ============
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:345)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at SimpleClient.subscribeChangeStateEvent(SimpleClient.java:39)
        at SimpleClient.main(SimpleClient.java:116)
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/sys/tg_test/1/state.idl5_change ?  NOT FOUND
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
^C======== Shutting down ZMQ event system ==========

And when I don't use the DNS alias in the TANGO_HOST it seems to work correctly:

sicilia@controls05:~/workspace/test-tango-java/src$ CLASSPATH=.:$CLASSPATH java SimpleClient sys/tg_test/1
11:19:58,918 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
11:19:58,918 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
11:19:58,918 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/usr/share/java/JTango-9.6.1-SNAPSHOT.jar!/logback.xml]
11:19:58,937 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@2d554825 - URL [jar:file:/usr/share/java/JTango-9.6.1-SNAPSHOT.jar!/logback.xml] is not of type file
11:19:59,015 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
11:19:59,016 |-INFO in ch.qos.logback.classic.joran.action.JMXConfiguratorAction - begin
11:19:59,100 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
11:19:59,108 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [CONSOLE]
11:19:59,165 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - This appender no longer admits a layout as a sub-component, set an encoder instead.
11:19:59,165 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder.
11:19:59,165 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.jacorb] to OFF
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango] to INFO
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.orb] to INFO
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.cache.StateStatusCache] to INFO
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.utils] to OFF
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.events] to DEBUG
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.admin] to DEBUG
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.client.database.cache] to DEBUG
11:19:59,166 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.quartz] to OFF
11:19:59,167 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sf.ehcache] to OFF
11:19:59,167 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ch.qos.logback] to OFF
11:19:59,167 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [TangoClientRequests] to OFF
11:19:59,167 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [TangoClientRequests] to true
11:19:59,167 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ERROR
11:19:59,167 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [CONSOLE] to Logger[ROOT]
11:19:59,167 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
11:19:59,169 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@68837a77 - Registering current configuration as safe fallback point

========== ZMQ (4.17) event system is available ============
Received Event
Received Event

Note, to work correctly I need to restart the TangoTest DS so it realizes about the new TANGO_HOST configuration.

bourtemb commented 4 years ago

@reszelaz , which version of TangoTest are you using? Is it using a recent version of cppTango?

reszelaz commented 4 years ago

I use 9.3.4-rc2+1.ecc439-1~bpo9+1 for libtango9 and for tango-test. Is it enough?

bourtemb commented 4 years ago

I think it should be enough.

reszelaz commented 4 years ago

I have just checked it again with the Tango 9.3.4-rc7 and the result is the same.

gwen-soleil commented 4 years ago

Arg!

Maybe I found another issue in the code of jtango. Here is another version to test https://drive2.demo.renater.fr/index.php/s/PnLHHAc8qjEQCqk Would you mind testing again? as I am not able to reproduce it. I have also added some outputs to the console, that may help me debugging.

Thanks.

reszelaz commented 4 years ago

Thanks for this new version to test. I have repeated both tests. Below the results:

Astor test:

sicilia@controls05:/usr/share/java$ astor
Display is localhost:11.0
Index: 5, Size: 5
Build  GUI :2523 ms
========== ZMQ (4.17) event system is available ============
##########EventConsumer#subscribe_event - in
EventConsumer#subscribe_event, device_channel_map = {tango://ct05.cells.es:10000/tango/admin/controls05=tango://ct05.cells.es:10000/dserver/starter/controls05}
EventConsumer#subscribe_event, channel_map = {tango://84.89.246.67:10000/dserver/starter/controls05=eventChannel         = null
adm_device_proxy     = dserver/starter/controls05
last_subscribed      = 0
last_heartbeat       = 1599668880734
heartbeat_skipped    = false
heartbeat_filter_id  = 0
host         = null
use_db       = true
dbase        = null
zmqEndpoint  = tcp://84.89.246.67:40869
}
EventConsumer#subscribe_event, deviceName = tango://ct05.cells.es:10000/tango/admin/controls05
EventConsumer#subscribe_event, channelName = tango://ct05.cells.es:10000/dserver/starter/controls05
EventConsumer#subscribe_event, event_channel_struct = null
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:480)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at admin.astor.HostStateThread.subscribeChangeStateEvent(HostStateThread.java:286)
        at admin.astor.AstorTree$subscribeThread.run(AstorTree.java:1251)
Total time to subscribe on 5 hosts : 2118 ms
Total time to start Astor 4197 ms
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
tango://ct05.cells.es:10000/dserver/starter/controls05 Not found
Astor exiting....

======== Shutting down ZMQ event system ==========

SimpleClient test:

sicilia@controls05:~/workspace/test-tango-java/src$ rm *.class                                            
sicilia@controls05:~/workspace/test-tango-java/src$ javac SimpleClient.java                               
Note: CAL10NAnnotationProcessor 0.8.1 initialized
warning: Supported source version 'RELEASE_5' from annotation processor 'ch.qos.cal10n.verifier.processor.CAL10NAnnotationProcessor' less than -source '1.8'
1 warning

sicilia@controls05:~/workspace/test-tango-java/src$ CLASSPATH=.:$CLASSPATH java SimpleClient sys/tg_test/1
18:30:31,944 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
18:30:31,944 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
18:30:31,944 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/usr/share/java/JTango-9.6.5-SNAPSHOT.jar!/logback.xml]
18:30:31,981 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@2d554825 - URL [jar:file:/usr/share/java/JTango-9.6.5-SNAPSHOT.jar!/logback.xml] is not of type file
18:30:32,055 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
18:30:32,056 |-INFO in ch.qos.logback.classic.joran.action.JMXConfiguratorAction - begin
18:30:32,158 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
18:30:32,167 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [CONSOLE]
18:30:32,251 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - This appender no longer admits a layout as a sub-component, set an encoder instead.
18:30:32,251 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder.
18:30:32,251 |-WARN in ch.qos.logback.core.ConsoleAppender[CONSOLE] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details
18:30:32,252 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.jacorb] to OFF
18:30:32,252 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango] to INFO
18:30:32,252 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.orb] to INFO
18:30:32,252 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.cache.StateStatusCache] to INFO
18:30:32,252 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.utils] to OFF
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server.events] to DEBUG
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.server] to DEBUG
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.tango.client.database.cache] to DEBUG
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.quartz] to OFF
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sf.ehcache] to OFF
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ch.qos.logback] to OFF
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [TangoClientRequests] to OFF
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [TangoClientRequests] to true
18:30:32,253 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ERROR
18:30:32,253 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [CONSOLE] to Logger[ROOT]
18:30:32,254 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
18:30:32,255 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@68837a77 - Registering current configuration as safe fallback point

========== ZMQ (4.17) event system is available ============
##########EventConsumer#subscribe_event - in
EventConsumer#subscribe_event, device_channel_map = {tango://ct05.cells.es:10000/sys/tg_test/1=tango://ct05.cells.es:10000/dserver/tangotest/test}
EventConsumer#subscribe_event, channel_map = {tango://84.89.246.67:10000/dserver/tangotest/test=eventChannel         = null
adm_device_proxy     = dserver/tangotest/test
last_subscribed      = 0
last_heartbeat       = 1599669032922
heartbeat_skipped    = false
heartbeat_filter_id  = 0
host         = null
use_db       = true
dbase        = null
zmqEndpoint  = tcp://84.89.246.67:34945
}
EventConsumer#subscribe_event, deviceName = tango://ct05.cells.es:10000/sys/tg_test/1
EventConsumer#subscribe_event, channelName = tango://ct05.cells.es:10000/dserver/tangotest/test
EventConsumer#subscribe_event, event_channel_struct = null
java.lang.NullPointerException
        at fr.esrf.TangoApi.events.EventConsumer.subscribe_event(EventConsumer.java:480)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:230)
        at fr.esrf.TangoApi.events.EventConsumerUtil.subscribe_event(EventConsumerUtil.java:169)
        at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.subscribe_event(DeviceProxyDAODefaultImpl.java:3479)
        at fr.esrf.TangoApi.DeviceProxy.subscribe_event(DeviceProxy.java:1773)
        at fr.esrf.TangoApi.events.EventDispatcher.subscribe_change_event(EventDispatcher.java:83)
        at fr.esrf.TangoApi.events.TangoChange.addTangoChangeListener(TangoChange.java:75)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:285)
        at fr.esrf.TangoApi.events.TangoEventsAdapter.addTangoChangeListener(TangoEventsAdapter.java:249)
        at SimpleClient.subscribeChangeStateEvent(SimpleClient.java:39)
        at SimpleClient.main(SimpleClient.java:116)
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
tango://ct05.cells.es:10000/dserver/tangotest/test Not found
^C======== Shutting down ZMQ event system ==========
gwen-soleil commented 4 years ago

After running some more tests with @reszelaz , it does not work with TANGO_HOST="ct05.cells.es:10000", which is a DNS alias, but it works with TANGO_HOST=controls05.cells.es:10000.

I found out that it is related to this code https://github.com/tango-controls/JTango/blob/4634b68f96e68de992369de0ec3e2fcfc462ed7d/common/src/main/java/fr/esrf/TangoApi/TangoUrl.java#L235. where InetAddress.getByName("ct05.cells.es").getCanonicalHostName(); return the IP address. It seems to be a known issue as described here: https://stackoverflow.com/questions/1899288/cannot-get-hostname-from-gethostname https://stackoverflow.com/questions/34842698/inetaddress-getcanonicalhostname-returns-ip-instead-of-hostname

And it occurs only when tangoversion > 9.3.0, because of this code https://github.com/tango-controls/JTango/blob/4634b68f96e68de992369de0ec3e2fcfc462ed7d/dao/src/main/java/fr/esrf/TangoApi/events/ZmqEventConsumer.java#L267-L273 that gets the device and admin name from the server. In our case, the names resolved by the class TangoUrl on the client side is different from the name resolved on the server side. And the current code uses a mix of them. So the solution that works at ALBA is to comment the following line: https://github.com/tango-controls/JTango/blob/4634b68f96e68de992369de0ec3e2fcfc462ed7d/dao/src/main/java/fr/esrf/TangoApi/events/ZmqEventConsumer.java#L432 so that only the names from the server side will be used.

@Pascal-Verdier and @bourtemb, Can you review this fix and check that it will not introduce any regression for older versions of Tango?

reszelaz commented 4 years ago

Many thanks @gwen-soleil for working on this issue! And it is a great news to hear that you already found a solution! Note that this issue was happening at ALBA already with libtango 9.2.5 (with Tango 8 it was not happening). If you are interested in seeing the debug prints from JTango when the TangoTest runs with this libtango 9.2.5 version, just let us know.

Pascal-Verdier commented 4 years ago

Hi Congratulation to Gwen to find it. I think the line removed was added for old devices.

To keep this compatibility could we keep this line on condition: if (!cs.channelName.startsWith("ẗango://")) cs.channelName = adminDevice.fullName().toLowerCase(); // Update name with tango host

sergirubio commented 4 years ago

Hi @gwen-soleil , @Pascal-Verdier

I just tested the latest version of the .jar file and it finally works at Alba using DNS Alias

Are we on time to include it for Tango 9.3.4 release? Or will it be released separately?

Sergi

gwen-soleil commented 4 years ago

fixed in release https://github.com/tango-controls/JTango/releases/tag/9.6.6