pvget not returning anything when setting EPICS_PVA_ADDR_LIST and EPICS_PVA_AUTO_ADDR_LIST to NO

jbellister-slac commented 1 year ago

Describe the bug

Hi! Not sure if this is a bug, a misconfiguration on my end, or something that just isn't implemented yet. For some context, I'm attempting to update the version of P4P used at SLAC from 3.5.5 to a 4.X version (trying to go straight to 4.1.5 at the moment). When testing the update, we've noticed that existing P4P servers that serve PVs from a machine just fine with 3.5.5 can no longer be reached from a different machine, or even from the same machine. It seems to be an issue with the underlying change to use pvxs, and I've narrowed it down to the simple reproducible case below:

To Reproduce

Steps to reproduce the behavior:

1. Build the latest pvxs from source using the directions here: https://mdavidsaver.github.io/pvxs/building.html
2. Set EPICS_PVA_ADDR_LIST to the IP address of the machine it is running on, and EPICS_PVA_AUTO_ADDR_LIST to NO
3. Run simple server from examples: ./example/O.linux-x86_64/simplesrv
4. From the same machine, run pvget my:pv:name
5. Timeout

Expected behavior

pvget should return the PV. (It does return correctly with the same EPICS_PVA_* configuration using P4P 3.5.5, or with just the example from the pvAccess module in epics-base.)
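The reproduction steps can also be scripted. The following is a minimal sketch, not part of the original report: the helper names `repro_env` and `run_pvget` are made up for illustration, and it assumes `pvget` is on `PATH` and that `simplesrv` was already started from a shell with the same two variables set.

```python
import os
import subprocess


def repro_env(host_ip):
    """Return a copy of the current environment with the settings from this report."""
    env = dict(os.environ)
    env["EPICS_PVA_ADDR_LIST"] = host_ip      # unicast search to this host only
    env["EPICS_PVA_AUTO_ADDR_LIST"] = "NO"    # do not add broadcast addresses automatically
    return env


def run_pvget(pv_name, host_ip, wait_s=5.0):
    """Run pvget once and return (returncode, stdout). Assumes pvget is on PATH."""
    proc = subprocess.run(
        ["pvget", "-w", str(wait_s), pv_name],
        env=repro_env(host_ip),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical invocation using the PV name and address quoted in this report.
    print(run_pvget("my:pv:name", "45.33.109.184"))
```

With the behavior described above, the call would be expected to time out rather than print a value.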

Information (please complete the following):

Output of pvxinfo -D. (This is from RHEL7)


Host: linux-x86_64
Target: linux-x86_64 Linux gcc
Toolchain
    __cplusplus = 201103
    GCC 4.8.5
    GLIBC 2.17
    __GLIBCXX__ 20150623
Versions
    PVXS 1.1.0
    EPICS 7.0.7.1-DEV
    libevent 2.0.21-stable
Runtime
    uname() -> Linux poseidon 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022 x86_64
    epicsThreadGetCPUs() -> 4
    osiLocalAddr() -> 45.33.109.184
    osiSockDiscoverBroadcastAddresses() ->
        45.33.109.255
        172.17.255.255
        192.168.49.255
Effective Client config from environment
    EPICS_PVA_ADDR_LIST="45.33.109.184:5076"
    EPICS_PVA_AUTO_ADDR_LIST=NO
    EPICS_PVA_BROADCAST_PORT=5076
    EPICS_PVA_SERVER_PORT=5075
    EPICS_PVA_CONN_TMO=30
Effective Server config from environment
    EPICS_PVAS_INTF_ADDR_LIST="[::]"
    EPICS_PVAS_BEACON_ADDR_LIST="45.33.109.184:5076"
    EPICS_PVAS_IGNORE_ADDR_LIST=""
    EPICS_PVAS_AUTO_BEACON_ADDR_LIST=NO
    EPICS_PVAS_SERVER_PORT=5075
    EPICS_PVAS_BROADCAST_PORT=5076
    EPICS_PVA_CONN_TMO=30

Additional context

Running the server with PVXS_LOG=*=DEBUG shows debug output on server startup, but no debug output when a pvget is made. Can include other information and dig further if needed, but wanted to see if this was a known thing first.

Server startup debug output:


2022-12-14T09:21:09.361085154 DEBUG pvxs.iface refresh after 11300007.7 sec
2022-12-14T09:21:09.361684399 DEBUG pvxs.iface Ignoring interface 'lo' address family=17
2022-12-14T09:21:09.361763940 DEBUG pvxs.iface Ignoring interface 'eth0' address family=17
2022-12-14T09:21:09.361799600 DEBUG pvxs.iface Ignoring interface 'docker0' address family=17
2022-12-14T09:21:09.361826160 DEBUG pvxs.iface Ignoring interface 'br-67a3e997d373' address family=17
2022-12-14T09:21:09.361843200 DEBUG pvxs.iface Ignoring interface 'veth9ba2dc7' address family=17
2022-12-14T09:21:09.362425144 DEBUG pvxs.iface Found interface 1 "lo" w/ 2 127.0.0.1
2022-12-14T09:21:09.362503605 DEBUG pvxs.iface Found interface 2 "eth0" w/ 2 45.33.109.184
2022-12-14T09:21:09.362537445 DEBUG pvxs.iface Found interface 101480 "docker0" w/ 2 172.17.0.1
2022-12-14T09:21:09.362570106 DEBUG pvxs.iface Found interface 223161 "br-67a3e997d373" w/ 2 192.168.49.1
2022-12-14T09:21:09.362602936 DEBUG pvxs.iface Found interface 1 "lo" w/ 10 [::1]
2022-12-14T09:21:09.362644146 DEBUG pvxs.iface Found interface 2 "eth0" w/ 10 [2600:3c01::f03c:93ff:fe35:1c56]
2022-12-14T09:21:09.362679036 DEBUG pvxs.iface Found interface 2 "eth0" w/ 10 [fe80::f03c:93ff:fe35:1c56]%2
2022-12-14T09:21:09.362759117 DEBUG pvxs.iface Found interface 101480 "docker0" w/ 10 [fe80::42:3ff:fe99:c51d]%101480
2022-12-14T09:21:09.362797537 DEBUG pvxs.iface Found interface 223161 "br-67a3e997d373" w/ 10 [fe80::42:6dff:fe81:6b2a]%223161
2022-12-14T09:21:09.362831448 DEBUG pvxs.iface Found interface 223167 "veth9ba2dc7" w/ 10 [fe80::78f0:e3ff:fe41:b586]%223167
2022-12-14T09:21:09.364017096 INFO pvxs.loop Enter loop worker for 0x7f88000008f0 using epoll
2022-12-14T09:21:09.364110227 DEBUG pvxs.server.setup Promote 0.0.0.0 -> [::]
2022-12-14T09:21:09.364275378 INFO pvxs.loop Enter loop worker for 0x7f88040008f0 using epoll
2022-12-14T09:21:09.364506020 INFO pvxs.udp.setup Bound to UDP [::]:5076 as lo
2022-12-14T09:21:09.364545040 DEBUG pvxs.udp.setup Listening for SEARCH on [::]:5076
2022-12-14T09:21:09.364850613 DEBUG pvxs.server.setup Will send beacons to 45.33.109.184:5076
2022-12-14T09:21:09.365665459 DEBUG pvxs.server.setup Server Starting
2022-12-14T09:21:09.365721709 DEBUG pvxs.server.setup Server starting
2022-12-14T09:21:09.365757089 DEBUG pvxs.server.setup Server enabled listener on [::]:5075
2022-12-14T09:21:09.365808590 DEBUG pvxs.udp.setup Start listening for UDP [::]:5076
2022-12-14T09:21:09.365860740 DEBUG pvxs.server.setup Server beacon timer expires
2022-12-14T09:21:09.365923521 DEBUG pvxs.server.io Beacon tx to 45.33.109.184:5076
2022-12-14T09:21:09.365947781 DEBUG pvxs.udp.io UDP 0x7f8804001570 event 2
0000 : CA02C000 00000027 A4A02B72 BF7E74A2
0010 : 84B3596B 00000001 00000000 00000000
0020 : 0000FFFF 00000000 13D30374 6370FF
2022-12-14T09:21:09.366042661 DEBUG pvxs.udp.io UDP Rx 47, [::ffff:45.33.109.184]:47377 -> [::ffff:45.33.109.184]:5076 @1 ([::]:5076)
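
For reference, the 47-byte datagram in the hex dump can be decoded by hand. This is a rough sketch that assumes the header and beacon layout from the pvAccess protocol specification (magic, version, flags, command, 32-bit payload size, then a 12-byte GUID, 4 bytes of flags and counters, a 16-byte address, a 16-bit port, and a size-prefixed protocol string). The function names are illustrative only and the offsets are hard-coded for this one message.

```python
import struct

# The 47 bytes from the hex dump in the server log.
DUMP = bytes.fromhex(
    "CA02C000 00000027 A4A02B72 BF7E74A2"
    "84B3596B 00000001 00000000 00000000"
    "0000FFFF 00000000 13D30374 6370FF"
)


def parse_header(msg):
    """Decode the 8-byte message header."""
    magic, version, flags, command = struct.unpack_from("4B", msg, 0)
    big_endian = bool(flags & 0x80)   # bit 7: byte order of the rest of the message
    from_server = bool(flags & 0x40)  # bit 6: message direction
    order = ">" if big_endian else "<"
    (payload_size,) = struct.unpack_from(order + "I", msg, 4)
    return {
        "magic": magic,
        "version": version,
        "command": command,
        "big_endian": big_endian,
        "from_server": from_server,
        "payload_size": payload_size,
    }


def parse_beacon(msg):
    """Decode a few beacon fields (assumed layout, fixed offsets)."""
    hdr = parse_header(msg)
    order = ">" if hdr["big_endian"] else "<"
    guid = msg[8:20]                                  # 12-byte server GUID
    (port,) = struct.unpack_from(order + "H", msg, 40)
    proto_len = msg[42]                               # one-byte string size
    protocol = msg[43:43 + proto_len].decode("ascii")
    return {"guid": guid.hex(), "port": port, "protocol": protocol}


if __name__ == "__main__":
    print(parse_header(DUMP))
    print(parse_beacon(DUMP))
```

Read this way, the dump is a beacon (command 0x00) sent by a server, with an 8-byte header plus a 39-byte payload, advertising TCP port 5075. That is consistent with the "Beacon tx" and "UDP Rx 47" lines around it: the server is receiving its own beacon.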

mdavidsaver commented 1 year ago

In short, I am able to replicate what you report. This is a bug which I will look into.

I would still recommend designing your PVA networks to avoid relying on automatic forwarding of unicast searches.

tldr...

PVXS handles the 224.0.0.128 local multicast "hack" differently than pvAccessCPP or pvAccessJava. I never liked the design of this feature, and long thought that it would eventually blow up on someone.

Prior to PVXS 0.3.0, unicast searches were never rebroadcast. Since 0.3.0, unicast searches are rebroadcast (with a CMD_ORIGIN_TAG prefix), but not in all situations. Specifically, I know that some versions of some implementations don't prefix forwarded messages with CMD_ORIGIN_TAG. I also don't trust that all implementations clear the Unicast flag bit on forwarded messages.

So PVXS tries to be strict about only forwarding unicast searches w/o CMD_ORIGIN_TAG which arrive from an interface other than 127.0.0.1. There is the added wrinkle that forwarding has proven difficult to unittest, and I have made mistakes in the past. Thus far these have been cases of being too strict.
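
The rule in the previous paragraph can be written down as a small predicate. This is only an illustration of the rule as stated here, not the actual PVXS code; the `Search` type and its field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Search:
    unicast_flag: bool    # Unicast bit set in the SEARCH message
    has_origin_tag: bool  # message arrived prefixed with CMD_ORIGIN_TAG
    arrived_on: str       # local interface address that received it


def should_forward(msg: Search) -> bool:
    """Forward only unicast searches without an origin tag that did not arrive on loopback."""
    return (
        msg.unicast_flag
        and not msg.has_origin_tag
        and msg.arrived_on != "127.0.0.1"
    )


if __name__ == "__main__":
    print(should_forward(Search(True, False, "45.33.109.184")))  # eligible for forwarding
    print(should_forward(Search(True, True, "45.33.109.184")))   # already forwarded once
    print(should_forward(Search(True, False, "127.0.0.1")))      # arrived on loopback
```

Each condition guards against a forwarding loop: a message that already carries the tag, or that arrived on loopback, has presumably been forwarded before.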

Part of the problem now has to do with the addition of IPv6 support and the Linux specific differences between [::] vs. 0.0.0.0. The result is that the server is not joining 224.0.0.128.
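
For readers unfamiliar with the mechanism, joining an IPv4 multicast group on a plain IPv4 UDP socket looks roughly like the sketch below. It is a generic socket example, not what PVXS does internally, and it assumes the group is joined on the loopback interface; the function names are illustrative.

```python
import socket

MCAST_GROUP = "224.0.0.128"  # local multicast group discussed in this thread
LOOPBACK = "127.0.0.1"       # assumption: the group is joined on loopback


def membership_request(group=MCAST_GROUP, iface=LOOPBACK):
    """Pack a struct ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def join_local_group(port=5076):
    """Bind an IPv4 UDP socket and join the group. Not called on import."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request())
    return sock


if __name__ == "__main__":
    s = join_local_group()
    print("joined", MCAST_GROUP, "on", LOOPBACK)
    s.close()
```

On Linux, `ip maddr show` lists the groups each interface has joined, which is one way to check whether a running server is a member of 224.0.0.128.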

To compound this, so far I haven't figured out a good way to unittest any of the behavior.

jbellister-slac commented 1 year ago

I had a good guess what that link would be before clicking it :)

Got it, that does sound like a pain, but thanks for taking a look!

jbellister-slac commented 1 year ago

> I would still recommend designing your PVA networks to avoid relying on automatic forwarding of unicast searches.

While this fix is in progress, is there a better-practice way of solving this on our end? If we have, say, multiple PVA servers running on a single Linux host, and a client on the same subnet wants to be able to retrieve PVs from any of them, what is the recommended approach for setting that up with PVA?

mdavidsaver commented 1 year ago

> ... is there a better practice way of solving this on our end?

Without specific knowledge of how your network is laid out (which probably shouldn't be posted here) I can't say more than to avoid situations requiring unicast UDP search. At its simplest, this means relying on broadcast search. Other situations might involve adding PVA gateways between subnets, and/or utilizing IPv4 multicast when the desired scope for PVA searching crosses subnet boundaries.
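
As a concrete illustration of "relying on broadcast search", a client can list the subnet broadcast address instead of a host address. The sketch below derives that address with the standard `ipaddress` module. The /24 prefix is an assumption (it is consistent with the 45.33.109.255 broadcast address in the pvxinfo output above, but the real netmask was not posted), and the function name is made up for the example.

```python
import ipaddress


def broadcast_search_env(interface_cidr, port=5076):
    """Build client settings that search via the subnet broadcast address."""
    iface = ipaddress.ip_interface(interface_cidr)
    bcast = iface.network.broadcast_address
    return {
        "EPICS_PVA_ADDR_LIST": f"{bcast}:{port}",
        "EPICS_PVA_AUTO_ADDR_LIST": "NO",
    }


if __name__ == "__main__":
    # Assumed prefix length; substitute the real one for your subnet.
    for key, value in broadcast_search_env("45.33.109.184/24").items():
        print(f"export {key}={value}")
```

Leaving EPICS_PVA_AUTO_ADDR_LIST at YES has a similar effect without hard-coding anything, since the client then discovers the broadcast addresses of its local interfaces on its own.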

mdavidsaver / pvxs

pvget not returning anything when setting EPICS_PVA_ADDR_LIST and EPICS_PVA_AUTO_ADDR_LIST to NO #31