sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[PINS] Device ID for SONiC Virtual Switch with P4RT #19328

Open pantmal opened 3 months ago

pantmal commented 3 months ago

Description

Hello everyone, I am new to SONiC and I would like some assistance with some config-related operations.

I have used the sonic-buildimage repo to create a custom SONiC Virtual Switch image. I set the flag INCLUDE_P4RT = y so that the image includes a P4 runtime container.

I have deployed this VS using GNS3, and I am able to connect to the switch and to the P4RT gRPC port with the P4 runtime shell using: python3 -m p4runtime_sh --grpc-addr <Switch-IP>:9559.

However, when I connect, I receive the following error: CRITICAL:root:P4Runtime RPC error (FAILED_PRECONDITION): Switch does not have a Device ID. Has a config been pushed?

I understand that the P4 runtime shell needs the Device ID to complete the connection, but it has not been set. Can someone assist me with this issue? How am I supposed to set the Device ID and push the config in question? Can this operation be performed by GNS3? I can't see any related option in the sonic-buildimage repo.

Any assistance would be greatly appreciated.

(On another note, I have noticed that the P4rt container does not run by default and needs to be enabled manually with: /usr/bin/p4rt.sh start. Is this behavior expected?)

Steps to reproduce the issue:

  1. Check out the 202311 branch (I chose it for stability). Build the image with the following flags:
    SONIC_CONFIG_BUILD_JOBS = 4
    SONIC_DPKG_CACHE_METHOD ?= rwcache
    INCLUDE_P4RT = y
    ENABLE_TRANSLIB_WRITE = y

    Build commands:

    make configure PLATFORM=vs
    make list
    make target/sonic-vs.img.gz
  2. Deploy the SONiC image to GNS3, following this guide: https://support.stordis.com/hc/en-us/articles/12728856093725-How-to-create-a-new-GNS3-appliance-template-to-work-with-SONiC-using-GNS3-Web-UI
  3. Attempt a connection using P4 runtime shell: python3 -m p4runtime_sh --grpc-addr <Switch-Address>:9559

Describe the results you received:

Connecting to the switch with P4 runtime shell gives the error: CRITICAL:root:P4Runtime RPC error (FAILED_PRECONDITION): Switch does not have a Device ID. Has a config been pushed?

Describe the results you expected:

The Device ID needs to be set. The ideal message, considering no forwarding pipeline is set, would be: CRITICAL:root:P4Runtime RPC error (FAILED_PRECONDITION): No valid forwarding pipeline config has been pushed for any node so far.

Output of show version:

SONiC Software Version: SONiC.202311.0-dirty-20240613.091833
SONiC OS Version: 11
Distribution: Debian 11.9
Kernel: 5.10.0-23-2-amd64
Build commit: 156b067c8
Build date: Thu Jun 13 06:22:49 UTC 2024
Built by: ubuntu@sonic-test

Platform: x86_64-kvm_x86_64-r0
HwSKU: Force10-S6000
ASIC: vs
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 14:10:11 up  6:04,  1 user,  load average: 1.23, 1.15, 1.16
Date: Mon 17 Jun 2024 14:10:11

Docker images:
REPOSITORY                    TAG                              IMAGE ID       SIZE
docker-dhcp-relay             latest                           ea038747b3d7   310MB
docker-macsec                 latest                           406e6cb6b71e   330MB
docker-orchagent              202311.0-dirty-20240613.091833   2a4a9c3320d3   339MB
docker-orchagent              latest                           2a4a9c3320d3   339MB
docker-fpm-frr                202311.0-dirty-20240613.091833   dc3878aaf2e0   359MB
docker-fpm-frr                latest                           dc3878aaf2e0   359MB
docker-nat                    202311.0-dirty-20240613.091833   17c806a77350   330MB
docker-nat                    latest                           17c806a77350   330MB
docker-snmp                   202311.0-dirty-20240613.091833   8641ee86c5a3   340MB
docker-snmp                   latest                           8641ee86c5a3   340MB
docker-platform-monitor       202311.0-dirty-20240613.091833   73c8ec963096   421MB
docker-platform-monitor       latest                           73c8ec963096   421MB
docker-teamd                  202311.0-dirty-20240613.091833   cc6421a9c2a6   327MB
docker-teamd                  latest                           cc6421a9c2a6   327MB
docker-eventd                 202311.0-dirty-20240613.091833   b57586d33078   301MB
docker-eventd                 latest                           b57586d33078   301MB
docker-sflow                  202311.0-dirty-20240613.091833   a1d8d6d7721c   329MB
docker-sflow                  latest                           a1d8d6d7721c   329MB
docker-sonic-p4rt             202311.0-dirty-20240613.091833   2d97624cad12   874MB
docker-sonic-p4rt             latest                           2d97624cad12   874MB
docker-router-advertiser      202311.0-dirty-20240613.091833   61705160d20b   301MB
docker-router-advertiser      latest                           61705160d20b   301MB
docker-lldp                   202311.0-dirty-20240613.091833   f2e8d2d60d0e   343MB
docker-lldp                   latest                           f2e8d2d60d0e   343MB
docker-mux                    202311.0-dirty-20240613.091833   7403e879e168   350MB
docker-mux                    latest                           7403e879e168   350MB
docker-sonic-gnmi             202311.0-dirty-20240613.091833   b1b5d0e2d58f   389MB
docker-sonic-gnmi             latest                           b1b5d0e2d58f   389MB
docker-database               202311.0-dirty-20240613.091833   3d2ca6fb0049   301MB
docker-database               latest                           3d2ca6fb0049   301MB
docker-gbsyncd-vs             202311.0-dirty-20240613.091833   93be9d4eec58   313MB
docker-gbsyncd-vs             latest                           93be9d4eec58   313MB
docker-syncd-vs               202311.0-dirty-20240613.091833   6c3849753dd6   317MB
docker-syncd-vs               latest                           6c3849753dd6   317MB
docker-sonic-mgmt-framework   202311.0-dirty-20240613.091833   94fa5f7bf654   417MB
docker-sonic-mgmt-framework   latest                           94fa5f7bf654   417MB

Output of show techsupport:

Attached relevant file.

sonic_dump_sonic_20240617_141035.tar.gz

Also attaching debug file produced by sudo generate_dump

sonic_dump_sonic_20240617_141715.tar.gz

shengminhe commented 3 months ago

I encountered the same problem. Did you solve it?

pantmal commented 3 months ago

No, unfortunately, I am still investigating this error. What I have found so far is that SONiC is supposed to have an OpenConfig path of: '/components/component/integrated-circuit/config/node-id'. This ID can be set via gNMI. The Redis database is also supposed to contain a key of: CONFIG_DB:NODE_CFG|integrated_circuit0.

This means that a SONiC switch whose device ID has been set should have non-empty values for both of the above. But in my case, both are empty (actually, the OpenConfig path and the Redis key don't even exist). So apparently SONiC needs a config of sorts that enables these values, but I haven't been able to find an example yet.

Hope this information might help you in your search. If you happen to find anything useful, let me know.

shengminhe commented 3 months ago

Thank you for the tip. I will continue to investigate this error.

kperumalbfn commented 3 months ago

@baxia-lan could you please check this issue with P4RT and VS?

baxia-lan commented 3 months ago

@rhalstea Could you please take a look? Is the node ID in CONFIG required for p4rt boot-up? Does it have to be set by gNMI via the path /components/component/integrated-circuit/config/node-id? Or can it be optional when starting the p4rt app?

"NODE_CFG": {
    "integrated_circuit0": {
        "node-id": "1"
    }
}

cc. @mkeda

rhalstea commented 3 months ago

It's not required for bootup, but it is required if you want to use P4RT in any meaningful way. It looks like you have all the correct fields that p4rt_app monitors. You don't have to go through gNMI. The p4rt_app just needs to see the value in Redis. Setting it in config_db.json or updating Redis manually should also work if you just want to test something.
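For the config_db.json route, a minimal sketch of adding the entry could look like the following. The table, key, and field names are taken from the JSON fragment quoted earlier in this thread; the function name `with_node_id` and the trimmed sample config are hypothetical and only for illustration.

```python
import copy
import json

# Names follow the NODE_CFG fragment quoted earlier in this thread.
NODE_CFG_TABLE = "NODE_CFG"
NODE_CFG_KEY = "integrated_circuit0"
NODE_ID_FIELD = "node-id"


def with_node_id(config_db: dict, node_id: int) -> dict:
    """Return a copy of a config_db.json dict with the P4RT node ID set.

    Hypothetical helper: it only edits the in-memory dict and leaves
    reading and writing the actual file to the caller.
    """
    if node_id < 0:
        raise ValueError("node_id must be non-negative")
    updated = copy.deepcopy(config_db)
    entry = updated.setdefault(NODE_CFG_TABLE, {}).setdefault(NODE_CFG_KEY, {})
    # Field values in config_db.json are stored as strings.
    entry[NODE_ID_FIELD] = str(node_id)
    return updated


if __name__ == "__main__":
    # Heavily trimmed, made-up config; a real config_db.json has many more tables.
    config = {"DEVICE_METADATA": {"localhost": {"hwsku": "Force10-S6000"}}}
    print(json.dumps(with_node_id(config, 1), indent=4))
```

On a switch, the dict would typically be loaded from /etc/sonic/config_db.json, written back, and applied with a config reload. Treat that workflow as an assumption to verify on your image.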

baxia-lan commented 3 months ago

Thanks @rhalstea

We probably need to add this default NODE_CFG config to init_cfg.json.j2 then. https://github.com/sonic-net/sonic-buildimage/blob/master/files/build_templates/init_cfg.json.j2

baxia-lan commented 2 months ago

Discussed this issue offline with @rhalstea and @ndas7 (gNMI team).

The node-id field of the CONFIG DB NODE_CFG|integrated_circuit0 entry configures a unique identifier (a device ID) per switch. The controller uses this ID to identify the device it talks to. See the P4Runtime specification for details.

The expected behavior is that the switch boots with its initial config (the node ID may be missing), and the p4rt app may complain that the node ID is missing. Once a config with the node ID is pushed, the p4rt app (ConfigDbNodeCfgTableEventHandler) consumes the node ID and unblocks the rest.

The node ID should be configured by users who enable the P4RT app, by setting the value of the OpenConfig path /components/component/integrated-circuit/config/node-id, i.e. the node-id field of the CONFIG DB NODE_CFG|integrated_circuit0 entry.
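The correspondence between the OpenConfig path and the CONFIG DB entry can be sketched as below. This is an illustration only, not the actual gNMI/UMF transformer: the `[name=...]` list key in the path, the assumption that the component name maps directly to the NODE_CFG key, and the names `ConfigDbWrite` and `translate_node_id` are all assumptions made for this example. The field name node-id follows the JSON fragment quoted earlier in this thread.

```python
import re
from typing import NamedTuple


class ConfigDbWrite(NamedTuple):
    """One field write in CONFIG DB (hypothetical type for this sketch)."""
    key: str
    field: str
    value: str


# Assumes the component list is keyed by name, e.g. component[name=integrated_circuit0].
_NODE_ID_PATH = re.compile(
    r"^/components/component\[name=(?P<name>[^\]]+)\]"
    r"/integrated-circuit/config/node-id$"
)


def translate_node_id(path: str, value: int) -> ConfigDbWrite:
    """Map a node-id OpenConfig path and value to a CONFIG DB field write."""
    match = _NODE_ID_PATH.match(path)
    if match is None:
        raise ValueError(f"not a node-id path: {path}")
    # Assumed mapping: component name -> NODE_CFG key, leaf -> 'node-id' field.
    return ConfigDbWrite(f"NODE_CFG|{match.group('name')}", "node-id", str(value))


if __name__ == "__main__":
    path = ("/components/component[name=integrated_circuit0]"
            "/integrated-circuit/config/node-id")
    print(translate_node_id(path, 1))
```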

The path /components/component/integrated-circuit/config/node-id is part of the public OpenConfig models. Support on the gNMI/UMF side (the platform xfmr) is in the process of being submitted to the community.

So for the general use case, the action items are for the gNMI team to submit the implementation and for users to configure the node-id before using p4rt. For tests that only use a standalone virtual switch, e.g. docker-sonic-vs.gz for SWSS component tests, a default node ID can be hard-coded in init_cfg.json.j2 (e.g. PR 19534).

pantmal commented 2 months ago

So, if I'm understanding correctly, you propose updating only docker-sonic-vs.gz with a hard-coded node-id value, and for other sonic-vs setups you suggest that users set the Device ID on their own. I'm fine with this, though in my opinion it should be mentioned in a README or somewhere else in the repo.

Now while I'm able to connect to the Virtual Switch by setting the node-id value using redis-cli, I'm having trouble pushing the p4 config files to the Virtual Switch, resulting in an odd error. I would appreciate some help in the new issue I have posted: https://github.com/sonic-net/sonic-buildimage/issues/19589
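For anyone who wants to script the manual Redis update instead of typing it into redis-cli, a rough sketch follows. It assumes the key NODE_CFG|integrated_circuit0 and the field node-id from the JSON fragment quoted earlier in this thread. `set_node_id` and `InMemoryClient` are hypothetical names; the in-memory client only stands in for a real connection so the example runs without a switch.

```python
from typing import Dict, Protocol


class HashClient(Protocol):
    """The two hash calls used below; redis-py's redis.Redis provides both."""

    def hset(self, name: str, key: str, value: str) -> int: ...

    def hgetall(self, name: str) -> dict: ...


def set_node_id(client: HashClient, node_id: int) -> dict:
    """Write the node ID into the NODE_CFG entry and return the stored entry."""
    key = "NODE_CFG|integrated_circuit0"
    client.hset(key, "node-id", str(node_id))
    return client.hgetall(key)


class InMemoryClient:
    """Stand-in for a Redis connection (illustration only)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, key: str, value: str) -> int:
        fields = self._hashes.setdefault(name, {})
        added = 0 if key in fields else 1  # count of newly created fields
        fields[key] = value
        return added

    def hgetall(self, name: str) -> dict:
        return dict(self._hashes.get(name, {}))


if __name__ == "__main__":
    # On a switch this might instead be something like
    #   redis.Redis(host="127.0.0.1", port=6379, db=4, decode_responses=True)
    # assuming CONFIG_DB is database 4, as in the default SONiC database config.
    print(set_node_id(InMemoryClient(), 1))
```

Note that a value written this way lives only in the running database; whether it survives a reboot depends on saving the configuration, which this sketch does not do.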

baxia-lan commented 2 months ago

Our way to set the node-id is through the gNMI OpenConfig path. As mentioned, the implementation is WIP (thanks @ndas7 for the input). The OpenConfig value will eventually be translated to a CONFIG DB NODE_CFG table entry.

Also agreed that P4RT can add a README with better instructions and documentation.