srl-labs / containerlab

container-based networking labs
https://containerlab.dev
BSD 3-Clause "New" or "Revised" License

SROS partial config doesn't load #2136

Closed omron93 closed 3 months ago

omron93 commented 4 months ago

After updating containerlab to the latest version, I'm hitting an issue when starting an SROS node:

    edge2-xd1-dev1:
      kind: vr-sros
      mgmt-ipv4: 192.168.251.3
      image: registry.srlinux.dev/pub/vr-sros:22.10.R2
      type: >-
        cp: uuid=dc7808cc-8161-4af3-bdba-2837153c3ab3 cpu=4 min_ram=6 chassis=sr-7s slot=A sfm=sfm-s card=cpm2-s ___
        lc: cpu=4 min_ram=6 chassis=sr-7s slot=1 max_nics=36 sfm=sfm-s card=xcm-7s mda/1=s36-400gb-qsfpdd level cr4800g
      license: ../licenses/sros-license.txt
      startup-config: ../startup-cfgs/vr-sros-base.partial.txt
      binds:
        - ../startup-cfgs/vr-sros-bof.txt:/tftpboot/vr-sros-bof.txt

This is what I see in the clab command output:

DEBU[0086]    DEBUG channel read "INFO: CLI #2060: Entering exclusive configuration mode" 
DEBU[0086]    DEBUG channel read "\nINFO: CLI #2061: Uncommitted changes are discarded on configuration mode exit" 
DEBU[0086]    DEBUG channel read "\n\n(ex)[/]\nA:admin@edge2-xd1-dev1# " 
DEBU[0086]    DEBUG channel GetPrompt requested 
DEBU[0086]    DEBUG channel write "\n"         
DEBU[0086]    DEBUG channel read "\n"          
DEBU[0086]    DEBUG channel read "\n(ex)[/]\nA:admin@edge2-xd1-dev1# " 
DEBU[0086]    DEBUG AcquirePriv determined no privilege action necessary 
  commit /admin save /exit all quit-config]' w"]cE9DIIW7/nb.gfc.SXXMdq96Sp6D8kqAXsAihbv2V0ex6Pi"
DEBU[0086]     INFO SendCommand requested, sending 'configure {'
DEBU[0086]    DEBUG channel SendInput requested, sending input 'configure {'
DEBU[0086]    DEBUG channel write "configure {\r" 
DEBU[0086]    DEBUG channel read "c"           
DEBU[0086]    DEBUG channel read "onf"         
DEBU[0086]    DEBUG channel read "igure {"     
DEBU[0086]    DEBUG channel read "\n"          
DEBU[0086]    DEBUG channel read "\n(ex)[/configure]\nA:admin@edge2-xd1-dev1# " 
DEBU[0091] CRITICAL channel timeout sending input to device 
ERRO[0091] failed to run postdeploy task for node edge2-xd1-dev1: failed to apply config; error: errTimeoutError: channel timeout sending input to device 
DEBU[0091] StateChange: Done -> edge2-xd1-dev1 - configure 
DEBU[0091] Worker 5 terminating...   

This seems to be specific to partial config loading. If I take an SROS node with the default config, manually apply our partial config to it, and then use the resulting config as a "full" config in clab, the node starts fine.

When trying different versions of containerlab, the last working version was 0.47.2 and versions >=0.48.0 are failing.
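
The `channel write "configure {\r"` entry in the debug output above shows a carriage return being sent together with the first config line. Whether that is related to the timeout is not established in this thread, but it is cheap to check whether a startup config file contains CRLF line endings. A minimal sketch in Python; the function names are hypothetical and not part of containerlab:

```python
def find_carriage_returns(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that end with a carriage return."""
    return [
        lineno
        for lineno, line in enumerate(data.split(b"\n"), start=1)
        if line.endswith(b"\r")
    ]


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")
```

To use it, read the file in binary mode, for example `Path("../startup-cfgs/vr-sros-base.partial.txt").read_bytes()` from `pathlib`, and pass the bytes to `find_carriage_returns`. An empty list means the file has no CRLF line endings.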

Also, in the docker logs I can see the following failure. I don't know if it's related.

2024-07-09 14:31:49,240: vrnetlab   TRACE    read from serial console:  /bof save 
Writing configuration to cf3:/bof.cfg ... OK
Completed.
*A:edge2-xd1-dev1#
2024-07-09 14:31:49,240: vrnetlab   DEBUG    writing to serial console: /admin save
2024-07-09 14:31:49,240: vrnetlab   TRACE    waiting for '#' on serial console
2024-07-09 14:31:49,464: vrnetlab   TRACE    read from serial console:  /admin save 
Writing configuration to tftp://172.31.255.29/config.txt
Saving configuration .

MINOR: CLI Unable to close configuration file.
*A:edge2-xd1-dev1#

I've found these issues, which look similar: https://github.com/srl-labs/containerlab/issues/1829 https://github.com/carlmontanari/scrapli/issues/23 but the workarounds used in those issues don't work for me.

hellt commented 4 months ago

Hi, we will need a reproducible example partial config to check that.

Also, you may want to rebuild the image using the latest hellt/vrnetlab code and the script that extracts the qcow from the container: https://github.com/hellt/vrnetlab/tree/master/sros#extracting-qcow2-disk-image-from-a-container-image

jk2lx commented 4 months ago

vr-sros-base.partial.txt

This is the partial config we use (passwords removed). Using a different image is something we'll consider, but the same SR-OS 22.10 image worked with clab < 0.48, so we also wanted to investigate whether there is a regression in clab.

hellt commented 4 months ago

@omron93 @jk2lx I created this topology and tested it with clab 0.56.0:

name: linx

topology:
  nodes:
    sros:
      kind: nokia_sros
      image: registry.srlinux.dev/pub/vr-sros:22.10.R2
      type: >-
        cp: uuid=dc7808cc-8161-4af3-bdba-2837153c3ab3 cpu=4 min_ram=6 chassis=sr-7s slot=A sfm=sfm-s card=cpm2-s ___
        lc: cpu=4 min_ram=6 chassis=sr-7s slot=1 max_nics=36 sfm=sfm-s card=xcm-7s mda/1=s36-400gb-qsfpdd level cr4800g
      startup-config: sros.partial.cfg
      license: lic.txt

I took the 22.10.R2 qcow2 and built an image using the latest master branch of hellt/vrnetlab (commit d8cffab2699bb2e69b3d350e9a34a65ee4452ea1).

The partial config you attached did not work for me, maybe because of the removed parts. The error during commit was:

DEBU[0120]    DEBUG channel read "MINOR: MGMT_CORE #236: configure system security user-params local-user user \"adm\" password - Missing mandatory fields\nMINOR: MGMT_CORE #236: configure system security aaa remote-servers tacplus server 1 secret - Missing mandatory fields\n\n*(ex)[/]\nA:admin@sros# " 
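
When a commit fails like this, it can help to pull the offending config paths out of the device response. A minimal sketch in Python, assuming error lines follow the `SEVERITY: APP #code: path - reason` shape seen in the output above; the regular expression and the function name are hypothetical and are not taken from containerlab or scrapli:

```python
import re

# Matches lines such as:
#   MINOR: MGMT_CORE #236: configure system ... password - Missing mandatory fields
# The shape is inferred from the log line above, not from a documented format.
ERROR_RE = re.compile(
    r"^(?P<severity>[A-Z]+): (?P<app>[A-Z_]+) #(?P<code>\d+): "
    r"(?P<path>.+) - (?P<reason>.+)$",
    re.MULTILINE,
)


def parse_commit_errors(text: str) -> list[dict[str, str]]:
    """Return one dict per error line found in the decoded device output."""
    return [match.groupdict() for match in ERROR_RE.finditer(text)]
```

The function expects the decoded text, that is, with `\n` as real newlines and `\"` as plain quotes, not the escaped form printed in the debug log.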

So I had to remove the local user part and the TACACS+ part. The resulting startup config became:

configure {
    card 1 {
        card-type xcm-7s
        mda 1 {
            mda-type s36-400gb-qsfpdd
            level cr4800g
        }
        fp 1 {
        }
        fp 2 {
        }
        fp 3 {
        }
        fp 4 {
        }
    }
    chassis router chassis-number 1 {
        power-shelf 1 {
            power-shelf-type ps-a10-shelf-dc
            power-module 1 {
                power-module-type ps-a-dc-6000
            }
            power-module 2 {
                power-module-type ps-a-dc-6000
            }
            power-module 3 {
                power-module-type ps-a-dc-6000
            }
            power-module 4 {
                power-module-type ps-a-dc-6000
            }
            power-module 5 {
                power-module-type ps-a-dc-6000
            }
            power-module 6 {
                power-module-type ps-a-dc-6000
            }
            power-module 7 {
                power-module-type ps-a-dc-6000
            }
        }
        power-shelf 2 {
            power-shelf-type ps-a10-shelf-dc
            power-module 1 {
                power-module-type ps-a-dc-6000
            }
            power-module 2 {
                power-module-type ps-a-dc-6000
            }
            power-module 3 {
                power-module-type ps-a-dc-6000
            }
            power-module 4 {
                power-module-type ps-a-dc-6000
            }
        }
    }
    sfm 1 {
        sfm-type sfm-s
    }
    sfm 2 {
        sfm-type sfm-s
    }
    sfm 3 {
        sfm-type sfm-s
    }
    sfm 4 {
        sfm-type sfm-s
    }
    system {
        # boot-good-exec "tftp://172.31.255.29/vr-sros-bof.txt"
        management-interface {
            configuration-mode model-driven
            netconf {
                admin-state enable
                auto-config-save true
                capabilities {
                    writable-running true
                }
            }
        }
        security {
            aaa {
                local-profiles {
                    profile "admin-ro" {
                        default-action deny-all
                        entry 10 {
                            match "admin show"
                            action permit
                        }
                        entry 20 {
                            match "admin tech-support"
                            action permit
                        }
                        entry 30 {
                            match "show"
                            action permit
                        }
                        entry 40 {
                            match "telnet"
                            action permit
                        }
                        entry 50 {
                            match "ping"
                            action permit
                        }
                        entry 60 {
                            match "traceroute"
                            action permit
                        }
                        entry 70 {
                            match "ssh"
                            action permit
                        }
                        entry 80 {
                            match "back"
                            action permit
                        }
                        entry 90 {
                            match "top"
                            action permit
                        }
                        entry 100 {
                            match "exit"
                            action permit
                        }
                        entry 110 {
                            match "file list"
                            action permit
                        }
                        entry 120 {
                            match "file show"
                            action permit
                        }
                        entry 130 {
                            match "file copy"
                            action permit
                        }
                        netconf {
                            base-op-authorization {
                                kill-session true
                                lock true
                            }
                        }
                    }
                    profile "admin-rw" {
                        default-action permit-all
                        netconf {
                            base-op-authorization {
                                action true
                                cancel-commit true
                                close-session true
                                commit true
                                copy-config true
                                create-subscription true
                                delete-config true
                                discard-changes true
                                edit-config true
                                get true
                                get-config true
                                get-data true
                                get-schema true
                                kill-session true
                                lock true
                                validate true
                            }
                        }
                    }
                }
                user-template tacplus-default {
                    profile "admin-rw"
                    access {
                        netconf true
                    }
                }
            }
            user-params {
                attempts {
                    count 5
                    lockout 1
                }
                authentication-order {
                    order [tacplus local]
                }
            }
        }
    }
}
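
Since parts of the config above were removed by hand, a quick check that the braces still balance can catch an accidentally deleted line before deploying. A minimal sketch in Python, assuming comments are whole lines starting with `#` and that quoted strings contain no escaped quotes; the function name is hypothetical:

```python
def brace_depth(config: str) -> int:
    """Return the final nesting depth of a brace-delimited config.

    0 means every "{" has a matching "}". A positive value is the number
    of blocks left open. A "}" without a matching "{" raises ValueError.
    """
    depth = 0
    for lineno, raw in enumerate(config.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            continue  # whole-line comment, ignore any braces in it
        in_quotes = False
        for ch in line:
            if ch == '"':
                in_quotes = not in_quotes
            elif not in_quotes and ch == "{":
                depth += 1
            elif not in_quotes and ch == "}":
                depth -= 1
                if depth < 0:
                    raise ValueError(f"unmatched '}}' on line {lineno}")
    return depth
```

A return value of 0 only says the braces are balanced. It does not validate the config against the SR OS schema.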

This topology and startup config worked just fine.

DEBU[0120]     INFO SendCommand requested, sending 'quit-config' 
DEBU[0120]    DEBUG channel SendInput requested, sending input 'quit-config' 
DEBU[0120]    DEBUG channel write "quit-config" 
DEBU[0120]    DEBUG channel read "qu"          
DEBU[0120]    DEBUG channel read "i"           
DEBU[0120]    DEBUG channel read "t-"          
DEBU[0120]    DEBUG channel read "config"      
DEBU[0120]    DEBUG channel write "\n"         
DEBU[0120]    DEBUG channel read "\n"          
DEBU[0120]    DEBUG channel read "INFO: CLI #2064: Exiting exclusive configuration mode" 
DEBU[0120]    DEBUG channel read "\n\n[/]\nA:admin@sros# " 
DEBU[0120] StateChange: Done -> sros - configure        
DEBU[0120] Worker 0 terminating...                      
DEBU[0120] Exported topology data using /etc/containerlab/templates/export/auto.tmpl template 
DEBU[0120] Filter key: name, filter value: ^clab-linx-sros$ 
INFO[0120] Adding containerlab host entries to /etc/hosts file 
DEBU[0120] Filter key: name, filter value: ^clab-linx-sros$ 
INFO[0120] Adding ssh config for containerlab nodes     
+---+----------------+--------------+-------------------------------------------+------------+---------+-----------------+----------------------+
| # |      Name      | Container ID |                   Image                   |    Kind    |  State  |  IPv4 Address   |     IPv6 Address     |
+---+----------------+--------------+-------------------------------------------+------------+---------+-----------------+----------------------+
| 1 | clab-linx-sros | 81e2ec5e5142 | registry.srlinux.dev/pub/vr-sros:22.10.R2 | nokia_sros | running | 172.20.20.10/24 | 2001:172:20:20::a/64 |
+---+----------------+--------------+-------------------------------------------+------------+---------+-----------------+----------------------+