saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0
14.18k stars 5.48k forks source link

Napalm net.cli race condition #55332

Open danfoster opened 4 years ago

danfoster commented 4 years ago

Description of Issue

If you run 2 parallel net.cli commands, the output can get mixed up. There is also a change it can break the napalm proxy-minion if the wrong process reads consumes the prompt.

In the example below I run the following two loops:

while [ true ]; do salt 'mlnx1.switch' net.cli "show version"; done
while [ true ]; do salt 'mlnx1.switch' net.cli "show int status"; done

This quickly reproduces the issue, but a more common, real world example is when the minion grains gets refreshed at the same time as applying a state.

In the above example, you would expect the first command to produce the following output:

            Product name:      Onyx
            Product release:   3.8.2004
            Build ID:          #1-dev
            Build date:        2019-09-23 14:19:47
            Target arch:       x86_64
            Target hw:         x86_64
            Built by:          jenkins@7ae5fd122b61
            Version summary:   X86_64 3.8.2004 2019-09-23 14:19:47 x86_64

            Product model:     x86onie
            Host ID:           B8599F0E752C
            System serial num: MT1917K00431
            System UUID:       946e383c-69ed-11e9-8000-b8599f63da40

            Uptime:            7d 23h 1m 28.512s
            CPU load averages: 3.01 / 3.05 / 3.07
            Number of CPUs:    4
            System memory:     2628 MB used / 5175 MB free / 7803 MB total
            Swap:              0 MB used / 0 MB free / 0 MB total

And the 2nd command:

           --------------------------------------------------------------------------------------------------------------------------------

            Port                Operational state   Admin                         Speed           Description
            --------------------------------------------------------------------------------------------------------------------------------
            mgmt0               Up                  Enabled                       1000Mb/s (auto) -
            Eth1/1              Down                Enabled                       Unknown         -
            Eth1/2              Down                Enabled                       Unknown         -
            Eth1/3              Down                Enabled                       Unknown         -
            Eth1/4              Down                Enabled                       Unknown         -
            Eth1/5              Down                Enabled                       Unknown         -
            Eth1/6              Down                Enabled                       Unknown         -
            Eth1/7              Down                Enabled                       Unknown         -
            Eth1/8              Down                Enabled                       Unknown         -
            Eth1/9              Down                Enabled                       Unknown         -
            Eth1/10             Down                Enabled                       Unknown         -
            Eth1/11             Down                Enabled                       Unknown         -
            Eth1/12             Down                Enabled                       Unknown         -
            Eth1/13             Down                Enabled                       Unknown         -
            Eth1/14             Down                Enabled                       Unknown         -
            Eth1/15             Down                Enabled                       Unknown         -
            Eth1/16             Down                Enabled                       Unknown         -
            Eth1/17             Down                Enabled                       Unknown         -
            Eth1/18             Down                Enabled                       Unknown         -
            Eth1/19             Up                  Enabled                       40G             -
            Eth1/20             Up                  Enabled                       40G             -
            Eth1/21             Down                Enabled                       Unknown         -
            Eth1/22             Down                Enabled                       Unknown         -

In the file attached, I show an example of the output of the proxy-minion in debug mode when the two commands are getting merged.

20191115154146169012 returns:

[DEBUG   ] minion return: {u'fun_args': [u'show version'], u'jid': u'20191115154146169012', u'return': {u'comment': u'', u'result': True, u'out': {u'show version':
    u'us\n--------------------------------------------------------------------------------------------------------------------------------\n\nPort                Operational state   Admin                         Speed
    Description                             \n--------------------------------------------------------------------------------------------------------------------------------\nmgmt0               Up
    Enabled                       1000Mb/s (auto) -\nEth1/1              Down                Enabled                       Unknown         -                                       \nEth1/2              Down
    Enabled                       Unknown         -                                       \nEth1/3              Down                Enabled                       Unknown         -                                       \nEth1/
    4              Down                Enabled                       Unknown         -                                       \nEth1/5              Down                Enabled
    Unknown         -                                       \nEth1/12             Down                Enabled                       Unknown         -                                       \nEth1/13             Down
    Enabled                       Unknown         -                                       \nEth1/14             Down                Enabled                       Unknown         -                                       \nEth1/
    15             Down                Enabled                       Unknown         -                                       \nEth1/16             Down                Enabled
    Unknown         -                                       \nEth1/17             Down                Enabled                       Unknown         -                                       \nEth1/18             Down
    Enabled                       Unknown         -                                       \n\n'}}, u'retcode': 0, u'success': True, u'fun': u'net.cli'}

20191115154146711930 returns:

[DEBUG   ] minion return: {u'fun_args': [u'show int status'], u'jid': u'20191115154146711930', u'return': {u'comment': u'', u'result': True, u'out': {u'show int status': u'
    Down                Enabled                       Unknown         -                                       \nEth1/7              Down                Enabled
    Unknown         -                                       \nEth1/8              Down                Enabled                       Unknown         -                                       \nEth1/9              Down
    Enabled                       Unknown         -                                       \nEth1/10             Down                Enabled                       Unknown         -                                       \nEth1/
    11             Down                Enabled                       Unknown         -                                       \nEth1/19             Up                  Enabled
    40G             -                                       \nEth1/20             Up                  Enabled                       40G             -                                       \nEth1/21             Down
    Enabled                       Unknown         -                                       \nEth1/22             Down                Enabled                       Unknown         -                                       \n'}},
    u'retcode': 0, u'success': True, u'fun': u'net.cli'}

Versions Report

# salt --versions-report
Salt Version:
           Salt: 2019.2.2

Dependency Versions:
           cffi: 1.12.3
       cherrypy: unknown
       dateutil: 2.7.2
      docker-py: 3.5.0
          gitdb: 2.0.5
      gitpython: 2.1.11
          ioflo: Not Installed
         Jinja2: 2.9.6
        libgit2: Not Installed
        libnacl: 1.6.1
       M2Crypto: 0.31.0
           Mako: 1.0.12
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: 1.2.5
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: 3.7.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  7 2019, 00:51:29)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 2.0.5
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.7.1908 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-1062.1.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.7.1908 Core

Using the onyx napalm driver (https://github.com/napalm-automation-community/napalm-onyx), but I don't think this should make a difference?

napalm_bug.txt

Ch3LL commented 4 years ago

ping @saltstack/team-network any ideas here?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

danfoster commented 4 years ago

I'm still interested in how we can debug this further and hopefully solve it, so commenting to prevent the stalebot from closing this issue.

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

danfoster commented 4 years ago

I'm still interested in how we can debug this further (anti stale-bot comment)

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.