hashicorp / vagrant

Vagrant is a tool for building and distributing development environments.
https://www.vagrantup.com

Guest OS detection is broken for Cisco Nexus switch OS guests #11762

Open y1y123 opened 4 years ago

y1y123 commented 4 years ago

I need to add a guest detection and capability plugin for Cisco Nexus switch OS. I wrote the guest plugin, but it was never called; Nexus OS is detected as "mint". Once this issue is resolved, I intend to commit the guest detection plugin to the Vagrant git repository. Contents of /etc/os-release on Nexus OS:

Nexus9000v# run bash
bash-4.4$ cat /etc/os-release
ID=nexus
ID_LIKE=cisco-nxlinux
NAME=Nexus
VERSION="9.4(1)IJB9(0.192)"
VERSION_ID="9.4(1)IJB9"
PRETTY_NAME="Nexus 9.4(1)IJB9"
HOME_URL=http://www.cisco.com
BUILD_ID=192
CISCO_RELEASE_INFO=/etc/os-release
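The os-release contents above carry ID=nexus, which is the natural field for a detection check to key on. The following is a minimal standalone sketch of that logic in plain Ruby. The helper names `parse_os_release` and `nexus_guest?` are hypothetical and not part of Vagrant; in a real guest plugin the equivalent test would run on the guest over SSH rather than on a local string.

```ruby
# Hypothetical helper (not a Vagrant API): parse os-release text into a Hash.
def parse_os_release(text)
  text.each_line.with_object({}) do |line, fields|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    next if value.nil?

    # Strip one pair of matching surrounding quotes, if present.
    fields[key] = value.sub(/\A(["'])(.*)\1\z/, '\2')
  end
end

# Hypothetical predicate: true when the ID field identifies Nexus OS.
def nexus_guest?(os_release_text)
  parse_os_release(os_release_text)["ID"] == "nexus"
end

# Sample input copied from the issue report.
NEXUS_OS_RELEASE = <<~EOS
  ID=nexus
  ID_LIKE=cisco-nxlinux
  NAME=Nexus
  VERSION="9.4(1)IJB9(0.192)"
  VERSION_ID="9.4(1)IJB9"
  PRETTY_NAME="Nexus 9.4(1)IJB9"
  HOME_URL=http://www.cisco.com
  BUILD_ID=192
  CISCO_RELEASE_INFO=/etc/os-release
EOS

puts nexus_guest?(NEXUS_OS_RELEASE) # prints "true"
```

Matching on ID rather than NAME or PRETTY_NAME keeps the check independent of the release string, which changes with every NX-OS version.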

Vagrant version

2.2.7

Host operating system

Mac OS Catalina

Guest operating system

Cisco Nexus (a minor variant of Linux whose default shell is the NXOS CLI). config.ssh.shell = "run bash" # executes bash from the NXOS CLI

Vagrantfile -

The config below is used only to reproduce the issue. It triggers detection of the guest OS and its capabilities.

config.vm.box = "nexus9300v.9.4.1.IJB9.0.192.box" # Cisco Datacenter switch OS

# Executes bash from NXOS CLI
config.ssh.shell = "run bash"
config.ssh.insert_key = false
config.vm.network "public_network"

config.vm.synced_folder ".", "/vagrant", disabled: true

Debug output

https://gist.github.com/y1y123/383e9dbcd5615f8bba5fc2d1706faa11

DEBUG guest: Trying: mint
DEBUG ssh: Re-using SSH connection.
 INFO ssh: Execute: if test -r /etc/os-release; then
source /etc/os-release && test 'xLinux Mint' = "x$ID" && exit
fi
if test -x /usr/bin/lsb_release; then
/usr/bin/lsb_release -i 2>/dev/null | grep -qi 'Linux Mint' && exit
fi
if test -r /etc/issue; then
cat /etc/issue | grep -qi 'Linux Mint' && exit
fi
exit 1
 (sudo=false)
DEBUG ssh: stderr: 41e57d38-b4f7-4e46-9c38-13873d338b86-vagrant-ssh
DEBUG ssh: Exit status: 0
 INFO guest: Detected: mint!
DEBUG guest: Searching for cap: configure_networks

Expected behavior

What should have happened? Vagrant should have called the guest plugins one by one until one of them detected a valid distro. Since I added a new nexus guest plugin, it should have been called to detect Nexus OS, but it was never called.

Actual behavior

The very first guest plugin called by Vagrant (Linux Mint) wrongly detects Nexus as Mint (see the debug output above).

Steps to reproduce

  1. Download Nexus vbox from here. https://software.cisco.com/download/home/286312239/type/282088129/release/9.3(4)

  2. Use above Vagrantfile

  3. vagrant up --debug

References

None

briancain commented 4 years ago

Hey there @y1y123 - How did you install Vagrant and your guest plugin? From the debug output, you should have way more guest and host plugins than it's showing. It seems like there's something off there. For example, with the latest version of Vagrant you should have about 37 guest plugins that get registered, whereas in the debug output here I only see 6.

Also, if this code is successfully executing:

DEBUG guest: Trying: mint
DEBUG ssh: Re-using SSH connection.
 INFO ssh: Execute: if test -r /etc/os-release; then
source /etc/os-release && test 'xLinux Mint' = "x$ID" && exit
fi
if test -x /usr/bin/lsb_release; then
/usr/bin/lsb_release -i 2>/dev/null | grep -qi 'Linux Mint' && exit
fi
if test -r /etc/issue; then
cat /etc/issue | grep -qi 'Linux Mint' && exit 
fi
exit 1
 (sudo=false)
DEBUG ssh: stderr: 41e57d38-b4f7-4e46-9c38-13873d338b86-vagrant-ssh
DEBUG ssh: Exit status: 0

Then that seems like a problem with the box you are using. Can you confirm that the other files' contents don't have Linux Mint in them either? I.e. the files /usr/bin/lsb_release and /etc/issue.

y1y123 commented 4 years ago

Thanks for looking into it Briancain. Sorry to confuse you, my bad.

Let me explain the context. Since the nexus plugin was not being called because of the above issue, I moved all other guest plugins to a backup directory and kept only one plugin, "nexus", just to see whether my plugin would be called. It was indeed called, and I was able to successfully test the nexus plugin code.

--> /opt/vagrant/embedded/gems/2.2.7/gems/vagrant-2.2.7/plugins/guests/backup

After testing was over, I moved only a few plugins from "backup" back to the guest directory and collected logs for reporting this issue. This is why you see only a few guests in the debug logs. If you prefer, I can move all the plugins back and collect logs again.

To your second question, I verified that the code you pointed to executes successfully on the nexus vbox. One more data point: when I removed only the "mint" guest plugin directory, "atomic" was called, and this time "atomic" showed a match --> INFO guest: Detected: atomic!

Basically, it does not matter which guest plugin is called; whichever is called first shows the match.

Please refer to the output of "cat /etc/os-release" below. Also note that the Nexus default shell is the NXOS CLI, and to work around that I have configured config.ssh.shell = "run bash" in the Vagrantfile. I might be wrong, but my sense is that "run bash" is invoked before the code snippet you pointed to is executed on the switch, and that the result of "run bash" itself is being checked instead of the exit value of the snippet.

Nexus9000v# run bash
bash-4.4$ cat /etc/os-release
ID=nexus
ID_LIKE=cisco-nxlinux
NAME=Nexus
VERSION="9.4(1)IJB9(0.192)"
VERSION_ID="9.4(1)IJB9"
PRETTY_NAME="Nexus 9.4(1)IJB9"
HOME_URL=http://www.cisco.com
BUILD_ID=192
CISCO_RELEASE_INFO=/etc/os-release

briancain commented 4 years ago

Hey there @y1y123 - so it seems like it's not possible at the moment for Vagrant to really manage this guest like other Linux guests. This is due to the different file system that NX-OS has.

This doc https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/cli/nx/cfg/b_APIC_NXOS_CLI_User_Guide/b_APIC_NXOS_CLI_User_Guide_chapter_010.html seems to indicate that you should be able to run bash commands with bash -c "command goes here" or by just typing bash. But it doesn't seem like that fixes anything for the box you're using? The default shell vagrant uses is just bash -l, and it doesn't sound like that is working either for that box.

I also found this blog post: https://blogs.cisco.com/developer/open-nx-os-journey-1 It seems like they just recommend letting Vagrant fail when it gets past the initial boot up stage and then just logging in with vagrant ssh.

For now I've made an enhancement request to get better support for these kinds of guests in Vagrant for the future: https://github.com/hashicorp/vagrant/issues/11771 Feel free to drop any other useful info there if you wish!

y1y123 commented 4 years ago

Hey Brian, the Cisco doc you linked is for a different version, "NXOS ACI Fabric OS", and does not apply to the version I am talking about. I reported this issue for the VirtualBox image of the "NXOS standalone" version. But yes, even on the standalone version, one can drop to a bash prompt by typing "run bash" at the NXOS CLI. This is why I specify config.ssh.shell = "run bash" in my Vagrantfile.

Once we are in the bash shell, it is pretty much standard Linux with some minor changes, such as those in /etc/os-release. There are a lot of other loadable kernel modules specific to Nexus, but that should not matter for standard Linux operations.

As mentioned earlier, once I removed all other guest plugins, I was able to run the nexus plugin to mount the shared folder. The reason I had to write my own plugin is that the standard Linux plugin /opt/vagrant/embedded/gems/2.2.7/gems/vagrant-2.2.7/plugins/guests/linux/cap/mount_virtualbox_shared_folder.rb uses machine.communicate.sudo() in many places, and all of these commands fail when executed on Nexus. As an example, it runs machine.communicate.sudo("mkdir -p #{guest_path}")

This and all other commands run with sudo fail, but when I changed them to machine.communicate.execute(), they all worked fine. I prepended "sudo" to the commands, something like machine.communicate.execute("sudo mkdir -p #{guest_path}"). With this, all commands worked and I was able to mount the shared folder. I am not sure why machine.communicate.sudo() fails on Nexus, but my guess is that it is trying to run commands as "sudo run bash", which is not a valid command on Nexus.

My only issue now is that the nexus plugin is not being called. If that is fixed, I can commit this plugin; otherwise we have to somehow make machine.communicate.sudo() work on Nexus. Thanks.

y1y123 commented 4 years ago

Hey Brian, I see that you have filed this as an enhancement request. Any idea when it will be taken up? From the subject, it seems the detection logic will be fixed, and after that I will need to commit the Nexus-specific guest capability plugins as discussed above. Please confirm so that I can plan accordingly. Thanks for your help.

briancain commented 4 years ago

Hey @y1y123 - I'm not sure when this might be worked on, and currently the team working on Vagrant is very small. But I think the issue here is that no matter what, setting run bash will always exit 0 regardless of the result of the script it runs, which is a problem here for Vagrant (and not only for determining what guest it is). If that's the only thing stopping the guest from working though then :+1:
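The effect of a shell that always exits 0 can be simulated in plain Ruby. This is only a sketch under stated assumptions: `GuestCandidate`, `detect_guest`, and the two shell lambdas are hypothetical names, the per-guest checks are simplified placeholders rather than Vagrant's real detection scripts, and the candidate order is chosen to mirror what the thread reports (mint first, then atomic).

```ruby
# Hypothetical stand-in for a guest plugin: a name plus a detection check.
GuestCandidate = Struct.new(:name, :check)

# A shell that reports the detection script's real exit status.
HONEST_SHELL = ->(script_status) { script_status }
# A shell wrapper that always reports success, as described for "run bash".
ALWAYS_ZERO_SHELL = ->(_script_status) { 0 }

# Try candidates in order; return the name of the first whose check "exits 0".
def detect_guest(candidates, os_id, shell)
  match = candidates.find do |candidate|
    script_status = candidate.check.call(os_id) ? 0 : 1
    shell.call(script_status).zero?
  end
  match && match.name
end

# Placeholder checks keyed on the os-release ID value (illustrative only).
CANDIDATES = [
  GuestCandidate.new(:mint,   ->(id) { id == "Linux Mint" }),
  GuestCandidate.new(:atomic, ->(id) { id == "atomic" }),
  GuestCandidate.new(:nexus,  ->(id) { id == "nexus" }),
].freeze

p detect_guest(CANDIDATES, "nexus", HONEST_SHELL)              # :nexus
p detect_guest(CANDIDATES, "nexus", ALWAYS_ZERO_SHELL)         # :mint
p detect_guest(CANDIDATES.drop(1), "nexus", ALWAYS_ZERO_SHELL) # :atomic
```

With the always-zero shell, whichever candidate is tried first wins, which matches the observation that removing the mint plugin made atomic "match" instead.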

This and all other commands which are run with sudo fail but when I changed them to machine.communicate.execute(), they all worked fine. I prepended "sudo" before the commands, something like machine.communicate.execute("sudo mkdir -p #{guest_path}"). With this all commands worked fine and I was able to mount the shared folder. I am not sure why machine.communicate.sudo() fails on nexus but I am guessing that probably it is trying to run commands as "sudo run bash" and this is not a valid command on nexus.

You might be able to get around this by making the sudo command empty: https://www.vagrantup.com/docs/vagrantfile/ssh_settings#config-ssh-sudo_command

config.vm.sudo_command = ""
# Or you might just need to make it %c which is where the original command goes:
config.vm.sudo_command = "%c"

Then you shouldn't need your own plugin for the sake of removing the sudo commands for the machines communicator.

y1y123 commented 4 years ago

Agreed, I would prefer a solution for the sudo command over writing my own plugin, although a Nexus-specific plugin with its own capabilities gives more flexibility. Anyway, I tried config.vm.sudo_command = "" and config.vm.sudo_command = "%c", but I get the following error.

% vagrant up
Bringing machine 'default' up with 'virtualbox' provider..

There are errors in the configuration of this machine. Please fix the following errors and try again:

vm:

briancain commented 4 years ago

@y1y123 - oops, sorry. I had written it wrong. It should be config.ssh.sudo_command, not vm. That should be right!

y1y123 commented 4 years ago

Thanks, the error goes away with config.ssh.sudo_command = "" and also with "%c", but now commands which require root access or sudo fail. I also tried config.ssh.sudo_command = "sudo" and "sudo %c", hoping that "sudo" would be prefixed, but now even non-root commands fail. Is there any way to prefix "sudo" to only the commands which require root access?
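One way to reason about the values tried above is to treat sudo_command as a template. The sketch below assumes, without confirmation from this thread, that for a sudo command Vagrant replaces %c in the template with the configured shell, and that the default template is "sudo -E -H %c". `effective_shell` is a hypothetical helper, not a Vagrant API.

```ruby
# Assumed default for config.ssh.sudo_command (not confirmed in this thread).
DEFAULT_SUDO_COMMAND = "sudo -E -H %c"

# Hypothetical helper. Assumption: when a command is run with sudo, %c in the
# template is replaced by the configured shell; otherwise the shell is used
# unchanged.
def effective_shell(shell, sudo_command, sudo:)
  sudo ? sudo_command.gsub("%c", shell) : shell
end

shell = "run bash"
[DEFAULT_SUDO_COMMAND, "", "%c", "sudo", "sudo %c"].each do |template|
  result = effective_shell(shell, template, sudo: true)
  puts "#{template.inspect} => #{result.inspect}"
end
```

Under that assumption, "" yields an empty shell command, "%c" yields plain "run bash" with no privilege escalation, "sudo" drops the shell entirely, and "sudo %c" yields "sudo run bash", the form guessed earlier in the thread to be invalid at the NXOS CLI.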