hashicorp / packer-plugin-vsphere

Packer plugin for VMware vSphere Builder
https://www.packer.io/docs/builders/vsphere
Mozilla Public License 2.0

Unable to build ISO for Ubuntu 20.04.3 using Packer #106

Closed vesubramanian closed 1 year ago

vesubramanian commented 3 years ago

Ubuntu Version: 20.04.3 (Focal)
Packer Version: 1.7.4
vSphere Client Version: 6.7.0.48000
Builder Type: vSphere-iso

I have tried many options and articles, but I couldn't build an ISO for Ubuntu 20.04.3. The build gets stuck at "Waiting for SSH server to become available". It would be great if I could get a working example. In particular, I need the builder section of ubuntu2004.json and the content of the user-data file that goes inside the http directory. A step-by-step example would be really helpful, as I am not at all knowledgeable about Linux. Please help. I am also tagging @dbond007.

Please find below a few links I followed, without any luck:
https://github.com/dbond007/Packer/tree/master/ubuntu_base
https://github.com/rainpole/packer-vsphere
https://virtjo.com/2020/build-ubuntu-vm-with-packer-on-vsphere/

Upon monitoring the console, I had some observations. I am attaching some screenshots for reference.

[Screenshots: auto-install-prompt, error]

tenthirtyam commented 3 years ago

You linked my repo - https://gitHub.com/rainpole/packer-vsphere - have you tried the HCL-based examples provided?

Ryan

tenthirtyam commented 3 years ago

It does however look like the guest OS is failing to install and I'd suggest reviewing the user-data configuration.

I generate mine on demand, but check out https://github.com/rainpole/packer-vsphere/blob/main/builds/linux/ubuntu-server-20-04-lts/http/user-data.pkrtpl.hcl.

Ryan

dbond007 commented 3 years ago

@tenthirtyam I pointed @vesubramanian to yours and my repo as examples in another case and suggested opening a new one here.

@vesubramanian what is your boot command set to in packer?

Try posting your user-data file here as well, and try to keep the formatting.

vesubramanian commented 3 years ago

I tried the boot commands and user data from all 3 above. Of course, I used the JSON version for ubuntu2004 file instead of .hcl version, since I am more comfortable with json. Will it make a difference? Anyways, I am sharing the builder section and user-data below.

#cloud-config
autoinstall:
  version: 1
  locale: en_US
  keyboard:
    layout: en
    variant: us
  network:
    network:
      version: 2
      ethernets:
        ens192:
          dhcp4: true
  storage:
    layout:
      name: lvm
  identity:
    hostname: my-ubuntu
    username: runner
    password: (generated using "printf 'runner' | openssl passwd -6 -salt 'myPwd' -stdin" command on Ubuntu)
  ssh:
    install-server: yes
    allow-pw: true
    authorized-keys:

"builders": [       
     {
      "type": "vsphere-iso",
      "vcenter_server": "myVCServerName",      
      "username": "myVCUser@vsphere.local",      
      "password": "myVCPwd",
      "insecure_connection": "true",    
      "datacenter": "myDC",
      "cluster": "myCluster/myCluster",
      "datastore": "datastore1",  
      "guest_os_type": "ubuntu64Guest",
      "CPUs": 1,
      "RAM": 1024,
      "RAM_reserve_all": false,
      "disk_controller_type": "pvscsi",
      "storage": {
        "disk_size": 15000,
        "disk_thin_provisioned":true
      },
      "network_adapters": {
        "network": "myNetworkName", 
        "network_card": "vmxnet3"
      },
      "vm_name": "my-ubuntu",      
      "notes": "Built via Packer",            
      "convert_to_template": true,
      "ssh_username": "runner",      
      "ssh_password": "myPwd",
      "ssh_timeout": "20m",
      "ssh_handshake_attempts": "100",
      "iso_paths": ["[datastore1] ISO/ubuntu-20.04.3-live-server-amd64.iso"], 
      "cd_files": ["{{template_dir}}/http/user-data", "{{template_dir}}/http/meta-data"],
      "cd_label": "cidata",
      "boot_wait": "2s",
      "boot_command": [
        "<enter><wait2><enter><wait><f6><esc><wait>",
        " autoinstall<wait2> ds=nocloud;",
        "<wait><enter>"
      ],
      "shutdown_command": "echo 'runner'|sudo -S shutdown -P now"     
     }        
    ]

vesubramanian commented 3 years ago

Also, it will be great, if you can please let me know the commands to generate the encrypted password and also the public SSH key. I am not sure whether I am generating these properly or not.

tenthirtyam commented 3 years ago

Password: openssl passwd -6

Public Key: ssh-keygen -t ecdsa -b 521

dbond007 commented 3 years ago

Just quickly looking over this, and I may be wrong (I can't see the formatting), but your user-data has "network: network:" (there should be only one). The network section should look like:

network:
 version: 2
 ethernets:
  ens192:
   dhcp4: true

Also, since you chose JSON, you may wish to switch to HCL; HCL is the preferred language from Packer 1.7 onward.

vesubramanian commented 3 years ago

> Password: openssl passwd -6
>
> Public Key: ssh-keygen -t ecdsa -b 521

Thank you very much. Will try this. In the above command for Password, "passwd" is the sample password, right?

dbond007 commented 3 years ago

@vesubramanian "passwd" isn't the sample password, it's the command to be executed.

openssl passwd -6 YOUR_PASSWORD

vesubramanian commented 3 years ago

> Just quickly looking over, may be wrong (cant see the formatting), but your user-data has "network: network:" (should be 1) The network section should be like:
>
> network:
>  version: 2
>  ethernets:
>   ens192:
>    dhcp4: true
>
> Also with your choice of using json, you may wish to change that to HCL, HCL is the preferred language from 1.7.

Thanks a lot for your response. Will check this. However, the JSON that I am trying to use is from a different github repository, which has a big provisioner section already defined in json. I am not sure how to convert this to hcl.

dbond007 commented 3 years ago

To convert from json to HCL: https://learn.hashicorp.com/tutorials/packer/hcl2-upgrade?in=packer/configuration-language

vesubramanian commented 3 years ago

I am actually trying to build a self hosted github linux runner, following the below link. However, they are using packer and building Azure Image. In my case, I am trying to build a vSphere-ISO. They are also using json. https://github.com/actions/virtual-environments/blob/main/images/linux/ubuntu2004.json

So, I am using that same json with the same provisioners section and changing only builders section. Is there any issue using json? Please don't misunderstand. I am just curious.

vesubramanian commented 3 years ago

> Just quickly looking over, may be wrong (cant see the formatting), but your user-data has "network: network:" (should be 1) The network section should be like:
>
> network:
>  version: 2
>  ethernets:
>   ens192:
>    dhcp4: true
>
> Also with your choice of using json, you may wish to change that to HCL, HCL is the preferred language from 1.7.

Fixed it, but no difference.

dbond007 commented 3 years ago

> Is there any issue using json? Please don't misunderstand. I am just curious.

There is no issue. It's just that if you are starting now and do not have a load of code already written, HCL is probably where you should start, as JSON is no longer the preferred or actively developed language for Packer. But if you just want to get something working and accept that it may have limitations in the future, possibly requiring a rewrite, then that's OK. It's just something to keep in mind.

Now, on to the problem.

I haven't had a chance to test yet, but what would make it easier is if you could put what you are using in your github and then we can test it out.

vesubramanian commented 3 years ago

Let me explain all the details. I have an Ubuntu 20.04.3 VM, on which I am performing the following steps.

Log in as root and create a user called runner

useradd -m -p $(openssl passwd -crypt myPassword) runner && sudo usermod -a -G sudo runner && su - runner
exec bash

Install Packer (https://computingforgeeks.com/how-to-install-and-use-packer/)

sudo apt update
sudo apt -y install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt update
sudo apt upgrade
sudo apt install packer

Install Az PowerShell Module in Ubuntu (https://aster.cloud/2019/10/08/how-to-install-azure-powershell-on-ubuntu/)

wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo apt-get update
sudo add-apt-repository universe
sudo apt-get install -y powershell
pwsh
Install-Module -Name Az -AllowClobber -Scope CurrentUser

Install Azure CLI (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt)

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

Install Git

sudo apt install git

git clone https://github.com/actions/virtual-environments.git
chmod 777 -R virtual-environments

On my Windows VM, I modified ubuntu2004.json and created a folder called http. Inside the http folder, I created an empty meta-data file and a user-data file, and updated the user-data file on the Windows VM (I will provide the content of ubuntu2004.json and user-data below). The path of ubuntu2004.json on GitHub is https://github.com/actions/virtual-environments/blob/main/images/linux/ubuntu2004.json

I copied the locally modified files from the Windows VM to the Ubuntu server using the commands below.

pscp .\ubuntu2004-updated.json runner@IP-Address:/home/runner/virtual-environments/images/linux/ubuntu2004-updated.json
pscp -r .\http runner@IP-Address:/home/runner/virtual-environments/images/linux

Then I run the following command.

sudo packer build -force virtual-environments/images/linux/ubuntu2004-updated.json

Attachments: ubuntu2004-updated.json.txt, user-data.txt

Note: I have changed the extensions to txt to allow upload. Please change it at your end.

vesubramanian commented 3 years ago

Hope this clarifies. This is my requirement. GitHub is using Packer + Azure DevOps to build Azure Image. In my case, I need to build vSphere-ISO using packer. Please help.

dbond007 commented 3 years ago

The first thing you need to do is change the following in your user-data:

network:
 version: 2
 ethernets:
  ens33:
   dhcp4: true

to

network:
 version: 2
 ethernets:
  ens192:
   dhcp4: true

This will allow the network setup to complete and SSH to work. I haven't checked the provisioner section, so there may be more. See if that works.

vesubramanian commented 3 years ago

Thank you. Tried it, but still the same error.

vesubramanian commented 3 years ago

Although the Packer output waits for SSH to become available, when I look at the vSphere Web Console, the error (also seen in the above screenshot) is "command ['systemd-cat', '--level-prefix=false', '--identifier=subiquity_log.2014', '.snap/subiquity/2651/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install'] returned non-zero exit status 3.". I am not sure what is causing this or how to fix it.

dbond007 commented 3 years ago

When I tested, I removed the provisioners, changed the networking to the above, filled in the details I needed, removed the data center value, and cleaned up the indents. It then installed Ubuntu, set up the network, installed the SSH server and open-vm-tools, ran the post commands in the cloud-init config, and shut down the server.

Before I changed the network config, it was stuck waiting for SSH, which is expected, as interface ens33 doesn't exist on an ESXi VM; it's 192 for the first one.

Remove the provisioners and see if it then works. If not, when I get home, I will add a repo with what worked for me. You can then test it; if that doesn't work either, there is some other problem that is not related to Packer or the cloud-init config.

vesubramanian commented 3 years ago

Should I remove the provisioners section itself? Is it not mandatory? Or should I have an empty provisioners section? Can you please explain more about "cleaned up the indents"? I didn't get it. How do I check which ens to use?

Meanwhile, I will try and let you know.

dbond007 commented 3 years ago

You can remove it all; it's not needed (for testing). The builders section is all that is needed to test the install / creation of the OS. By cleaning up the indents I mean making the indentation consistent (I did it in the Packer file): all tabs instead of a mix of spaces and tabs.

As for the ens value, all you need to know is that it's 192 on VMware. The value is based on where the card sits on the PCI(e) bus, the port number, and so on; it's the "predictable" (deterministic) naming scheme. It can be changed back to the old naming scheme, so that it's eth0, but that can only be done in GRUB after the install (unless someone knows how to do it beforehand).

Edit: I see that rainpole/packer-vsphere uses ens33. I wonder whether that has changed for ESXi 7, @tenthirtyam? Or it may be something else: 33 is a PCI location and 192 is a PCIe location (they start at 160, I believe), so maybe ens33 is for the e1000 and ens192 is for vmxnet3.
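
For readers wondering where names like ens192 and ens33 come from, here is a minimal Python sketch that decodes a few common shapes of systemd's predictable interface names. The function name describe_ifname and the layout of the returned dictionary are made up for illustration, only a subset of the naming scheme is covered, and nothing here queries real hardware.

```python
import re

# Common shapes of systemd "predictable" network interface names.
# Only a subset of the scheme is handled; anything else returns None.
_KINDS = {"en": "ethernet", "wl": "wlan", "ww": "wwan"}
_NAME = re.compile(
    r"(?P<kind>en|wl|ww)"
    r"(?:o(?P<onboard>\d+)"              # o<index>: on-board device index
    r"|s(?P<slot>\d+)"                   # s<slot>: PCI hotplug slot index
    r"|p(?P<bus>\d+)s(?P<busslot>\d+))"  # p<bus>s<slot>: PCI bus location
)


def describe_ifname(name):
    """Return a dict describing a predictable interface name, or None."""
    m = _NAME.fullmatch(name)
    if m is None:
        return None  # e.g. legacy names such as eth0
    info = {"kind": _KINDS[m["kind"]]}
    if m["onboard"] is not None:
        info.update(scheme="onboard index", index=int(m["onboard"]))
    elif m["slot"] is not None:
        info.update(scheme="hotplug slot", index=int(m["slot"]))
    else:
        info.update(scheme="pci location", bus=int(m["bus"]), slot=int(m["busslot"]))
    return info


for name in ("ens192", "ens33", "enp3s0", "eth0"):
    print(name, describe_ifname(name))
```

Both ens192 and ens33 decode to slot numbers, so the number says where the virtual NIC is attached rather than naming a driver. On an installed guest, the actual interface names can be listed with ip link.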

vesubramanian commented 3 years ago

Thank you once again. I tried both (with an empty provisioners section and with no provisioners section) and I am getting the same error. I removed the datacenter value as well, and I also tried with ens192. No change. Also, is the subiquity error caused by any of these, or by something else?

dbond007 commented 3 years ago

As a test, just to see whether something else in your setup is causing it: have you tried one of the deployments that I have, just to see if it works?

vesubramanian commented 3 years ago

Sorry for asking. Did you already share that with me? Not sure if I missed it. If you don't mind, can you please share it again?

tenthirtyam commented 3 years ago

https://github.com/dbond007/Packer

vesubramanian commented 3 years ago

Thank you. I believe I need to use the file at https://github.com/dbond007/Packer/blob/master/ubuntu_base/variables.20.04.pkr.hcl

I won't need any user-data?

tenthirtyam commented 3 years ago

The user-data is seen in etc/http directory.

See: https://github.com/dbond007/Packer/blob/master/ubuntu_base/etc/http/user-data

vesubramanian commented 3 years ago

Thank you. Please excuse my ignorance. I copied the pkr.hcl file into the directory and also the user-data files into the appropriate directory. However, when I run the command below, it does not work. It keeps prompting with options. I am not sure what I am doing wrong.

packer build -var-file=variables.20.04.pkr.hcl

Since I am running with a non root user, I even tried the above command with sudo, but no luck.

dbond007 commented 3 years ago

Clone the repo.
In the ubuntu_base directory, rename variables.pkrvars.hcl.example to variables.pkrvars.hcl.
In variables.pkrvars.hcl, fill in your correct details (leave the SSH details as they are, since that's what is in the user-data).
Then, in the ubuntu_base directory, run:

packer build -only="*20.04*" -var-file=variables.pkrvars.hcl .

The -only="*20.04*" makes it build only Ubuntu 20.04. It will download the ISO from Ubuntu automatically. It should take around 7 minutes to complete (depending on the server it's being built on).

vesubramanian commented 3 years ago

Couple of questions

  1. In your hcl file, what are the VCenter (is it the vSphere version?), Zone and Environment variables? Also, is VCFolder mandatory? Which values are mandatory?
  2. I am running this from Ubuntu machine with a non root user. So, I ran it with sudo. Hope it is fine.

However, after following all the steps and running the packer build command, I got this error.

Error: no plugin installed for github.com/hashicorp/vsphere 1.0.1

Did you run packer init for this project ?

dbond007 commented 3 years ago

All variable descriptions are in variables.common.pkr.hcl. But for specific answers to the above:
"VCenter" is your vCenter server, its IP address or FQDN.
"VCFolder" is the folder in vCenter that you want the template to be stored in. You can place templates in different places; I put them in one called templates, but you could probably just use an empty string ("") if you do not want to put it in a specific folder (not tested).
Zone and Environment you can leave as they are; they aren't used yet, as my feature request hasn't been implemented.
Everything that is in that file needs some kind of value.

The last error means exactly what it asks. I didn't mention it because I forgot that I had added the plugin requirement for another test I did for someone else's problem. Run packer init . and it will then download the newer vSphere Packer plugins. If you don't want to do that, just delete packer.plugins.pkr.hcl and it will use what's built into Packer.

sudo shouldn't be needed to run any of this; it isn't making any changes to your system.

You need to change these values to fit your environment:

VCenter     = "10.0.0.151"                  # your vCenter server IP or FQDN
VCCluster   = "Cluster1"                    # your cluster name
VCUser      = "administrator@vsphere.local" # your vCenter user
VCPassword  = "N0tS3cUr3"                   # your vCenter user password
VCDataStore = "NFS01"                       # the datastore where you want the VM to be put
VCNetwork   = "dv-LAN"                      # the network in vCenter you want the VM on
VCFolder    = "templates"                   # what folder to put the VM / template in

vesubramanian commented 3 years ago

Thank you very much once again. Followed all your steps. Looks like Zone is required. So, I left the last 2 variables and values, as is. Cloned the code on my Ubuntu server on to the non root user's home directory. Gave full permissions to the cloned folder. Modified the user-data (for the user name and encrypted password) and variables hcl files locally in my Windows VM and copied these 2 files to the appropriate locations inside the Ubuntu server using pscp command. Then ran the packer init and then packer build. Ran into the same issue (as shown / mentioned above). Gets stuck at "Waiting for SSH to become available". In the web console, I could see the same issue. "command ['systemd-cat', '--level-prefix=false', '--identifier=subiquity_log.2096', '.snap/subiquity/2651/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install'] returned non-zero exit status 3."

Does it have anything to do with any setting in vSphere or is it something else? Please help.

dbond007 commented 3 years ago

Could you run it without changing anything other than the above variables, to make sure that no other changes you made are causing that?

vesubramanian commented 3 years ago

Reverted the user-data file to match yours. However, in packer variables file, I have a couple of questions. The SSH User can be any user or only root? VSphere user is also a non-admin user, but has sufficient privileges to create VM. Is this fine?

Ok. Just finished running, following your advice. Still the same error.

dbond007 commented 3 years ago

The SSH user is the user you create in the user-data file, it should have root / sudo permissions if you want to do anything with it. The user in vsphere will not be causing this, you would get other errors with packer if it couldn't do what was needed. This is something specific to the installation of Ubuntu in your environment.

dbond007 commented 3 years ago

You may want to look here. There is no answer at the moment, but you appear to be having the same problem: https://discourse.ubuntu.com/t/issue-with-curtin-while-trying-to-autoinstall-ubuntu-20-04-focal/20046

vesubramanian commented 3 years ago

Thank you once again for your prompt response. I checked that already and I have asked for help (see the last one from venh123). Meanwhile, I have a question. The SSH user is created and given sudo permissions by the script, during execution of packer build? If not, which user and password to use?

I tried this today from Windows VM also (for your repository), but I am getting the same error. I have a few more questions.

  1. Is there a minimum version for vSphere for this to work (for Ubuntu 20.04.3)?
  2. If not, can any setting in VSphere cause this? I saw a similar issue reported here.

I am not a Network or OS expert. I somehow managed to convince them to create a user for me for this. Hence, I don't have an admin user for vSphere and don't have complete access.

vesubramanian commented 3 years ago

@SwampDragons Can you please help?

dbond007 commented 3 years ago

  1. There isn't really a minimum version of vsphere for this to work, installing the OS. There is a minimum version for the customisation by vmware when deploying from the template.
  2. It could, but that would be a bug in Ubuntu.

To completely exclude Packer, limiting it to vSphere and Ubuntu, create an ISO with the cloud-init files on it, mount it together with the OS ISO on a new VM in VMware, and do the same thing you got Packer to do so that it auto-installs. It will likely fail again in exactly the same way. If it does, it's a bug in Ubuntu being caused by something, and to find out more you would need to look in the log files. If it works, there is something happening with Packer in your environment that we can't replicate.

vesubramanian commented 3 years ago

Thank you. I am not sure how to export the logs into my Windows VM so that I can look into it. I use Web Console. I have one more question. In some user-data files, I have seen the following line.

echo 'ubuntu ALL=(ALL) NOPASSWD:ALL' > /target/etc/sudoers.d/ubuntu

In the above line, "ubuntu" in both places is the user name?

dbond007 commented 3 years ago

Yes it is the username. Giving it sudo, with no password needed.

vesubramanian commented 3 years ago

Thank you. I have seen many blogs/articles now. Every one is using their own boot command and user-data file. Not sure why these are different for everyone and not consistent.

dbond007 commented 3 years ago

They are different because there are different ways to do the same thing. The user-data file can do a lot of things, and again you can do the same thing in many ways. You could use it to do everything if you wanted, instead of using the provisioners in Packer. It's down to preference, workflows and knowledge.

The way I have done mine is based upon trial and error when I started as I couldn't find any examples that worked reliably and did what I wanted, so with what I learnt I ended up with what I have. There will be other ways, probably better, but I didn't know at the time.

vesubramanian commented 3 years ago

Ok. But what I am wondering is that there is no standard documentation. People can always customize according to their needs & convenience, but there should have been some basic documentation for people who just want to go with the basic settings and not tweak anything since they may not have expertise on Linux / Networking.

Anyway, can you please advise on how to get the log from the Web Console to my local Windows machine?

vesubramanian commented 3 years ago

Also, how can I check whether my user-data is being read at all? I saw somewhere that this error also occurs if the user-data has issues.
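
One cheap thing to rule out before digging into installer logs is the user-data file itself. The sketch below is a hypothetical, standard-library-only checker (lint_user_data is not part of Packer or cloud-init) that looks for a missing #cloud-config first line, a missing top-level autoinstall: key, tabs used for indentation (YAML does not allow them), and Windows line endings, which can creep in when the file is edited on Windows and copied over with pscp as described earlier in the thread. Whether any of these is the cause here is not established; it only checks the text and cannot confirm that the installer picked the file up.

```python
def lint_user_data(text):
    """Return a list of human-readable problems found in a user-data string."""
    problems = []
    lines = text.split("\n")
    # cloud-init expects the cloud-config marker on the very first line.
    if lines[0].rstrip() != "#cloud-config":
        problems.append("first line is not '#cloud-config'")
    # The autoinstall settings live under an unindented 'autoinstall:' key.
    if not any(line.rstrip() == "autoinstall:" for line in lines):
        problems.append("no top-level 'autoinstall:' key")
    if any(line.endswith("\r") for line in lines):
        problems.append("file uses Windows (CRLF) line endings")
    for number, line in enumerate(lines, start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab used for indentation")
    return problems


GOOD = "#cloud-config\nautoinstall:\n  version: 1\n"
BAD = "autoinstall:\r\n\tversion: 1\r\n"

print(lint_user_data(GOOD))
print(lint_user_data(BAD))
```

To run it against a real file, read the file with open(path, newline="") so that Python does not silently convert the line endings before the check sees them.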

vesubramanian commented 3 years ago

I was able to figure out how to extract the crash report into my Windows VM. Also, I ran an interactive install. It was failing at "curtin command in-target" installation, although I am not sure about the reason. When I searched for it, I saw an article which suggested to disable network adapter before installation and re-enable it after installation. What if there is more than one? How to disable/enable all at once? Can this be done via user-data? Can you please help with this?

dbond007 commented 3 years ago

You should put the logs somewhere so I / we can look at them. With regard to disabling the network, you could try sudo nmcli networking off in the early commands and sudo nmcli networking on in the late commands.

no idea if it will work.

But what that will do if it does work is it will make it so that it will not update and will not install open-vm-tools or openssh as the repos will not be available. So they will need to be added to the late commands after turning back on the network.

You should test if disabling the network will work, just disable it in vmware, if it installs after that then that is the problem for some reason, if not then this will unlikely help.

vesubramanian commented 3 years ago

Thank you once again. I tried with simply "sudo ip link set ens192 down" in early commands and "sudo ip link set ens192 up" in late commands. Although it helped in the interactive session with no other detail in the user-data, when I updated the user-data with more detail and removed the interactive, I started encountering the same error. Anyways, I am attaching the crash log. I tried going through it, but couldn't understand much. Hope you find something from it. However, when I searched for exception, I found the following

curtin: Installation failed with exception: Unexpected error while running command.
Command: ['/snap/subiquity/2651/bin/subiquity-configure-apt', '/snap/subiquity/2651/usr/bin/python3', 'true']
Exit code: 100
Reason: -
Stdout: + '[' -z /target ']'
        + PY=/snap/subiquity/2651/usr/bin/python3
        + HAS_NETWORK=true
        + /snap/subiquity/2651/usr/bin/python3 -m curtin apt-config

finish: cmd-install/stage-curthooks/001-configure-apt/cmd-in-target: FAIL: curtin command in-target
curtin: Installation failed with exception: Unexpected error while running command.
Command: ['/snap/subiquity/2651/bin/subiquity-configure-apt', '/snap/subiquity/2651/usr/bin/python3', 'true']
Exit code: 100
Reason: -
Stdout: + '[' -z /target ']'
        + PY=/snap/subiquity/2651/usr/bin/python3
        + HAS_NETWORK=true
        + /snap/subiquity/2651/usr/bin/python3 -m curtin apt-config

ERROR root:39 finish: subiquity/Install/install: FAIL: Command '['systemd-cat', '--level-prefix=false', '--identifier=subiquity_log.2090', '/snap/subiquity/2651/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install']' returned non-zero exit status 3.

1631882672.063652039.install_fail.crash.txt
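
A crash file like this is long, so it can help to pull out just the failing commands and their exit codes. The following is a minimal Python sketch for that; the function name failed_commands and the regular expression are illustrative assumptions based only on the excerpt quoted above, not on any documented curtin log format.

```python
import re

# Matches curtin's "Command: [...] Exit code: N" failure report, whether the
# log keeps its line breaks or has been flattened onto one line.
FAILURE = re.compile(r"Command: (\[.*?\])\s+Exit code: (\d+)", re.DOTALL)


def failed_commands(log_text):
    """Return unique (command, exit_code) pairs in order of first appearance."""
    found = []
    for command, code in FAILURE.findall(log_text):
        entry = (command, int(code))
        if entry not in found:
            found.append(entry)
    return found


# Sample text copied from the excerpt in this thread.
SAMPLE = (
    "curtin: Installation failed with exception: Unexpected error while running command. "
    "Command: ['/snap/subiquity/2651/bin/subiquity-configure-apt', "
    "'/snap/subiquity/2651/usr/bin/python3', 'true'] "
    "Exit code: 100 Reason: - Stdout: + '[' -z /target ']'"
)

# The log repeats the same failure, so the sample is doubled to show de-duplication.
for command, code in failed_commands(SAMPLE * 2):
    print(code, command)
```

In the excerpt, the only failing step is subiquity-configure-apt with exit code 100, which is the value apt-get documents for errors. One plausible but unconfirmed reading is that the installer could not refresh its package sources, so checking whether the VM's network can reach the Ubuntu mirrors (DNS, proxy, firewall) may be a reasonable next step.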

vesubramanian commented 3 years ago

> Just quickly looking over, may be wrong (cant see the formatting), but your user-data has "network: network:" (should be 1) The network section should be like:
>
> network:
>  version: 2
>  ethernets:
>   ens192:
>    dhcp4: true
>
> Also with your choice of using json, you may wish to change that to HCL, HCL is the preferred language from 1.7.

If you check the network section in the Ubuntu documentation, using two nested network keys is correct.