sandialabs / sceptre-phenix

phenix is an orchestration tool and GUI for Sandia's minimega platform
https://sandialabs.github.io/sceptre-docs/
GNU General Public License v3.0

Error when trying to include windows guests into an environment #166

Closed Narratot closed 7 months ago

Narratot commented 7 months ago

Situation description

When trying to run a Windows VM within a phenix environment, I get the following error:

Error: unable to start experiment helloworld

Cause: reading minimega script: reading mmcli script: 1 error occurred:
 * [image /phenix/images/lap-schonerd_helloworld_test1_snapshot] cp: cannot create regular file '/tmp/minimega/dstImg790037620/phenix/startup/20-startup.ps1': No such file or directory
: exit status 1

Windows VM Setup

I used the following steps to create the image:

qemu-img create -f qcow2 W10.qcow 50G

qemu-system-x86_64 -enable-kvm -m 12G -smp sockets=1,cores=12 -cpu host -hda W10.qcow -boot d -netdev user,id=net0,net=192.168.0.0/24,dhcpstart=192.168.0.9 -device virtio-net-pci,netdev=net0 -vga qxl -device AC97 -cdrom /home/lud77433/Downloads/Win10_22H2_EnglishInternational_x64v1.iso

Post installation:

Install current Virtio drivers (Using the ISO installer)

Create the minimega startup task (with miniccc.exe built from the minimega git repository), following the description in the minimega documentation (https://www.sandia.gov/minimega/module-28-miniccc-and-the-cc-api/)

activeshadow commented 7 months ago

@Narratot can you please confirm that your head node has the qemu-nbd command installed and the nbd kernel module is being loaded correctly? The /tmp/minimega/dstImg790037620 portion of your error message makes me think the phēnix-created snapshot of your Windows image (this snapshot is used to hold the phēnix injects) isn't being connected and/or mounted correctly as part of the inject process.
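A quick way to check both prerequisites on the head node is sketched below. It assumes a Linux host where loaded or built-in modules appear under /sys/module. The function name check_nbd_prereqs is made up for this example, and max_part=16 in the hint is only a commonly used value, not a phēnix requirement.

```shell
#!/bin/sh
# Report whether qemu-nbd is on PATH and whether the nbd kernel module is present.
# Returns 0 only if both checks pass.
check_nbd_prereqs() {
    status=0
    if command -v qemu-nbd >/dev/null 2>&1; then
        echo "qemu-nbd: found"
    else
        echo "qemu-nbd: missing"
        status=1
    fi
    # A loaded (or built-in) module shows up as a directory under /sys/module.
    if [ -d /sys/module/nbd ]; then
        echo "nbd module: loaded"
    else
        echo "nbd module: not loaded (try: modprobe nbd max_part=16)"
        status=1
    fi
    return "$status"
}

check_nbd_prereqs || echo "fix the items above before starting the experiment"
```

Both checks run without root, so this can be used as an ordinary user before starting an experiment.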

Narratot commented 7 months ago

@activeshadow the command is recognized by my machine, and the issue occurs only on Windows guests. Any Linux guest (built with either the phenix image build command or the provided Packer commands) works flawlessly.

activeshadow commented 7 months ago

Ahhh... okay I bet I know what the problem is. When you connect the Windows QCOW2 image manually with qemu-nbd, how many partitions are showing up? For example, /dev/nbd0p1, /dev/nbd0p2, etc? I suspect your Windows image has more than one partition and the first partition is not the main disk partition. It's likely partition 2.
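One way to inspect this from the command line is sketched here, under assumptions: the image is already attached at /dev/nbd0 and lsblk from util-linux is available. The helper name largest_ntfs_partition is hypothetical, and "largest NTFS partition" is only a heuristic for spotting the main Windows volume, not something phēnix itself does.

```shell
#!/bin/sh
# Read "NAME SIZE FSTYPE" lines (sizes in bytes) on stdin and print the
# partition number of the largest NTFS partition. Exits 1 if none is found.
largest_ntfs_partition() {
    awk '
        $3 == "ntfs" && $2 + 0 > max { max = $2 + 0; name = $1 }
        END {
            if (name == "") exit 1
            sub(/^.*p/, "", name)   # nbd0p2 -> 2
            print name
        }
    '
}

# Typical use, as root, once the image is attached with qemu-nbd:
#   lsblk -bnro NAME,SIZE,FSTYPE /dev/nbd0 | largest_ntfs_partition
```

The printed number is a candidate for inject_partition. Small partitions such as "System Reserved" or a recovery partition are skipped simply because they are smaller.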

To adjust for VM disk images with more than one partition, you can use the inject_partition option in the topology config. For example:

hardware:
  drives:
    - image: W10.qc2
    - inject_partition: 2
  os_type: windows

Narratot commented 7 months ago

The W10.qc2 drive has the following partitions:

nbd0                                                                  
├─nbd0p1
│    ntfs   System Reserved
│                 70944B6C944B33BE                                    
├─nbd0p2
│    ntfs         3A4E4BDF4E4B9315                                    
└─nbd0p3
     ntfs         DA20B74820B72A81 

However, the configuration as you gave it does not work. The following version is accepted by the system, but it does not resolve the problem (the error stays identical):

      hardware:
        drives:
          - image: W10.qc2
            inject_partition: 2
        os_type: windows

activeshadow commented 7 months ago

Okay, did you happen to also try partition 3 as the inject partition?

Narratot commented 7 months ago

I tried partitions 0 through 3. The result for 0:

Error: unable to start experiment hello

Cause: reading minimega script: reading mmcli script: 1 error occurred:
 * [image /phenix/images/lap-schonerd_hello_test1_snapshot] desired partition 0 not found

The result for 1 through 3:

Error: unable to start experiment hello

Cause: reading minimega script: reading mmcli script: 1 error occurred:
 * [image /phenix/images/lap-schonerd_hello_test1_snapshot] cp: cannot create regular file '/tmp/minimega/dstImg337918859/phenix/startup/20-startup.ps1': No such file or directory
: exit status 1

activeshadow commented 7 months ago

Can you try mounting the partitions manually? For example:

qemu-nbd -c /dev/nbd0 /path/to/W10.qc2
mkdir -p /tmp/mnt-test-{1,2,3}
mount /dev/nbd0p1 /tmp/mnt-test-1
mount /dev/nbd0p2 /tmp/mnt-test-2
mount /dev/nbd0p3 /tmp/mnt-test-3

Curious if doing so will generate any (more) useful error messages.

Narratot commented 7 months ago

@activeshadow that information led me to the underlying issue.

Although it is not related to the phenix configuration, I would recommend noting the solution in the docs, since Windows enables Fast Boot by default.

The solution is to disable fast boot.
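Some background on why this helps, plus a way to check from the Linux side. With Fast Startup enabled, a Windows shutdown is really a partial hibernation, so the NTFS volume is left in a state that Linux NTFS drivers usually refuse to mount read-write (ntfs-3g, for example, may fall back to read-only). Inside the guest, Fast Startup can be turned off by disabling hibernation, for example with `powercfg /hibernate off` from an elevated prompt, followed by a full shutdown before the image is reused. The sketch below is a hypothetical helper (mount_mode is not part of phēnix or minimega) that reports whether a mount point from the manual test above came up rw or ro, reading a /proc/mounts-style table.

```shell
#!/bin/sh
# Print "rw" or "ro" for a mount point. The second argument is a table in
# /proc/mounts format (device, mount point, fstype, options, ...) and
# defaults to /proc/mounts. Exits 1 if the mount point is not listed.
mount_mode() {
    mnt=$1
    table=${2:-/proc/mounts}
    awk -v m="$mnt" '
        $2 == m {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode == "") exit 1; print mode }
    ' "$table"
}

mount_mode /tmp/mnt-test-2 || echo "/tmp/mnt-test-2 is not mounted"
```

If the main Windows partition reports ro here, or does not mount at all, the image was probably not shut down fully and files cannot be copied into it.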

activeshadow commented 7 months ago

@Narratot thanks for following up with the solution. We use Packer configs to build our Windows VM images automatically, and those configs disable fast boot by default.

The phēnix docs are open source in the sandialabs/sceptre-phenix-docs repository, so feel free to make a PR if you think including this solution in the docs would be useful to others!