xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

kvm plugin bug, pid 234651, process description: 'xcatd SSL: rpower to vm1 for root@localhost: kvm instance' with error 'Not a HASH reference at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 823. #6882

Closed dajiji closed 3 years ago

dajiji commented 3 years ago

How to reproduce:

  1. The following VM is created and works properly without PCIe passthrough:
    chdef -t node vm1 groups=vm,all ip=192.168.1.11 mgt=kvm vmhost=g1 vmcpus=2 vmmemory=65536 vmnics=GbE-bridge0 vmnicnicmodel=e1000e serialport=0 serialspeed=115200 netboot=xnba
    rmvm vm1 -pf; mkvm vm1; rpower vm1 on
  2. The bug happens when trying to pass through a PCIe device:
    [root@h1 nodes]# chtab node=vm1 vm.othersettings="devpassthrough:pci_0000_21_00_0"
    [root@h1 nodes]# rmvm vm1 -pf; mkvm vm1; rpower vm1 on
    vm1: [h1]: Error: Cannot remove guest vm, no such vm found
    Error: [h1]: kvm plugin bug, pid 234651, process description: 'xcatd SSL: rpower to vm1 for root@localhost: kvm instance' with error 'Not a HASH reference at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 823.
    ' while trying to fulfill request for the following nodes: vm1
  3. A manually created PCIe passthrough VM works properly on the same xCAT server.

Software:

Hardware:

besawn commented 3 years ago

@peterwywong Can you please see if this reproduces in your VM test environment?

gurevichmark commented 3 years ago

Maybe a parsing problem introduced by #3916

dajiji commented 3 years ago

Reproduced the same issue on my PC and workstation.

[root@pxe ~]# vm.othersettings="devpassthrough:pci_0000_02_00_0,pci_0000_02_00_1,pci_0000_02_00_2"
-bash: vm.othersettings=devpassthrough:pci_0000_02_00_0,pci_0000_02_00_1,pci_0000_02_00_2: command not found
[root@pxe ~]# rmvm vm1 -pf; mkvm vm1; rpower vm1 on
vm1: [pxe]: Error: Cannot remove guest vm, no such vm found
Error: [pxe]: kvm plugin bug, pid 78499, process description: 'xcatd SSL: rpower to vm1 for root@localhost: kvm instance' with error 'Not a HASH reference at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 823.
' while trying to fulfill request for the following nodes: vm1

Configuration:

peterwywong commented 3 years ago

@dajiji, please provide the following information:

  1. The output of "lsdef vm1".
  2. The storage space is created by "mkvm vm1"; however, the VM storage size is not specified. Please check whether the storage space was created, and what its size is.
  3. How did you manually create the PCIe passthrough VM (step 3 of your report)? How does it differ from the "chtab, rmvm, mkvm, rpower..." sequence in step 2?

dajiji commented 3 years ago

@peterwywong

  1. lsdef vm1
    
    [root@pxe ~]# lsdef vm1
    Object name: vm1
    arch=x86_64
    currstate=netboot centos8-x86_64-compute
    groups=vm,all
    ip=192.168.88.226
    mac=42:e1:c0:a8:58:e2
    mgt=kvm
    netboot=xnba
    os=centos8
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    profile=compute
    provmethod=centos8.3-x86_64-netboot-compute-cuda
    serialport=0
    serialspeed=115200
    status=powering-on
    statustime=12-11-2020 00:27:10
    vmcpus=28
    vmhost=X10DRi
    vmmemory=65536
    vmnicnicmodel=e1000e
    vmnics=GbE-bridge0

    [root@pxe ~]# ssh vm1
    [root@vm1 ~]# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs         32G     0   32G   0% /dev
    tmpfs            32G     0   32G   0% /dev/shm
    tmpfs            32G   17M   32G   1% /run
    tmpfs            32G     0   32G   0% /sys/fs/cgroup
    rootfs           32G  2.7G   29G   9% /
    rw               32G  4.0K   32G   1% /.sllocal/log
    tmpfs           6.3G     0  6.3G   0% /run/user/0


  2. The created vm1 runs in memory, so no storage size is specified. It works perfectly without PCIe passthrough:

    [root@pxe ~]# chdef -t node vm1 groups=vm,all ip=192.168.88.226 mgt=kvm vmhost=X10DRi vmcpus=28 vmmemory=65536 vmnics=GbE-bridge0 vmnicnicmodel=e1000e serialport=0 serialspeed=115200 netboot=xnba
    1 object definitions have been created or modified.
    [root@pxe ~]# chtab node=vm1 vm.vidpassword=" "
    [root@pxe ~]# chtab node=vm1 vm.othersettings=""
    [root@pxe ~]# rmvm vm1 -pf; mkvm vm1; rpower vm1 on
    vm1: on
    [root@pxe ~]# ssh vm1 uptime
    [root@pxe ~]# ssh vm1 uptime
     08:33:41 up 0 min,  0 users,  load average: 0.49, 0.13, 0.04


  3. The manual VM is created as follows; there is no big difference:

    virt-install \
      --connect qemu:///system \
      --vcpus 12,maxvcpus=12,sockets=1,cores=6,threads=2 \
      --memory 65536 \
      --network bridge=GbE-bridge0,mac=42:e1:c0:a8:58:e2,model=e1000e \
      --qemu-commandline="-cpu host -machine q35,kernel_irqchip=on" \
      --os-variant=rhel8.3 \
      --disk none \
      --import \
      --name vm2 \
      --hostdev pci_0000_01_00_0 \
      --hostdev pci_0000_61_00_0 \
      --hostdev pci_0000_a1_00_0 \
      --hostdev pci_0000_c1_00_0 \

dajiji commented 3 years ago

[root@pxe ~]# chtab node=vm1 vm.othersettings="devpassthrough:pci_0000_02_00_0,pci_0000_02_00_1,pci_0000_02_00_2"
[root@pxe ~]# rmvm vm1 -pf; mkvm vm1; rpower vm1 on
Error: [pxe]: kvm plugin bug, pid 79537, process description: 'xcatd SSL: rpower to vm1 for root@localhost: kvm instance' with error 'Not a HASH reference at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 823.
' while trying to fulfill request for the following nodes: vm1

peterwywong commented 3 years ago

@dajiji

Please provide the output of "lspci" on the host system whose devices are passed through to vm1, so we can see what 0000:21:00.0 and the other devices correspond to.

In addition, you may find it helpful to trace "sub build_xmldesc" in /opt/xcat/lib/perl/xCAT_plugin/kvm.pm.

After "rmvm node", "mkvm node" can be traced with the following instrumentation. The output goes to /var/log/xcat/cluster.log.

Add the following trace statement at the beginning of build_xmldesc to confirm that the subroutine is called for "mkvm node". The first argument, "1", enables tracing, "d" stands for debug, and the last argument is the trace string.

xCAT::MsgUtils->trace(1, "d", "TRACE: build_xmldesc");

I also added "xCAT::MsgUtils->trace" calls to the PCI passthrough code. The purpose is to see whether devname, devobj, devxml and devhash are created properly.

   xCAT::MsgUtils->trace(1, "d", "TRACE: passthrough - $passthrudevices[0]");
    #prepare the xml hash for pci passthrough
    my @prdevarray;
    foreach my $devname (@passthrudevices) {
        #This is for SR-IOV vfio
        #Change vfio format 0000:01:00.2 to pci_0000_01_00_2
        xCAT::MsgUtils->trace(1, "d", "TRACE: for loop devname - $devname");
        if ( $devname =~ m/(\w:)+(\w)+.(\w)/ ){
            $devname =~ s/[:|.]/_/g;
            xCAT::MsgUtils->trace(1, "d", "TRACE: if regex - $devname");
            if ( $devname !~ /^pci_/ ) {
                $devname ="pci_".$devname
            }
        }

        my $devobj = $hypconn->get_node_device_by_name($devname);
        unless ($devobj) {
            return -1;
        }

        xCAT::MsgUtils->trace(1, "d", "TRACE: devobj - $devobj");

        #get the xml description of the pci device
        my $devxml = $devobj->get_xml_description();
        unless ($devxml) {
            return -1;
        }

        xCAT::MsgUtils->trace(1, "d", "TRACE: devxml $devxml");

        my $devhash = XMLin($devxml);

        xCAT::MsgUtils->trace(1, "d", "TRACE: devhash 1 $devhash");

        xCAT::MsgUtils->trace(1, "d", "TRACE: devhash 2 $devhash->{capability}");

        xCAT::MsgUtils->trace(1, "d", "TRACE: devhash 3 $devhash->{capability}->{type}");

        xCAT::MsgUtils->trace(1, "d", "TRACE: devhash 4 $devhash->{capability}->{iommuGroup}->{address}");

If the devhash object is not created properly, a hash error would occur on $tmphash{source}->{address}->[0] = \%{ $devhash->{'capability'}->{'iommuGroup'}->{'address'} };

The same trace statement can also be used to trace rmvm, chvm, rpower, etc.
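The name conversion at the top of the loop (0000:01:00.2 to pci_0000_01_00_2) can be checked on its own. The following is a minimal sketch, not the kvm.pm code: normalize_pci_devname is a hypothetical helper, and its stricter pattern (anchored, hex digits only, escaped dot) is an assumption rather than what the plugin uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of kvm.pm): convert a PCI address such as
# 0000:01:00.2 into the libvirt node-device name pci_0000_01_00_2.
sub normalize_pci_devname {
    my ($devname) = @_;

    # Match domain:bus:slot.function written with hex digits.
    if ($devname =~ /^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]$/i) {
        $devname =~ s/[:.]/_/g;
        $devname = "pci_" . $devname;
    }

    # Names already in pci_* form are returned unchanged.
    return $devname;
}

print normalize_pci_devname("0000:83:00.1"), "\n";        # pci_0000_83_00_1
print normalize_pci_devname("pci_0000_02_00_0"), "\n";    # pci_0000_02_00_0
```

Running the helper on a value from vm.othersettings before instrumenting kvm.pm shows which node-device name the plugin would then look up.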

dajiji commented 3 years ago

@peterwywong On the AMD Rome host:

[root@node1 ~]# lspci | grep -v AMD
01:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
21:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
24:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
61:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
62:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
63:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
81:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
a1:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
c1:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
e1:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G404 EL/SL PCIe2 4-Port/4-Lane Packet Switch (rev 05)
e2:01.0 PCI bridge: Pericom Semiconductor PI7C9X2G404 EL/SL PCIe2 4-Port/4-Lane Packet Switch (rev 05)
e2:02.0 PCI bridge: Pericom Semiconductor PI7C9X2G404 EL/SL PCIe2 4-Port/4-Lane Packet Switch (rev 05)
e2:03.0 PCI bridge: Pericom Semiconductor PI7C9X2G404 EL/SL PCIe2 4-Port/4-Lane Packet Switch (rev 05)
e4:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
e4:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
e6:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)

On the Intel Haswell X10DRi host:

[root@X10-Dri ~]# lspci | grep -v E7
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
00:11.4 SATA controller: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] (rev 05)
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
00:16.1 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #2 (rev 05)
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
00:1c.4 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #5 (rev d5)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
02:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER] (rev a1)
02:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
02:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
02:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
04:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane]
82:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] (rev a1)
82:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
82:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
82:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

dajiji commented 3 years ago

@peterwywong Dear Peter, Sorry for my late response.

It seems that the iommuGroup parser cannot handle GeForce GPUs correctly, because multiple functions of the card are in the same IOMMU group.

Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: build_xmldesc
Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: passthrough - 0000:83:00.1
Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: devxml <device>
                                       <name>pci_0000_83_00_1</name>
                                       <path>/sys/devices/pci0000:80/0000:80:03.0/0000:83:00.1</path>
                                       <parent>pci_0000_80_03_0</parent>
                                       <driver>
                                         <name>snd_hda_intel</name>
                                       </driver>
                                       <capability type='pci'>
                                         <class>0x040300</class>
                                         <domain>0</domain>
                                         <bus>131</bus>
                                         <slot>0</slot>
                                         <function>1</function>
                                         <product id='0x1aef'>GA102 High Definition Audio Controller</product>
                                         <vendor id='0x10de'>NVIDIA Corporation</vendor>
                                         <iommuGroup number='54'>
                                           <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
                                           <address domain='0x0000' bus='0x83' slot='0x00' function='0x1'/>
                                         </iommuGroup>
                                         <numa node='1'/>
                                         <pci-express>
                                           <link validity='cap' port='0' speed='8' width='16'/>
                                           <link validity='sta' speed='2.5' width='16'/>
                                         </pci-express>
                                       </capability>
                                     </device>
Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: devhash 1 HASH(0x3cbe478)
Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: devhash 2 HASH(0x3cbeb20)
Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: devhash 3 pci
Mar 21 19:03:20 mechrevo xcat[8557]: TRACE: devhash 4 ARRAY(0x3cc48c8)
Mar 21 19:03:20 mechrevo xcat[8557]: xcatd: kvm plugin bug, pid 8557, process description: 'xcatd SSL: rpower to vm1 for root@localhost: kvm instance' with error 'Not a HASH reference at /opt/xcat/lib/perl/xCAT_plugin/kvm.pm line 836.
                                     ' while trying to fulfill request for the following nodes: vm1

Can you please suggest how to fix this? Thanks!

The following is a successful passthrough trace on the same computer.

Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: build_xmldesc
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: passthrough - 0000:82:00.0
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devxml <device>
                                       <name>pci_0000_82_00_0</name>
                                       <path>/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0</path>
                                       <parent>pci_0000_80_02_0</parent>
                                       <driver>
                                         <name>mlx5_core</name>
                                       </driver>
                                       <capability type='pci'>
                                         <class>0x020700</class>
                                         <domain>0</domain>
                                         <bus>130</bus>
                                         <slot>0</slot>
                                         <function>0</function>
                                         <product id='0x101b'>MT28908 Family [ConnectX-6]</product>
                                         <vendor id='0x15b3'>Mellanox Technologies</vendor>
                                         <iommuGroup number='52'>
                                           <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
                                         </iommuGroup>
                                         <numa node='1'/>
                                         <pci-express>
                                           <link validity='cap' port='0' speed='16' width='16'/>
                                           <link validity='sta' speed='8' width='16'/>
                                         </pci-express>
                                       </capability>
                                     </device>
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 1 HASH(0x3cbe4f0)
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 2 HASH(0x3cbeb98)
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 3 pci
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 4 HASH(0x3cc0968)
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devxml <device>
                                       <name>pci_0000_82_00_1</name>
                                       <path>/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1</path>
                                       <parent>pci_0000_80_02_0</parent>
                                       <driver>
                                         <name>mlx5_core</name>
                                       </driver>
                                       <capability type='pci'>
                                         <class>0x020700</class>
                                         <domain>0</domain>
                                         <bus>130</bus>
                                         <slot>0</slot>
                                         <function>1</function>
                                         <product id='0x101b'>MT28908 Family [ConnectX-6]</product>
                                         <vendor id='0x15b3'>Mellanox Technologies</vendor>
                                         <iommuGroup number='53'>
                                           <address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
                                         </iommuGroup>
                                         <numa node='1'/>
                                         <pci-express>
                                           <link validity='cap' port='0' speed='16' width='16'/>
                                           <link validity='sta' speed='8' width='16'/>
                                         </pci-express>
                                       </capability>
                                     </device>
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 1 HASH(0x3cbe8c8)
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 2 HASH(0x3cbf018)
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 3 pci
Mar 21 19:11:02 mechrevo xcat[8973]: TRACE: devhash 4 HASH(0x3cc6e48)

peterwywong commented 3 years ago

Hi dajiji,

I see similar PCI devices on my hypervisor:

0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

0003:01:00.1 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.2 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.3 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.4 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)

The XML representation of GPU device "0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)" is as follows:

<device>
  <name>pci_0000_03_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:08.0/0000:03:00.0</path>
  <parent>pci_0000_02_08_0</parent>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x102d'>GK210GL [Tesla K80]</product>
    <vendor id='0x10de'>NVIDIA Corporation</vendor>
    <iommuGroup number='0'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='8' speed='8' width='16'/>
      <link validity='sta' speed='8' width='16'/>
    </pci-express>
  </capability>
</device>

The PCI name has four numbers corresponding to domain, bus, slot and function. Therefore,

pci_0003_03_00_0 has
    <iommuGroup number='0'>
      <address domain='0x0003' bus='0x03' slot='0x00' function='0x0'/>
    </iommuGroup>

In this case, iommuGroup has only one device, i.e., one address and one row.

The complication as you described comes in when iommuGroup has multiple addresses with different functions, such as the PLX devices.

<device>
  <name>pci_0003_01_00_0</name>
  <path>/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0</path>
  <parent>pci_0003_00_00_0</parent>
  <capability type='pci'>
    <domain>3</domain>
    <bus>1</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x8725' />
    <vendor id='0x10b5'>PLX Technology, Inc.</vendor>
    <iommuGroup number='5'>
      <address domain='0x0003' bus='0x01' slot='0x00' function='0x0'/>    <== There are 5 of them here.
      <address domain='0x0003' bus='0x01' slot='0x00' function='0x1'/>
      <address domain='0x0003' bus='0x01' slot='0x00' function='0x2'/>
      <address domain='0x0003' bus='0x01' slot='0x00' function='0x3'/>
      <address domain='0x0003' bus='0x01' slot='0x00' function='0x4'/>
    </iommuGroup>
    <numa node='8'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='8'/>
      <link validity='sta' speed='8' width='8'/>
    </pci-express>
  </capability>
</device>

Device pci_0003_01_00_0 is ONE of the devices in the iommuGroup.

With multiple addresses in iommuGroup, mkvm node-name would crash in the following Perl statement in /opt/xcat/lib/perl/xCAT_plugin/kvm.pm.

            $tmphash{source}->{address}->[0] = \%{ $devhash->{'capability'}->{'iommuGroup'}->{'address'} };

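With multiple records, {'address'} is an array reference rather than a hash reference (your trace shows ARRAY(0x3cc48c8) for "devhash 4"), so dereferencing it with \%{ ... } fails with "Not a HASH reference".

If there are multiple addresses, $tmphash{source}->{address}->[0] = \%{ $devhash->{'capability'}->{'iommuGroup'}->{'address'}->[N] }; works, where N=0 corresponds to the 1st record, 1 to the 2nd, and so on.

Note that the indexed form would crash if there is ONLY ONE record, because {'address'} is then a hash reference and cannot be indexed as an array.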
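The two shapes can be reproduced without libvirt. The sketch below uses hand-written data structures as stand-ins for what XMLin returned in the traces (one address gives a hash reference, several give an array reference). address_records is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;

# Stand-ins for the parsed <iommuGroup><address .../> content seen in the traces.
my $single = { domain => '0x0000', bus => '0x82', slot => '0x00', function => '0x0' };
my $multi  = [
    { domain => '0x0000', bus => '0x83', slot => '0x00', function => '0x0' },
    { domain => '0x0000', bus => '0x83', slot => '0x00', function => '0x1' },
];

# Hypothetical helper: always hand back a list of address hashes,
# whichever shape the parser produced.
sub address_records {
    my ($address) = @_;
    return ref($address) eq 'ARRAY' ? @{$address} : ($address);
}

# Treating the array reference as a hash reproduces the reported error.
my $ok = eval { my %copy = %{$multi}; 1 };
print "error: $@" unless $ok;

my @one  = address_records($single);
my @many = address_records($multi);
print "single: ", scalar(@one), " record(s), multi: ", scalar(@many), " record(s)\n";
```

XML::Simple also documents a ForceArray option that makes the named elements always parse as arrays. Whether that would suit the rest of kvm.pm is not established in this thread, so the sketch checks the shape with ref() instead.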

So my proposed solution is to scan the address records against the PCI value to find the right address.

So far, we only see differences in the function values, so the following code only compares the function value of the PCI name against the function values of the address records.

            $tmphash{source}->{address}->[0] = \%{ $devhash->{'capability'}->{'iommuGroup'}->{'address'} };

is replaced by

            if (ref $devhash->{'capability'}->{'iommuGroup'}->{'address'} ne 'ARRAY')
            {
               # There is only one record of address.

               $tmphash{source}->{address}->[0] = \%{ $devhash->{'capability'}->{'iommuGroup'}->{'address'} };
            }
            else
            {
               # There are multiple records of address.

               # Extract function portion of PCI devname (the fields are hex digits)

               my ($devfunction) = $devname =~ /^pci_[0-9a-f]+_[0-9a-f]+_[0-9a-f]+_([0-9a-f]+)$/i;

               # Number of address records in the iommuGroup

               my $numaddr = scalar @{ $devhash->{'capability'}->{'iommuGroup'}->{'address'} };

               for (my $i = 0; $i < $numaddr; $i++)
               {
                  my $tmpval = $devhash->{'capability'}->{'iommuGroup'}->{'address'}->[$i]->{'function'};

                  # The function value has the form 0x1

                  my ($tmpfunction) = $tmpval =~ /^0x([0-9a-f]+)$/i;

                  if (defined $devfunction && defined $tmpfunction && $devfunction eq $tmpfunction)
                  {
                     $tmphash{source}->{address}->[0] = \%{ $devhash->{'capability'}->{'iommuGroup'}->{'address'}->[$i] };
                     last;
                  }
               }
            }

Please let me know whether the above code works on your system.
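If devices in one iommuGroup ever differ in more than the function value, the same idea extends to comparing all four fields. The following is only a sketch under that assumption, not the change made to kvm.pm. pick_iommu_address is a hypothetical helper, and the test data is copied from the trace of pci_0000_83_00_1.

```perl
use strict;
use warnings;

# Hypothetical helper: return the address record of an iommuGroup that
# matches the device name, or undef if none matches.
sub pick_iommu_address {
    my ($devname, $address) = @_;

    # pci_DDDD_BB_SS_F: four hex fields for domain, bus, slot and function.
    my @want = $devname =~ /^pci_([0-9a-f]+)_([0-9a-f]+)_([0-9a-f]+)_([0-9a-f]+)$/i
        or return;

    # Accept both the single-record (hash ref) and multi-record (array ref) shapes.
    my @records = ref($address) eq 'ARRAY' ? @{$address} : ($address);

    foreach my $rec (@records) {
        my @have = map { $rec->{$_} } qw(domain bus slot function);

        # Compare numerically so that 0x0003 and 0003 count as equal.
        my $same = 1;
        for my $i (0 .. 3) {
            $same = 0 if hex($want[$i]) != hex($have[$i]);
        }
        return $rec if $same;
    }
    return;
}

my $group = [
    { domain => '0x0000', bus => '0x83', slot => '0x00', function => '0x0' },
    { domain => '0x0000', bus => '0x83', slot => '0x00', function => '0x1' },
];
my $rec = pick_iommu_address('pci_0000_83_00_1', $group);
print "function $rec->{function}\n";    # function 0x1
```

Returning undef when nothing matches leaves it to the caller to decide whether to fail the request or skip the device.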

dajiji commented 3 years ago

@peterwywong Thank you very much, Peter! The code works perfectly for my RTX3080 GPU PT.