
Intel® Data Mover Library (Intel® DML)
https://intel.github.io/DML/
MIT License

HW path reports error #41

Open suyashmahar opened 2 months ago

suyashmahar commented 2 months ago

I'm unable to use the HW path for mem move even after configuring the DSA devices:

$ sudo ./hl_mem_move_example hardware_path
Executing using dml::hardware path
Starting dml::mem_move example...
Copy 1KB of data from source into destination...
dml-diag: DML version TODO
dml-diag: Struct size: 3328 B
dml-diag: loading driver: libaccel-config.so.1
Failure occurred.

When manually calling dml::mem_move, I get error code 16, which corresponds to an internal library error. Is there a way to debug this? Any help would be really appreciated. Thanks!
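For reference, the manual call is roughly the following (a minimal sketch modeled on the high-level mem_move example; the buffer sizes and contents are placeholders):

#include <cstdint>
#include <iostream>
#include <vector>

#include <dml/dml.hpp>

int main() {
    constexpr auto size = 1024u;                        // 1KB, as in the example
    std::vector<std::uint8_t> src(size, 0xAB);          // hypothetical source contents
    std::vector<std::uint8_t> dst(size, 0x00);

    // Request the mem_move operation on the hardware (DSA) execution path.
    auto result = dml::execute<dml::hardware>(dml::mem_move,
                                              dml::make_view(src),
                                              dml::make_view(dst));

    if (result.status != dml::status_code::ok) {
        // Printing the raw enum value is what surfaces the "16" I'm seeing.
        std::cout << "mem_move failed, status = "
                  << static_cast<int>(result.status) << '\n';
        return 1;
    }
    return 0;
}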

System Configuration

Processor: Intel(R) Xeon(R) Silver 4416+

I have configured the DSA device using the Python script:

$ sudo python3 accel_conf.py --load=../configs/1n1d1e1w-s-n1.conf
Filter:
Disabling active devices
    dsa0 - done
Loading configuration - done
Additional configuration steps
    Force block on fault: False
Enabling configured devices
    dsa0 - done
        wq0.0 - done
Checking configuration
    node: 0; device: dsa0; group: group0.0
        wqs:     wq0.0
        engines: engine0.0

I'm also running a relatively recent kernel version:

$  uname -a
Linux machinename 6.8.0-rc7 #1 SMP PREEMPT_DYNAMIC Thu Mar  7 11:11:46 PST 2024 x86_64 x86_64 x86_64 GNU/Linux

Kernel cmdline:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-rc7 root=UUID=4f739d8f-4f15-4fc3-b419-bbb0202131b3 ro splash earlyprintk=ttyS1,115200 console=ttyS1,115200 console=ttyS0,115200 memmap=8G!16G nokaslr movable_node=2 intel_iommu=on,sm_on iommu=on vt.handoff=7

lspci output for one of the two devices available:

$ sudo lspci -vvv -s 75:01.0
75:01.0 System peripheral: Intel Corporation Device 0b25
        Subsystem: Intel Corporation Device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        IOMMU group: 1
        Region 0: Memory at 21bffff50000 (64-bit, prefetchable) [size=64K]
        Region 2: Memory at 21bffff20000 (64-bit, prefetchable) [size=128K]
        Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag+ RBE+ FLReset+
                DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+ LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
        Capabilities: [80] MSI-X: Enable+ Count=9 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [90] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [160 v1] Transaction Processing Hints
                Device specific mode supported
                Steering table in TPH capability structure
        Capabilities: [170 v1] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=1
                Arb:    Fixed+ WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
                VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=1 ArbSelect=Fixed TC/VC=02
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Designated Vendor-Specific: Vendor=8086 ID=0005 Rev=0 Len=24 <?>
        Capabilities: [220 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Capabilities: [230 v1] Process Address Space ID (PASID)
                PASIDCap: Exec- Priv+, Max PASID Width: 14
                PASIDCtl: Enable+ Exec- Priv+
        Capabilities: [240 v1] Page Request Interface (PRI)
                PRICtl: Enable+ Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00000200, Page Request Allocation: 00000200
        Kernel driver in use: idxd
        Kernel modules: idxd
mzhukova commented 2 months ago

Hi @suyashmahar, in examples/high-level-api/mem_move_example.cpp, could you please also print out result.status right before the "Failure occurred" message?
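Something along these lines should work (just a sketch; adjust to match the local variable names in the example):

    if (result.status != dml::status_code::ok) {
        // Print the raw status code right before the existing failure message.
        std::cout << "status = " << static_cast<int>(result.status) << std::endl;
        std::cout << "Failure occurred." << std::endl;
        return -1;
    }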

suyashmahar commented 2 months ago

Hi @mzhukova, I got 16: [screenshot of the printed status]

suyashmahar commented 2 months ago

Hi @mzhukova, are there any environment flags or build configuration options I can use to debug this issue? Thanks for the help!

suyashmahar commented 2 months ago

@mzhukova, I think I found the reason. If DML cannot find libaccel-config.so, it just reports an internal error. I confirmed this using strace.

Any HW initialization failure in this code is reported as a generic failure if the "if" condition fails:

https://github.com/intel/DML/blob/8224bea9d8ba01bad98dc2022b7db98b3ccd38ff/sources/core/src/hardware_device.cpp#L42-L68

This is where the library tries to load libaccel-config.so:

https://github.com/intel/DML/blob/8224bea9d8ba01bad98dc2022b7db98b3ccd38ff/sources/core/src/hw_dispatcher/hw_dispatcher.cpp#L45

If I make sure that libaccel-config.so is accessible, the hardware_path example works.
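A quick way to reproduce the loader behavior outside of DML is to dlopen the same library name that dml-diag reports (a hypothetical standalone check, not part of DML; build with -ldl):

#include <dlfcn.h>
#include <iostream>

int main() {
    // DML's hardware path loads accel-config at runtime
    // ("dml-diag: loading driver: libaccel-config.so.1" in the output above).
    void* handle = dlopen("libaccel-config.so.1", RTLD_LAZY);
    if (handle == nullptr) {
        std::cerr << "dlopen failed: " << dlerror() << '\n';
        return 1;
    }
    std::cout << "libaccel-config.so.1 resolved successfully\n";
    dlclose(handle);
    return 0;
}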

mzhukova commented 2 months ago

Sorry for the delayed response, @suyashmahar. I'm glad you were able to find the root cause of the failure. We will work on improving the status reporting in a future release.