Shamazo closed this issue 1 year ago
Hi @Shamazo,
I've noticed error code 100 in the output of your tests, which is DML_STATUS_LIBACCEL_NOT_FOUND, so let's double check on that first:
Additionally, when you're running the examples, could you please check the result.status value that leads to the reported failure?
Hi @mzhukova
I cloned the develop branch, at commit 4cf7cab374ef0869d91c1b02d683d334d59f27d3
accel-config was installed via dnf.
[user1@sprnode5 high-level-api]$ ldd hl_batch_example_example
linux-vdso.so.1 (0x00007ffce4d73000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f67f5600000)
libm.so.6 => /lib64/libm.so.6 (0x00007f67f5525000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f67f5853000)
libc.so.6 => /lib64/libc.so.6 (0x00007f67f5200000)
/lib64/ld-linux-x86-64.so.2 (0x00007f67f587b000)
[user1@sprnode5 high-level-api]$ which accel-config
/usr/bin/accel-config
[user1@sprnode5 high-level-api]$ accel-config --version
3.4.6.3
[user1@sprnode5 high-level-api]$ ldd /usr/bin/accel-config
linux-vdso.so.1 (0x00007ffdfcf94000)
libaccel-config.so.1 => /lib64/libaccel-config.so.1 (0x00007f522ea83000)
libjson-c.so.5 => /lib64/libjson-c.so.5 (0x00007f522ea70000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f522ea67000)
libc.so.6 => /lib64/libc.so.6 (0x00007f522e800000)
/lib64/ld-linux-x86-64.so.2 (0x00007f522eabc000)
In the high-level examples, I get error code 16 (dml::status_code::error), printed with:
std::cout << "Failure occurred. Error code: " << static_cast<std::underlying_type<dml::status_code>::type>(result.status) << std::endl;
/tmp/tmp.wNEiagzC7n/cmake-build-release/external/DML/examples/high-level-api/hl_mem_move_example_example hardware_path
Executing using dml::hardware path
Starting dml::mem_move example...
Copy 1KB of data from source into destination...
Failure occurred. Error code: 16
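The cast used to print the status can be wrapped in a small helper, which is handy when several examples need the same diagnostic. This is only a sketch: `status_code` below is a hypothetical stand-in for `dml::status_code` so that the snippet compiles without DML, and only the value `error = 16` is taken from the output above (`ok = 0` is an assumption). The names `to_underlying` and `failure_message` are illustrative and not part of DML.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical stand-in for dml::status_code (NOT the real DML enum).
// `error = 16` matches the output in this thread; `ok = 0` is an assumption.
enum class status_code : unsigned int { ok = 0, error = 16 };

// Convert any enum value to its underlying integer type.
template <typename Enum>
constexpr auto to_underlying(Enum value) noexcept {
    static_assert(std::is_enum<Enum>::value, "to_underlying expects an enum type");
    return static_cast<typename std::underlying_type<Enum>::type>(value);
}

// Build the same message the example prints.
// The unary plus promotes narrow underlying types (e.g. unsigned char)
// to int so they are streamed as numbers rather than characters.
template <typename Enum>
std::string failure_message(Enum status) {
    std::ostringstream out;
    out << "Failure occurred. Error code: " << +to_underlying(status);
    return out.str();
}
```

With the real library, the call would presumably look like `std::cout << failure_message(result.status) << std::endl;`, assuming `result.status` is a scoped enum as the cast in the example suggests.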
Thanks, Hamish
Hi @mzhukova,
Is there any other information I can provide? I only have access to this machine for a couple more days.
Thanks, Hamish
Hi @Shamazo, apologies for the delay in responding. I see in your output for accel-config list that you're trying to use a dedicated work queue (DWQ), is this correct?
If so, this is not supported in DML; see the Library Limitations section.
Is there any particular reason you need DWQ support?
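For anyone hitting the same failure, the work queue mode can also be checked programmatically before running in hardware mode. The sketch below assumes the Linux idxd driver's sysfs layout, where each work queue directory (for example `wq0.0`) under `/sys/bus/dsa/devices` exposes a `mode` file containing `shared` or `dedicated`; that path and the helper names `read_wq_modes` and `has_dedicated_wq` are assumptions for illustration and are not part of DML or accel-config.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Read the `mode` attribute of every "wq*" entry under `root`.
// Returns a map from work queue name (e.g. "wq0.0") to its mode string.
// A missing or unreadable root yields an empty map instead of throwing.
std::map<std::string, std::string> read_wq_modes(const fs::path& root) {
    std::map<std::string, std::string> modes;
    std::error_code ec;
    for (fs::directory_iterator it(root, ec), end; !ec && it != end; it.increment(ec)) {
        const std::string name = it->path().filename().string();
        if (name.rfind("wq", 0) != 0) {
            continue;  // skip entries that are not work queues
        }
        std::ifstream in(it->path() / "mode");
        std::string mode;
        if (in >> mode) {
            modes[name] = mode;
        }
    }
    return modes;
}

// True if at least one work queue reports the "dedicated" mode.
bool has_dedicated_wq(const std::map<std::string, std::string>& modes) {
    for (const auto& entry : modes) {
        if (entry.second == "dedicated") {
            return true;
        }
    }
    return false;
}
```

On a configured machine, calling `read_wq_modes("/sys/bus/dsa/devices")` and then `has_dedicated_wq` on the result would flag a configuration like the one discussed here, assuming the sysfs layout described above.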
Thank you for pointing that out, I had missed it in the limitations. I would suggest also mentioning that limitation in the configuration part of the installation instructions.
I don't fundamentally need DWQ, but I work on systems that are effectively single-tenant, so I thought it might be more performant to use DWQ since I don't need to share the DSA resource. I have not yet measured the performance impact of shared vs dedicated queues.
Closing this issue for now.
I will let you know if I run into a specific reason to support DWQ.
Sure @Shamazo, I'll try to make it clearer in the documentation. Thanks!
Hi,
I am unable to run hardware mode examples/tests. I did a fresh clone from master and built using GCC.
To configure the DSA and kernel, I followed the DSA user guide. I believe I have configured the DSA correctly because I can run the dsa_perf_micros scripts e.g.
However, I cannot run any of the tests/examples in DML with hardware mode, e.g.
(Note that I get the same output regardless of whether I use sudo or not; I have chowned the work queues to set the group ownership to my user's group.)
Similarly, all tests pass with ./tests --path=sw, and I get a very large stream of unsuccessful output with ./tests --path=hw. A small sample here
Details: CPU: Intel(R) Xeon(R) CPU Max 9480
Full DSA config here
Is there anything in the setup I am forgetting/missing?
Thanks in advance, Hamish