xywoo-cs closed this issue 9 months ago
Hello, what document are you referring to? Is it the DML documentation or the DSA specification?
DML is a software layer that enables DSA, that is correct. We've added a Library Architecture Overview for a similar library, QPL (for IAA support); it may help you get a better overall understanding.
Just to clarify, `hardware_path` (Low-Level API) or `dml::hardware` (High-Level API) is for offloading the operation fully to the accelerator. Some portion of the work may be done on the host, but in most cases it is just preparatory work before offloading. Asynchronous execution on `hardware_path` is basically about submitting a job and periodically checking whether it has completed (with the submit and check done on the user's side). The synchronous interface does the same, but on the library side (as if a single API call did the submit and then periodically checked for completion).
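To make the difference between the two styles concrete, here is a minimal sketch with a mocked "device". Everything in it (`MockJob`, `submit`, `check`, `execute`, the poll counter) is a hypothetical stand-in invented for illustration, not the DML API; it only assumes that a job becomes complete after some number of status checks.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT DML types.
enum class Status { in_progress, completed };

struct MockJob {
    const unsigned char* src;
    unsigned char* dst;
    std::size_t size;
    int polls_until_done;  // simulates device latency
    bool submitted;
};

// Asynchronous style, step 1: hand the job over and return immediately.
void submit(MockJob& job) { job.submitted = true; }

// Asynchronous style, step 2: the caller checks for completion whenever it likes.
Status check(MockJob& job) {
    if (!job.submitted) return Status::in_progress;
    if (job.polls_until_done > 0) {
        --job.polls_until_done;
        return Status::in_progress;
    }
    std::memcpy(job.dst, job.src, job.size);  // the mocked device finishes the copy
    return Status::completed;
}

// Synchronous style: the same submit + poll loop, hidden behind one call.
// Returns how many status checks were needed.
int execute(MockJob& job) {
    submit(job);
    int polls = 1;
    while (check(job) != Status::completed) ++polls;
    return polls;
}
```

With the asynchronous pair, the caller is free to do other work between `submit` and each `check`; with `execute`, the calling thread stays inside the loop until the job is done.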
`software_path` / `dml::software` is for executing the operation on the host only, so for this path there is no asynchronous API (submit).
`automatic_path` / `dml::auto` is currently very straightforward: it submits the job to DSA first and, in case of an error, retries (or continues) on the host. This could potentially be extended to include heuristics for choosing whether to execute on the host or the accelerator, or to divide the work between them.
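The fallback behaviour described for the automatic path can be sketched roughly as follows. This is an illustration under stated assumptions, not DML's implementation: `Path`, `Result`, `HardwareCopy`, `software_copy` and `automatic_copy` are hypothetical names, the hardware attempt is passed in as a callable that reports failure by returning `false`, and the sketch always redoes the whole copy on the host rather than continuing from a partially completed offload.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical names for illustration only; these are NOT DML types.
enum class Path { hardware, software };

struct Result {
    bool ok;
    Path executed_on;
};

// Stand-in for an offload attempt; returns false when the accelerator
// is unavailable or reports an error.
using HardwareCopy = std::function<bool(void*, const void*, std::size_t)>;

// Host-only path: runs the copy on the CPU.
Result software_copy(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
    return {true, Path::software};
}

// Automatic path: try the accelerator first, fall back to the host on error.
Result automatic_copy(const HardwareCopy& try_hardware,
                      void* dst, const void* src, std::size_t n) {
    if (try_hardware && try_hardware(dst, src, n)) {
        return {true, Path::hardware};
    }
    return software_copy(dst, src, n);
}
```

A heuristic-based variant, as mentioned above, would replace the unconditional "hardware first" step with a decision based on, for example, transfer size.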
Hope this helps!
Thank you for your quick response!
@mzhukova I apologize for reopening this issue, but I still have some related follow-up questions that I would like to ask you.
DSA is a target for offloading many functions. However, before DSA there was also the on-chip DMA in I/OAT, which is similar to DSA. I understand that DSA has more features than DMA, but if we only consider memory copy, is DSA faster than DMA? Additionally, will DML support DMA "hardware"?
Thank you once again!
For those who are still interested, this paper is a good read: A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel® 4th Gen Xeon® Scalable Processors
I've read the document, but I'm still confused about the hardware and software options.
Is DML a wrapper for Intel DSA?
Since DMA also supports asynchronous data movement, does DML support DMA as well?
And is `dml::software` implemented by starting another thread (or coroutine)? Thanks in advance!