intel / DML

Intel® Data Mover Library (Intel® DML)
https://intel.github.io/DML/
MIT License

Is DML a wrapper for Intel DSA? #28

Closed: xywoo-cs closed this issue 9 months ago

xywoo-cs commented 10 months ago

I've read the document, but I'm still confused about the hardware and software options.

Is DML a wrapper for Intel DSA?
Since DMA also supports asynchronous data movement, does DML support DMA as well?

And is dml::software implemented by starting another thread (or coroutine)?

Thanks in advance!

mzhukova commented 10 months ago

Hello, what document are you referring to? Is it the DML documentation or the DSA specification?

That is correct: DML is a software layer that enables DSA. We've added a Library Architecture Overview for a similar library, QPL (for IAA support), and it may help you get a better overall understanding.

Just to clarify: hardware_path (Low-Level API) or dml::hardware (High-Level API) offloads the operation fully to the accelerator. Some portion of the work may be done on the host, but in most cases it is just preparatory work before offloading.

Asynchronous execution on hardware_path basically means submitting a job and periodically checking whether it has completed, with both the submit and the check done on the user's side. The synchronous interface does the same thing, but on the library side (as if a single API call submits the job and then periodically checks for completion).

software_path / dml::software executes the operation on the host only, so this path has no asynchronous (submit) API.

automatic_path / dml::automatic is currently very straightforward: it submits the job to DSA first and, in case of an error, retries (or continues) the work on the host. This could potentially be extended with heuristics for choosing whether to execute on the host or on the accelerator, or to divide work between the two.
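The difference between the three execution paths can be illustrated with a small, self-contained model. This is a sketch, not DML code: `Status`, `Job`, `submit_job`, `check_job`, `execute_hardware`, `execute_software` and `execute_automatic` are hypothetical names invented for illustration, and a background thread merely stands in for the DSA engine (on real hardware the device, not a host thread, performs the copy). It also assumes the host-only path simply runs on the calling thread.

```python
import threading
import time
from enum import Enum


class Status(Enum):
    # Hypothetical status codes for this sketch (not DML's real values).
    OK = "ok"
    BUSY = "busy"
    DEVICE_ERROR = "device_error"


class Job:
    """Hypothetical job descriptor: copy `src` into the front of `dst`."""

    def __init__(self, src: bytes, dst: bytearray):
        self.src = src
        self.dst = dst
        self.done = threading.Event()  # set when the copy has finished
        self.worker = None             # thread standing in for the DSA engine
        self.executed_on = None        # "device" or "host", for illustration


def _device_copy(job: Job) -> None:
    # Models the accelerator doing the copy while the host is free.
    job.dst[: len(job.src)] = job.src
    job.executed_on = "device"
    job.done.set()


def submit_job(job: Job, device_available: bool = True) -> Status:
    """Asynchronous offload: hand the job over and return immediately."""
    if not device_available:
        return Status.DEVICE_ERROR
    job.worker = threading.Thread(target=_device_copy, args=(job,))
    job.worker.start()
    return Status.OK


def check_job(job: Job) -> Status:
    """Non-blocking completion check; the caller polls this."""
    return Status.OK if job.done.is_set() else Status.BUSY


def execute_hardware(job: Job, device_available: bool = True) -> Status:
    """Synchronous offload: the same submit + poll, but inside one call."""
    status = submit_job(job, device_available)
    if status is not Status.OK:
        return status
    while check_job(job) is Status.BUSY:
        time.sleep(0)  # yield to other threads between polls
    job.worker.join()
    return Status.OK


def execute_software(job: Job) -> Status:
    """Host-only path: runs on the calling thread, so no submit/check split."""
    job.dst[: len(job.src)] = job.src
    job.executed_on = "host"
    job.done.set()
    return Status.OK


def execute_automatic(job: Job, device_available: bool = True) -> Status:
    """Try the accelerator first; on error, redo the work on the host."""
    status = execute_hardware(job, device_available)
    return status if status is Status.OK else execute_software(job)


if __name__ == "__main__":
    data = bytes(range(256))
    job = Job(data, bytearray(len(data)))
    submit_job(job)                       # user-side submit
    while check_job(job) is Status.BUSY:  # user-side polling
        time.sleep(0)                     # host could do other work here
    print(job.executed_on, job.dst == bytearray(data))
```

In this model the only difference between the asynchronous and synchronous hardware calls is who runs the polling loop, and `execute_automatic` implements only the simple "device first, host on error" policy, not a heuristic scheduler.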

Hope this helps!

xywoo-cs commented 10 months ago

Thank you for your quick response!

xywoo-cs commented 10 months ago

@mzhukova I apologize for reopening this issue, but I still have some related follow-up questions that I would like to ask you.

DSA is a target for offloading many functions. However, before DSA there was also an on-chip DMA engine in I/OAT, similar to DSA. I understand that DSA has more features than that DMA engine, but if we only consider memory copy, is DSA faster? Additionally, will DML support the I/OAT DMA "hardware"?

Thank you once again!

xywoo-cs commented 9 months ago

For those who are still interested, this paper is a good read: "A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel® 4th Gen Xeon® Scalable Processors".