tinygrad / open-gpu-kernel-modules

NVIDIA Linux open GPU with P2P support

GDS support? #2

Open zeronewb opened 2 months ago

zeronewb commented 2 months ago

NVIDIA Open GPU Kernel Modules Version

NONE

Operating System and Version

None

Kernel Release

None

Hardware: GPU

None

Describe the bug

Howdy! Thank you so much for this work! Kind of a stupid question: could we use the same hack for GDS (GPUDirect Storage) support, for weight offloading? Thanks!

To Reproduce

None

Bug Incidence

Once

nvidia-bug-report.log.gz

None

More Info

No response

johnnynunez commented 2 months ago

Yes, GDS would be nice, because only GPUDirect Storage (https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html) works with GPU DALI (https://github.com/NVIDIA/DALI).

geohot commented 2 months ago

I don't know much about this, but the same idea should work. Would merge clean working GDS.

johnnynunez commented 2 months ago

GPU DALI works on all GPU cards, but NVIDIA GDS (GPUDirect Storage, now branded under Magnum IO) is only for professional GPUs. So it should be compatible: if GPU DALI is working, Magnum IO should too. It is a little bit confusing because the two are similar, but they are not the same thing.

NVIDIA DALI:

DALI is a library that accelerates data loading and preprocessing in deep learning applications. It is designed to improve input/output and data processing efficiency by shifting these tasks to the GPU, thereby freeing CPU resources for other operations. It enables a variety of preprocessing operations such as image decoding, transformations, and data augmentation directly on the GPU, which can be extremely useful in computer vision and image processing workflows. It facilitates integration with popular deep learning frameworks such as TensorFlow and PyTorch.

NVIDIA Magnum IO GPUDirect Storage:

GPUDirect Storage is part of NVIDIA's Magnum IO suite of technologies designed to optimize and accelerate data transfer between storage and GPUs. It enables applications to read and write directly to GPU memory from storage, avoiding bottlenecks associated with data transfer through CPU and system memory. This is crucial for applications that handle large data sets such as simulations, big data analytics, and other high-performance tasks. It reduces latency and increases performance by enabling faster and more direct transfers of large volumes of data to and from GPUs.
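
For anyone who wants to try this, here is a minimal sketch of what "read directly to GPU memory from storage" looks like from Python. It assumes the `kvikio` and `cupy` packages (neither is mentioned in this thread; `kvikio` is the RAPIDS binding for cuFile, the GDS API), and the helper names `gds_backend_available` and `read_file_to_gpu` are made up for illustration. Whether the read actually takes the direct path on a consumer GPU with this patched driver is exactly the open question here; when GDS is unavailable, kvikio can fall back to a compatibility mode that goes through host memory, so a successful read alone does not prove GDS is working.

```python
import importlib.util


def gds_backend_available() -> bool:
    """Return True only if both kvikio and cupy are importable (assumed dependencies)."""
    return all(
        importlib.util.find_spec(name) is not None for name in ("kvikio", "cupy")
    )


def read_file_to_gpu(path: str, nbytes: int):
    """Read `nbytes` from `path` into a newly allocated GPU buffer.

    Hypothetical helper: the imports are local so that this module can still be
    imported on machines without a GPU stack installed.
    """
    import cupy
    import kvikio

    # The destination buffer lives in GPU memory, not host memory.
    buf = cupy.empty(nbytes, dtype=cupy.uint8)

    # CuFile.read blocks until the transfer finishes and returns the byte count.
    with kvikio.CuFile(path, "r") as f:
        read = f.read(buf)

    if read != nbytes:
        raise IOError(f"expected {nbytes} bytes, got {read}")
    return buf


if __name__ == "__main__":
    print("kvikio + cupy importable:", gds_backend_available())
```

For weight offloading, the idea would be to call something like `read_file_to_gpu` on each weight shard instead of loading it into host RAM first and then copying it to the device.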