tugrul512bit / VirtualMultiArray

C++ virtual-array implementation that uses all graphics cards in system as storage (with LRU cache eviction on RAM) and uses OpenCL for data transfers. (Random access: faster than HDD) (Sequential access: faster than SSD) (big objects: faster than NVMe)
GNU General Public License v3.0

Best compiler for this? #2

Open Logic-Elliven opened 3 years ago

Logic-Elliven commented 3 years ago

This looks very interesting tugrul512bit. Thx!

I'm no dev and have never compiled any software. VMA has inspired me to look into compiling! I do know that Intel's compiler is NOT the best choice for AMD systems! https://www.agner.org/optimize/blog/read.php?i=49#49

Question is: Which compiler works best with VirtualMultiArray? AOCC? Clang? GCC? https://www.phoronix.com/scan.php?page=article&item=aocc31-gcc11-clang12&num=1

Perhaps AOMP as it supports offloading to multiple GPU acceleration targets? https://www.openmp.org/resources/openmp-compilers-tools/

So which compiler should I be concentrating on learning for compiling VMA?

tugrul512bit commented 3 years ago

I tested only with gcc and Microsoft's Visual Studio C++. There is a compiling page in the wiki.

It works fastest with gcc. I think it's because of better OpenMP support in gcc for the examples, and maybe compiler quality.

Wait a minute, are you saying that AOCC has auto gpu offload for standard C++ libraries? R u sure? This project uses gpu computation only for finding data in the array. All other tasks are just buffer movements so only cpu cores are required generally.

If OpenMP is offloaded to the GPU in AOCC, then it doesn't help this project, because OpenMP is used mostly for the examples. But it certainly looks cool to have auto offload to GPU in a compiler. Perhaps the nbody example can benefit.
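For reference, here is a minimal sketch of the kind of CPU-side OpenMP loop meant here. It is not taken from the VirtualMultiArray examples: `parallel_sum` is a hypothetical helper, and the only assumption about the toolchain is that gcc is invoked with its `-fopenmp` flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of VirtualMultiArray.
// Sums a vector on the CPU, splitting the loop across threads with OpenMP.
double parallel_sum(const std::vector<double>& data) {
    double total = 0.0;
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long long n = static_cast<long long>(data.size());

    // Each thread accumulates a private partial sum; OpenMP combines them
    // into total when the loop ends.
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += data[static_cast<std::size_t>(i)];
    }
    return total;
}
```

With gcc this would be built with something like g++ -O2 -fopenmp. Without that flag the pragma is ignored and the loop runs on a single thread, so the result is the same, just slower. Nothing here touches the GPU; it only spreads the work over CPU cores.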

Logic-Elliven commented 2 years ago

Hello tugrul :)

I'm way out of my depth here. Those are simply links I found that looked like they may be of interest to you/us.

Here's another, but more related to GpuRamDrive:

"...Rewritten I/O request logic to allow fully parallel I/O. This is by default only used when communicating with AWEAlloc driver for physical memory virtual disks, but can be turned on for any virtual disks backed by an image file or another kernel level driver, provided that the underlying driver supports it. Use -o par command line option to turn on this feature. This change should give better physical RAM disk performance. It should also give better performance for instance when using ImDisk to mount offsets of ?\PhysicalDrive objects as virtual disks. AWEAlloc driver rewritten to support multiple requests simultaneously, which means that ImDisk no longer need to queue requests and switch to a worker thread to complete requests to a physical memory RAM disk..." http://www.ltr-data.se/opencode.html/changelog.html

What I'd love is a nice GUI for your software, similar to GpuRamDrive, where the various settings can be tried/tested to get the most out of the GPU RAM and the 'Drive' can be used as a RamDisk or for caching software like PrimoCache, ReadyBoost or eBoostr.

eBoostr works out of the gate on GpuRamDrive, while ReadyBoost etc. require setting up a virtual disk, which slows things down even more.

tugrul512bit commented 2 years ago

Making a virtual driver is above my head, for now. So, I can't help you with any ramdisk for now, I'm sorry.

Logic-Elliven commented 2 years ago

Windows has a driver. See: https://github.com/prsyahmi/GpuRamDrive/issues/40

Was the "-o par command line option to turn on parallel I/O" helpful?

Logic-Elliven commented 2 years ago

Hi Tugrul

I just re-read: "Wait a minute, are you saying that AOCC has auto gpu offload for standard C++ libraries?"

I don't think so!? :) I was talking about AOMP: "...Since AOMP is a clang/llvm compiler, it also supports GPU offloading with HIP, CUDA, and OpenCL..." https://github.com/ROCm-Developer-Tools/aomp

More Info under: AMD C/C++
"ROCm is the open source software stack for AMD CPU+GPU products. The ROCm Compiler Collection ships an LLVM enhanced compiler for C/C++/Fortran that supports OpenMP and offloading to multiple GPU acceleration targets (multi-target).."

Microsoft VHD driver: Yes, I'm aware of it, but it slows things down badly.

-o par command line option: I'm not sure I got it to work correctly. In CMD (admin) I ran Imdisk.exe -e -o par, but I keep getting the 'help' output rather than a message saying 'done'. I've probably got the syntax wrong.