Support global functions for multi-gpu conjugate gradient

vkirangoud / cusp-library

Automatically exported from code.google.com/p/cusp-library

Apache License 2.0


Support global functions for multi-gpu conjugate gradient #74

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

This is an enhancement rather than an issue.

It would be nice to have several functions that operate across multiple GPUs:

double gpuSumProd(vec, vec): this would compute the dot product of two vectors 
that exist on each GPU and gather the result from all GPUs running in parallel.

double gpuSumMag(vec): this would sum the magnitude of a vector that exists on 
each GPU under the same name and gather the result.

double gpuAverage(vec): this would compute the average value of a particular 
vector that exists on each GPU and gather the result.

These would help a multi-GPU implementation of a conjugate gradient solver.
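
To make the request concrete, here is a rough host-only sketch of what the three reductions could compute. It is not cusp code. Each GPU's local piece of a vector is stood in for by a plain std::vector, and the DistVec/LocalVec types are hypothetical. Reading "sum the magnitude" as the sum of absolute values is also an assumption. On real hardware the inner loop body would presumably be a per-device reduction (for example cudaSetDevice followed by thrust::inner_product), with the partial results summed on the host.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in: one std::vector per GPU holds that GPU's local chunk.
using LocalVec = std::vector<double>;
using DistVec  = std::vector<LocalVec>;  // outer index = device id

// Dot product: local dot on each device, then sum the partial results.
// Assumes x and y are split across devices in exactly the same way.
double gpuSumProd(const DistVec& x, const DistVec& y) {
    double total = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        // On real hardware this would be a per-device reduction.
        total += std::inner_product(x[d].begin(), x[d].end(), y[d].begin(), 0.0);
    }
    return total;
}

// Sum of magnitudes, read here as the sum of absolute values (an assumption).
double gpuSumMag(const DistVec& x) {
    double total = 0.0;
    for (const LocalVec& chunk : x) {
        for (double v : chunk) total += std::abs(v);
    }
    return total;
}

// Global average: reduce both the sum and the element count, then divide.
double gpuAverage(const DistVec& x) {
    double sum = 0.0;
    std::size_t count = 0;
    for (const LocalVec& chunk : x) {
        sum += std::accumulate(chunk.begin(), chunk.end(), 0.0);
        count += chunk.size();
    }
    return count ? sum / static_cast<double>(count) : 0.0;
}
```

Note that the average has to reduce the element counts as well as the sums. Averaging the per-GPU averages would only be correct when every GPU holds the same number of elements.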

Original issue reported on code.google.com by dan.combest on 26 Jul 2011 at 7:35

GoogleCodeExporter commented 9 years ago

Sorry, by gather I mean reduce (sum all of the values together).

Original comment by dan.combest on 26 Jul 2011 at 11:46

GoogleCodeExporter commented 9 years ago

I think this is a great feature but first we have to figure out how to make a 
good distributed 1D array.

Original comment by filipe.c...@gmail.com on 26 Jul 2011 at 11:53

GoogleCodeExporter commented 9 years ago

I'm thinking more from the perspective of using a domain decomposition method 
to solve the system. Each GPU has its own subdomain to solve and passes 
interface values to its neighbours, rather than parallelizing the operations 
themselves across GPUs (if that is where you are going). I will start a 
discussion on the users group.
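
As an illustration of the "pass interface values" idea (only one possible reading of it, not code from cusp or any other library), here is a host-only sketch of a matrix-vector product for the 1D Laplacian stencil (-1, 2, -1) split across subdomains. Each subdomain only needs the single boundary value of each neighbour. The names localMatvec and distMatvec are hypothetical, every subdomain is assumed to be non-empty, and the physical boundaries are assumed to be zero.

```cpp
#include <cstddef>
#include <vector>

using LocalVec = std::vector<double>;

// y = A*x for one subdomain's rows of the 1D Laplacian stencil (-1, 2, -1).
// leftHalo/rightHalo are the neighbours' interface values
// (0 at the physical boundary, which is an assumption of this sketch).
LocalVec localMatvec(const LocalVec& x, double leftHalo, double rightHalo) {
    const std::size_t n = x.size();
    LocalVec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left  = (i == 0)     ? leftHalo  : x[i - 1];
        const double right = (i + 1 == n) ? rightHalo : x[i + 1];
        y[i] = 2.0 * x[i] - left - right;
    }
    return y;
}

// Apply the operator across all subdomains, exchanging only interface values.
// Assumes every subdomain holds at least one element.
std::vector<LocalVec> distMatvec(const std::vector<LocalVec>& parts) {
    std::vector<LocalVec> out(parts.size());
    for (std::size_t d = 0; d < parts.size(); ++d) {
        const double leftHalo  = (d > 0) ? parts[d - 1].back() : 0.0;
        const double rightHalo = (d + 1 < parts.size()) ? parts[d + 1].front() : 0.0;
        out[d] = localMatvec(parts[d], leftHalo, rightHalo);
    }
    return out;
}
```

In a real multi-GPU setup the two halo reads would be small device-to-device or MPI transfers, while the bulk of each subdomain stays on its own GPU.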

Original comment by dan.combest on 26 Jul 2011 at 11:58

GoogleCodeExporter commented 9 years ago

I think support for domain decomposition would be of great benefit. CFD 
applications, for example, use this approach to distribute a case across 
processor cores using MPI, so it could be applied directly to distribute a 
case across multiple GPUs to speed things up.

E.g. http://www.openfoam.org/docs/user/running-applications-parallel.php

Original comment by k_burk...@yahoo.com on 17 Jun 2012 at 12:29

GoogleCodeExporter commented 9 years ago

I went ahead and did this for OpenFOAM and cusp, putting it together in a 
library called cufflink. It uses OpenFOAM (OF) domain decomposition to run 
cusp on multiple GPUs.

http://code.google.com/p/cufflink-library/

Dan

Original comment by dan.combest on 17 Jun 2012 at 2:14