ramzivic / kgpu

Automatically exported from code.google.com/p/kgpu

Running on the GPU #8

Open GoogleCodeExporter opened 8 years ago

On your front page you mention that running an OS on the GPU is practically 
impossible given the limitations of current GPUs.

I agree that it would be practically impossible to do something like compiling 
Linux and attempting to run it in a GPU thread, but I would also argue that the 
amount of work required to come close to this level of functionality may not be 
as large as you think.

GPUs support virtual memory and, for individual threads, a RISC ISA with LD/ST 
and arbitrary control flow.  The existing C/C++ compilers aren't perfect, but 
they are continuously improving (CUDA 5.0 supports separable compilation, 
etc.).  Eventually it may be possible to compile the majority of 
architecture-independent kernel code to target a GPU.

The main things that you are missing are:
 1) Interrupts for individual threads to trigger trap handlers that allow saving and restoring thread state.
 2) Access to the GPU MMU from the card to control virtual memory mappings.
 3) The ability to dynamically load and unload instruction memory to support dynamic libraries.

I believe that all of these could be worked around if you were creative about 
it.

You could do the following:
  1) Use binary translation to insert periodic yields into applications that can trap into kernel code for thread scheduling.  
  2) The issue people have with virtual memory on GPUs is that they expect each thread to behave like a process, with memory mappings controllable at a fine granularity.  That model doesn't fit a GPU at all.  Kernels should play the role of processes.  Viewed that way, changing memory mappings really just involves suspending a kernel and using the driver to switch contexts.  It would be heavyweight, but you could probably do it thousands of times a second.  
  3) Dynamically loading code could be done by suspending a kernel and loading a new binary.

All of these would be easier with HW support, mainly just a generic interrupt 
mechanism that jumps to a trap handler.  Luckily, that HW support already 
exists on most GPUs to support debuggers, although the level of documentation 
and support isn't great.  Take a look at the cuda-gdb sources.

There are certainly other concerns like security and fault isolation that would 
need to be addressed eventually, but in the meantime I think that it would be 
very interesting to explore topics related to resource sharing and load 
balancing.

Original issue reported on code.google.com by gregory....@gatech.edu on 22 Jun 2012 at 2:27