nguyenminhduc9988 / gpuocelot

Automatically exported from code.google.com/p/gpuocelot

Add Support for CUDA 2.3 #3

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Describe the New Feature:
 1. Download the new toolkit and SDK.
 2. Create a new parser if there is a new version of PTX.
 3. Dump the .ptx files from each SDK sample into the test directory.
 4. Create a test suite for the 2.3 SDK examples and make sure that it passes.
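
Step 3 could be scripted roughly as follows. This is only a sketch: the function name collect_ptx and both directory arguments are hypothetical, and it assumes the .ptx files have already been produced by building the SDK samples.

```python
import shutil
from pathlib import Path


def collect_ptx(sdk_root, test_dir):
    """Copy every .ptx file found under sdk_root into test_dir.

    Both paths are placeholders chosen for illustration; they are not
    paths used by Ocelot. Returns the sorted list of copied file names.
    """
    sdk_root = Path(sdk_root)
    test_dir = Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=True)

    seen = set()
    # sorted() materializes the file list before anything is copied.
    for ptx in sorted(sdk_root.rglob("*.ptx")):
        if ptx.name in seen:
            # Two samples produced the same file name; refuse to overwrite.
            raise FileExistsError(f"duplicate PTX name: {ptx.name}")
        shutil.copy2(ptx, test_dir / ptx.name)
        seen.add(ptx.name)
    return sorted(seen)
```

A test suite for step 4 could then iterate over the files in the test directory and feed each one to the parser.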

Which milestone does the feature belong to?
0.4.0

Which branch does the new feature go in?
Trunk

Original issue reported on code.google.com by gregory....@gatech.edu on 22 Jun 2009 at 8:21

GoogleCodeExporter commented 9 years ago
Hi, I tested CUDA 2.3 for PTX 1.5 support; PTX 1.5 is what the NVIDIA OpenCL compiler generates. I have looked at the PTX 1.5 code and it differs a lot from PTX 1.4, so I suspect supporting it would also be a nice addition to your package. If you want, I can send the PTX code for some examples from the OpenCL SDK.
Perhaps I should have created a separate issue titled "Add support for OpenCL PTX code".
I attach the code of the transpose samples from OpenCL (PTX 1.5) and from the NVIDIA CUDA SDK 2.2 (PTX 1.4), plus a report in which I study the NVIDIA OpenCL SDK.

Note that the functions have the same signatures (parameters passed, shared memory setup, etc.).

See example:
// Generated by NVIDIA PTX Backend for LLVM
.version 1.5
.target sm_13, texmode_independent
/* Global Launch Offsets */
.const[0] .s32 %global_block_offset[3];
..
/* Temporary variables for v2load/v4load/read */
.local .b8 vector_load_8[4];
..
/* Function Prototypes */
/* extern unsized array args for kernel 'transpose' */
.extern .shared .b8 transpose_param_4[]; /* ptxbe_def_block */
.const[0] .u32 transpose_param_4_offset;
.const[0] .b32 %dummy_const; /* needed to avoid an assert in driver */
/* Function Bodies */
.entry transpose (
.param .b32 transpose_param_0,
.param .b32 transpose_param_1,
.param .u32 transpose_param_2,
.param .u32 transpose_param_3)
.maxntid 16, 16, 1
{
.reg .b32 ptxbe_def_odata;
.reg .b32 ptxbe_def_idata;

vs.
.version 1.4
.target sm_13
// compiled with C:\CUDA\bin/../open64/lib//be.exe
// nvopencc built on 2009-05-02
.reg .u32 %ra<17>;
..
.entry _Z9transposePfS_ii (
.param .u64 __cudaparm__Z9transposePfS_ii_odata,
.param .u64 __cudaparm__Z9transposePfS_ii_idata,
.param .s32 __cudaparm__Z9transposePfS_ii_width,
.param .s32 __cudaparm__Z9transposePfS_ii_height)
{

Original comment by rtf...@gmail.com on 25 Jun 2009 at 3:47

GoogleCodeExporter commented 9 years ago
I took a look at the OpenCL toolkit and the PTX 1.5 samples that you provided. The PTX 1.5 code is definitely different from 1.4. Here are the changes that I noticed offhand:

1. .const[0] - I have no idea what they are trying to say with this
2. The compiler actually generates variables in the local address space
3. .maxntid x, y, z is used in entry declarations
4. Predication is actually used by the compiler for control flow other than branches
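
These four differences can be checked for mechanically in a dumped .ptx file. The following is a rough sketch, not part of Ocelot: the function name ptx_features and its regular expressions are my own assumptions, based only on the sample above and on the usual @p guard syntax for predicated instructions. It is a text scan, not a parser.

```python
import re


def ptx_features(source):
    """Report which of the constructs listed above appear in PTX text.

    Heuristic only: the patterns are assumptions derived from the sample
    in this thread and will not cover every legal PTX file.
    """
    version = re.search(r"^\s*\.version\s+(\d+\.\d+)", source, re.MULTILINE)
    return {
        "version": version.group(1) if version else None,
        # 1. banked constant declarations such as ".const[0] .s32 ..."
        "const_bank": bool(re.search(r"\.const\[\d+\]", source)),
        # 2. variables declared in the local address space
        "local_vars": bool(re.search(r"^\s*\.local\b", source, re.MULTILINE)),
        # 3. the .maxntid directive following an .entry declaration
        "maxntid": bool(re.search(r"^\s*\.maxntid\b", source, re.MULTILINE)),
        # 4. a guard predicate on an instruction other than bra
        "predicated_non_branch": bool(
            re.search(r"^\s*@!?[%\w$]+\s+(?!bra\b)\w", source, re.MULTILINE)
        ),
    }
```

Running this over every file in the test directory would show which samples exercise constructs the current parser has not seen before.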

I agree that we want to support future versions of PTX, including 1.5. However, I could not find a specification for 1.5 with either CUDA 2.3 or OpenCL. We could go by the code examples generated by the OpenCL compiler, but we would need an ISA specification to ensure that we can handle any PTX 1.5 binary.

I suggest waiting until an official specification is released before we start working on this.

Original comment by gregory....@gatech.edu on 25 Jun 2009 at 4:05

GoogleCodeExporter commented 9 years ago
Added only the two examples that were different from 2.2. These exposed a few minor bugs involving new predicate instructions generated by the compiler.

Original comment by gregory....@gatech.edu on 3 Aug 2009 at 9:17

GoogleCodeExporter commented 9 years ago
I am closing this as fixed for now. Every 2.3 example that I have run through Ocelot has passed without any issues.

Original comment by gregory....@gatech.edu on 2 Sep 2009 at 1:39