Closed GoogleCodeExporter closed 9 years ago
Hi, I tested CUDA 2.3 for PTX 1.5 support, which is the PTX version generated
by the NVIDIA OpenCL compiler. I have looked at the PTX 1.5 code and it differs
a lot from 1.4, so I suspect support for it would also be a nice addition to
your package. If you want, I can send the PTX code for some of the OpenCL SDK
examples.
Perhaps I should have created an issue titled "Add support for OpenCL PTX code".
I attach the PTX for the transpose samples from OpenCL (PTX 1.5) and from the
NVIDIA CUDA SDK 2.2 (PTX 1.4), plus an OpenCL report in which I study the
NVIDIA OpenCL SDK.
Note that the functions have the same signatures (parameters passed, shared
memory setup, etc.).
See the example:
// Generated by NVIDIA PTX Backend for LLVM
.version 1.5
.target sm_13, texmode_independent
/* Global Launch Offsets */
.const[0] .s32 %global_block_offset[3];
..
/* Temporary variables for v2load/v4load/read */
.local .b8 vector_load_8[4];
..
/* Function Prototypes */
/* extern unsized array args for kernel 'transpose' */
.extern .shared .b8 transpose_param_4[]; /* ptxbe_def_block */
.const[0] .u32 transpose_param_4_offset;
.const[0] .b32 %dummy_const; /* needed to avoid an assert in driver */
/* Function Bodies */
.entry transpose (
.param .b32 transpose_param_0,
.param .b32 transpose_param_1,
.param .u32 transpose_param_2,
.param .u32 transpose_param_3)
.maxntid 16, 16, 1
{
.reg .b32 ptxbe_def_odata;
.reg .b32 ptxbe_def_idata;
vs.
.version 1.4
.target sm_13
// compiled with C:\CUDA\bin/../open64/lib//be.exe
// nvopencc built on 2009-05-02
.reg .u32 %ra<17>;
..
.entry _Z9transposePfS_ii (
.param .u64 __cudaparm__Z9transposePfS_ii_odata,
.param .u64 __cudaparm__Z9transposePfS_ii_idata,
.param .s32 __cudaparm__Z9transposePfS_ii_width,
.param .s32 __cudaparm__Z9transposePfS_ii_height)
{
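To make the comparison of the two `.entry` headers concrete, here is a minimal Python sketch that extracts the parameter types from each one. The helper name `entry_params` and the regular expressions are my own assumptions for illustration; they only cover simple headers like the two quoted here and are not a real PTX parser.

```python
import re

# The two .entry headers quoted in this comment.
PTX_15 = """
.entry transpose (
.param .b32 transpose_param_0,
.param .b32 transpose_param_1,
.param .u32 transpose_param_2,
.param .u32 transpose_param_3)
.maxntid 16, 16, 1
{
"""

PTX_14 = """
.entry _Z9transposePfS_ii (
.param .u64 __cudaparm__Z9transposePfS_ii_odata,
.param .u64 __cudaparm__Z9transposePfS_ii_idata,
.param .s32 __cudaparm__Z9transposePfS_ii_width,
.param .s32 __cudaparm__Z9transposePfS_ii_height)
{
"""

# Assumed patterns: kernel name, then everything up to the first ')'.
ENTRY_RE = re.compile(r"\.entry\s+(\w+)\s*\((.*?)\)", re.DOTALL)
PARAM_RE = re.compile(r"\.param\s+\.(\w+)\s+(\w+)")


def entry_params(ptx):
    """Return (kernel_name, [param_type, ...]) for the first .entry in ptx."""
    match = ENTRY_RE.search(ptx)
    if match is None:
        raise ValueError("no .entry declaration found")
    name, body = match.group(1), match.group(2)
    return name, [ptype for ptype, _ in PARAM_RE.findall(body)]


name_15, types_15 = entry_params(PTX_15)
name_14, types_14 = entry_params(PTX_14)
print(name_15, types_15)
print(name_14, types_14)
```

Both kernels take four parameters in the same order, but the declared types differ: the OpenCL output uses `.b32` for the two buffers and `.u32` for the sizes, while the CUDA output uses `.u64` and `.s32`.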
Original comment by rtf...@gmail.com
on 25 Jun 2009 at 3:47
I took a look at the OpenCL toolkit and the PTX 1.5 samples that you provided.
The PTX 1.5 code that the compiler generates is definitely different from 1.4.
Here are the changes that I noticed offhand:
1. .const[0] - I have no idea what they are trying to say with this
2. The compiler actually generates variables in the local address space
3. .maxntid x, y, z is used in entry declarations
4. Predication is actually used by the compiler for control flow other than branches
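As a rough illustration of how the differences listed above could be spotted in a PTX file, here is a small Python sketch. The function name `scan_ptx` and the patterns are hypothetical and based only on the samples in this thread plus the usual `@%p` / `@!%p` guard syntax; they are not derived from a PTX 1.5 specification.

```python
import re

# Hypothetical patterns, one per difference listed above.
# They are guesses from the posted samples, not an official PTX grammar.
FEATURES = {
    "const_bank": re.compile(r"\.const\[\d+\]"),
    "local_variable": re.compile(r"^\s*\.local\b", re.MULTILINE),
    "maxntid": re.compile(r"^\s*\.maxntid\b", re.MULTILINE),
    # A guard predicate (@%p or @!%p) on an instruction that is not a branch.
    "predicated_non_branch": re.compile(
        r"^\s*@!?%?\w+\s+(?!bra\b)\w", re.MULTILINE
    ),
}
VERSION_RE = re.compile(r"^\s*\.version\s+(\d+)\.(\d+)", re.MULTILINE)

# Selected lines from the PTX 1.5 sample posted in this thread.
SAMPLE = """
.version 1.5
.target sm_13, texmode_independent
.const[0] .s32 %global_block_offset[3];
.local .b8 vector_load_8[4];
.maxntid 16, 16, 1
"""


def scan_ptx(ptx):
    """Return ((major, minor) or None, sorted names of the features found)."""
    match = VERSION_RE.search(ptx)
    version = (int(match.group(1)), int(match.group(2))) if match else None
    found = sorted(name for name, pat in FEATURES.items() if pat.search(ptx))
    return version, found


print(scan_ptx(SAMPLE))
```

A scan like this only flags constructs we have already seen in samples; it cannot tell us whether other PTX 1.5 constructs exist.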
I agree that we want to support future versions of PTX including 1.5. However,
there was no specification for 1.5 that I could find with CUDA 2.3 or OpenCL.
We could go by the code examples generated by the OpenCL compiler, but we would
need an ISA specification to ensure that we can handle any PTX 1.5 binary.
I suggest waiting until an official specification is released before we start
working on this.
Original comment by gregory....@gatech.edu
on 25 Jun 2009 at 4:05
Added only the two examples that were different from 2.2. These exposed a few
minor bugs involving new predicate instructions generated by the compiler.
Original comment by gregory....@gatech.edu
on 3 Aug 2009 at 9:17
I am closing this as fixed for now. Every 2.3 example that I have run through
Ocelot has passed without any issues.
Original comment by gregory....@gatech.edu
on 2 Sep 2009 at 1:39
Original issue reported on code.google.com by
gregory....@gatech.edu
on 22 Jun 2009 at 8:21