lion03 / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0

Host double to device float copy on arch sm_11 #253

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Please post a short code sample which reproduces the problem:

The code

    #include <iostream>
    #include <iterator>
    #include <vector>

    #include <thrust/copy.h>
    #include <thrust/device_vector.h>

    using namespace std;

    int main(void)
    {
        vector<double> vector_double;
        vector_double.push_back(  -1.4);
        vector_double.push_back(   1.4);
        vector_double.push_back(-223.4);
        vector_double.push_back(1123.4);
        vector_double.push_back(   0.4);

        thrust::copy(vector_double.begin(), vector_double.end(), ostream_iterator<double>(cout, "\t")); cout << endl;

        thrust::device_vector<float> vector_floatD(vector_double.size());

        thrust::copy(vector_double.begin(), vector_double.end(), vector_floatD.begin());

        thrust::copy(vector_floatD.begin(), vector_floatD.end(), ostream_iterator<float>(cout, "\t")); cout << endl;

        return 0;
    }

produces the expected output

-1.4    1.4     -223.4  1123.4  0.4
-1.4    1.4     -223.4  1123.4  0.4

when compiled with -arch=sm_13. However, when the same code is compiled with 
-arch=sm_11 (a device architecture that does not support double precision), the 
warning

ptxas /tmp/tmpxft_000061a8_00000000-2_copy_test.ptx, line 402; warning : Double 
is not supported. Demoting to float

is issued and the result is wrong:

-1.4    1.4     -223.4  1123.4  0.4
2.72008e+23     2.72008e+23     -1.07374e+08    -1.58819e-23    -1.58819e-23

I am wondering whether this copy should be possible, or is the type cast handled 
on the device side? I'm using heavily templated code with the data type as a 
free parameter, and at the moment I have to branch in my code, using a local 
buffer for the type cast, every time a type conversion cannot be handled by the 
specified architecture.
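
Roughly, the current branching looks like this (only a sketch; the guard macro 
here is hypothetical and just stands for however my build selects the target 
architecture):

    // sketch of the current workaround: when the target architecture cannot do
    // the double->float conversion on the device, convert through a host buffer
    #ifdef TARGET_LACKS_NATIVE_DOUBLE   // hypothetical configuration flag
        std::vector<float> tmp(vector_double.begin(), vector_double.end());  // host-side cast
        thrust::copy(tmp.begin(), tmp.end(), vector_floatD.begin());
    #else
        thrust::copy(vector_double.begin(), vector_double.end(), vector_floatD.begin());
    #endif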

What version of Thrust are you using? Which version of nvcc?  Which host
compiler?  On what operating system?

Thrust 1.3, nvcc 3.1, gcc 4.3.5, Gentoo Linux

Original issue reported on code.google.com by janick.m...@gmail.com on 14 Oct 2010 at 8:14


GoogleCodeExporter commented 8 years ago
Hi Janick,

Your suspicion is correct: the conversion is happening on the device, and the 
sm_11 architecture produces garbage results for that operation.  Ideally the 
compiler would emulate these operations on G8x devices, but that's not the 
case.

If you're willing to invest a little effort you could fix this by emulating the 
conversion yourself.  For example, the float64_to_float32 function in this 
SoftFloat library [1] should do the conversion safely on older devices.  If you 
put this code inside a functor and used the __CUDA_ARCH__ preprocessor macro to 
selectively enable it on sm_10 and sm_11 devices, I think you could use 
thrust::transform() to perform the copy and conversion (see the sketch after 
the excerpt below).  You might need to treat the double as an int2 or something 
to make sure CUDA reads all the bits.

Since this is an nvcc limitation I'm resolving the issue as WontFix.

[1] http://www.jhauser.us/arithmetic/SoftFloat.html

    /*----------------------------------------------------------------------------
    | Returns the result of converting the double-precision floating-point value
    | `a' to the single-precision floating-point format.  The conversion is
    | performed according to the IEC/IEEE Standard for Binary Floating-Point
    | Arithmetic.
    *----------------------------------------------------------------------------*/

    float32 float64_to_float32( float64 a )
    {
        flag aSign;
        int16 aExp;
        bits64 aSig;
        bits32 zSig;

        aSig = extractFloat64Frac( a );
        aExp = extractFloat64Exp( a );
        aSign = extractFloat64Sign( a );
        if ( aExp == 0x7FF ) {
            if ( aSig ) return commonNaNToFloat32( float64ToCommonNaN( a ) );
            return packFloat32( aSign, 0xFF, 0 );
        }
        shift64RightJamming( aSig, 22, &aSig );
        zSig = aSig;
        if ( aExp || zSig ) {
            zSig |= 0x40000000;
            aExp -= 0x381;
        }
        return roundAndPackFloat32( aSign, aExp, zSig );
    }
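
Roughly, the functor I have in mind might look like the following sketch (the 
names are just illustrative; float64_to_float32 is assumed to be the routine 
above, copied into the project, marked __host__ __device__, and operating on 
raw integer bit patterns, with the integer typedefs standing in for SoftFloat's 
own float32/float64 types):

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/copy.h>

    typedef unsigned int       float32;   // raw bits of a binary32 (as in SoftFloat)
    typedef unsigned long long float64;   // raw bits of a binary64 (as in SoftFloat)

    // the SoftFloat routine above, copied into the project and marked
    // __host__ __device__ (definition not repeated here)
    __host__ __device__ float32 float64_to_float32( float64 a );

    struct emulated_double_to_float
    {
        __host__ __device__
        float operator()(const float64 &bits) const
        {
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
            // sm_10/sm_11: convert the raw bits in software
            float32 f = float64_to_float32(bits);
            return *reinterpret_cast<float *>(&f);
    #else
            // host or sm_13+: the hardware conversion is correct
            double d = *reinterpret_cast<const double *>(&bits);
            return static_cast<float>(d);
    #endif
        }
    };

    // usage: ship the doubles to the device as raw 64-bit patterns (a plain
    // copy, no kernel involved), then convert on the device
    // thrust::device_vector<float64> bits(vector_double.size());
    // thrust::copy(reinterpret_cast<const float64 *>(&vector_double[0]),
    //              reinterpret_cast<const float64 *>(&vector_double[0]) + vector_double.size(),
    //              bits.begin());
    // thrust::transform(bits.begin(), bits.end(), vector_floatD.begin(),
    //                   emulated_double_to_float());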

Original comment by wnbell on 14 Oct 2010 at 12:39

GoogleCodeExporter commented 8 years ago
Hi,

thanks for the clarifications. As sm_11 is still a little bit old, I suppose 
that the emulation will be added to nvcc in the future.. :-/

I did also take a look at the library you suggested, but I have two problems with it:

1) I don't see how to convert POD doubles to float64, and a float32 back to 
a POD float. Is the bit structure identical, so that I can simply cast 
these values?

2) When doing a host->device transformation, is the functor executed on the 
host only (__host__), or is it necessary to also declare it as a device function 
(__device__)? In the second case I don't think I would be able to call a host-only 
float64_to_float32. And if the first case is true, I don't think I gain much 
over my already implemented branch on __CUDA_ARCH__ with an additional host 
buffer and explicit conversion...

Best regards
Janick

Original comment by janick.m...@gmail.com on 14 Oct 2010 at 1:33

GoogleCodeExporter commented 8 years ago
Hi Janick,

Yes, I believe the numbers will have the same bit structure, so manipulating 
the bits directly (as the emulation library does) should yield the desired 
result.

If you copy/paste all the relevant code from the library (hopefully this is 
only a handful of functions) into your project, you could label the routines as 
__host__ __device__ so that they could be called from a __host__ __device__ 
functor.  This would enable you to perform the conversion entirely on the 
device and avoid the extra host<->device copies.
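
For 1), a rough sketch of the reinterpretation (assuming float64/float32 are 
plain unsigned integer typedefs, as in the excerpt in my earlier comment; the 
helper names here are just illustrative):

    typedef unsigned long long float64;   // raw bits of a binary64
    typedef unsigned int       float32;   // raw bits of a binary32

    // POD double -> float64 bits.  Host only here: on sm_10/sm_11 a double
    // should never be materialized in device registers at all; ship the values
    // to the device as 64-bit integers instead (the "treat it as an int2" remark).
    __host__ inline float64 double_to_bits(double d)
    {
        union { double d; float64 u; } b;
        b.d = d;
        return b.u;
    }

    // float32 bits -> POD float.  Safe on both host and device.
    __host__ __device__ inline float bits_to_float(float32 bits)
    {
        union { float f; float32 u; } b;
        b.u = bits;
        return b.f;
    }

For 2), when thrust::transform operates on device data the functor is executed 
on the device, so the functor and the ported routines it calls need the 
__device__ qualifier as well; labeling everything __host__ __device__ covers 
both cases.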

Original comment by wnbell on 18 Oct 2010 at 10:56