Hi Janick,
Your suspicion is correct: the conversion is happening on the device, and the
sm_11 architecture produces garbage results for that operation. Ideally the
compiler would emulate these operations on G8x devices, but that's not the
case.
If you're willing to invest a little effort, you could fix this by emulating the
conversion yourself. For example, the float64_to_float32 function in the
SoftFloat library [1] should do the conversion safely on older devices. If you
put this code inside a functor and use the __CUDA_ARCH__ preprocessor macro to
selectively enable it on sm_10 and sm_11 devices, I think you could use
thrust::transform() to perform the copy and conversion (a sketch of such a
functor follows the pasted function below). You might need to treat the double
as an int2 or something similar to make sure CUDA reads all 64 bits.
Since this is an nvcc limitation I'm resolving the issue as WontFix.
[1] http://www.jhauser.us/arithmetic/SoftFloat.html
/*----------------------------------------------------------------------------
| Returns the result of converting the double-precision floating-point value
| `a' to the single-precision floating-point format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic.
*----------------------------------------------------------------------------*/
float32 float64_to_float32( float64 a )
{
    flag aSign;
    int16 aExp;
    bits64 aSig;
    bits32 zSig;

    aSig = extractFloat64Frac( a );
    aExp = extractFloat64Exp( a );
    aSign = extractFloat64Sign( a );
    if ( aExp == 0x7FF ) {
        if ( aSig ) return commonNaNToFloat32( float64ToCommonNaN( a ) );
        return packFloat32( aSign, 0xFF, 0 );
    }
    shift64RightJamming( aSig, 22, &aSig );
    zSig = aSig;
    if ( aExp || zSig ) {
        zSig |= 0x40000000;
        aExp -= 0x381;
    }
    return roundAndPackFloat32( aSign, aExp, zSig );
}
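For illustration only, here is a minimal sketch of such a functor. The name
emulated_double_to_float and the typedefs are mine, not from Thrust or
SoftFloat; the sketch assumes float64_to_float32 above has been copied into
the project and marked __host__ __device__, and that SoftFloat's
bits64/bits32 (and float64/float32) map onto unsigned long long/unsigned int.
The input is taken as a raw 64-bit pattern so no double-precision code is
generated on sm_1x:

typedef unsigned int       bits32;   /* assumed SoftFloat typedefs */
typedef unsigned long long bits64;

struct emulated_double_to_float
{
    __host__ __device__
    float operator()( bits64 x ) const
    {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
        /* devices without native doubles (sm_10/sm_11 in this thread):
           run the emulated conversion directly on the 64-bit pattern */
        union { bits32 i; float f; } out;
        out.i = float64_to_float32( x );
        return out.f;
#else
        /* host, or a device with native doubles: reinterpret and cast */
        union { bits64 i; double d; } in;
        in.i = x;
        return (float) in.d;
#endif
    }
};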
Original comment by wnbell on 14 Oct 2010 at 12:39
Hi,
thanks for the clarifications. As sm_11 is still only a little bit old, I
suppose the emulation will be added to nvcc in the future.. :-/
I also took a look at the library you suggested, but I have two problems with it:
1) I don't see how to convert POD doubles to float64 and a float32 back to
a POD float. Is the bit structure identical, so that I can simply cast
between these values?
2) When doing a host->device transformation, is the functor executed on the
host only (__host__), or is it necessary to also declare it as a device
function (__device__)? In the second case, I don't think I'd be able to call a
host-only float64_to_float32. And if the first case is true, I don't think I
gain much over my already implemented branch on __CUDA_ARCH__ with an
additional host buffer and explicit conversion...
Best regards
Janick
Original comment by janick.m...@gmail.com on 14 Oct 2010 at 1:33
Hi Janick,
Yes, I believe the numbers will have the same bit structure, so manipulating
the bits directly (as the emulation library does) should yield the desired
result.
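To make question 1 concrete, here is a small sketch (the helper names are
mine, not SoftFloat's): on the host, memcpy round-trips between the POD
floating-point types and the integer patterns without aliasing problems,
assuming the typedefs below match the library's configuration:

#include <string.h>  /* memcpy */

typedef unsigned int       bits32;   /* assumed SoftFloat typedefs */
typedef unsigned long long bits64;

bits64 double_to_bits64( double d ) { bits64 b; memcpy( &b, &d, sizeof b ); return b; }
float  bits32_to_float ( bits32 b ) { float  f; memcpy( &f, &b, sizeof f ); return f; }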
If you copy/paste all the relevant code from the library (hopefully this is
only a handful of functions) into your project, you can label the routines
__host__ __device__ so that they can be called from a __host__ __device__
functor. This would let you perform the conversion entirely on the device and
avoid the extra host<->device copies.
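Putting the pieces together, a hedged sketch of the device-side call (the
convert() wrapper is mine, and emulated_double_to_float is the illustrative
functor from the sketch in the first comment, not Thrust API). The storage of
the double vector is viewed as raw 64-bit patterns so the functor never
receives a double on sm_1x:

#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/transform.h>

void convert( thrust::device_vector<double>& src,
              thrust::device_vector<float>&  dst )
{
    dst.resize( src.size() );

    /* reinterpret the doubles' storage as raw 64-bit patterns */
    const bits64* raw =
        reinterpret_cast<const bits64*>( thrust::raw_pointer_cast( &src[0] ) );

    thrust::transform( thrust::device_pointer_cast( raw ),
                       thrust::device_pointer_cast( raw + src.size() ),
                       dst.begin(),
                       emulated_double_to_float() );
}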
Original comment by wnbell on 18 Oct 2010 at 10:56
Original issue reported on code.google.com by janick.m...@gmail.com on 14 Oct 2010 at 8:14