tigerneil / aparapi

Automatically exported from code.google.com/p/aparapi

Parser produces NaN for floating point numerals smaller than 1 (e.g. 1.0e-10) if the execution mode is GPU #50

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
Here is a small example of the problem:
Java-code:
213      gauss[ zz ] = 1.0e-10 * rint( 1.0e10 *  v0 * q );

The resulting bytecode from the Eclipse Class File Editor:
    918  aload_0 [this]
    919  getfield de.nsa_gmbh.hs.sixin.SxKernel.gauss : double[] [40]
    922  iload 39 [zz]
    924  ldc2_w <Double 1.0E-10> [142]
    927  aload_0 [this]
    928  ldc2_w <Double 1.0E10> [68]
    931  dload 40 [v0]
    933  dmul
    934  dload 44 [q]
    936  dmul
    937  invokevirtual de.nsa_gmbh.hs.sixin.SxKernel.rint(double) : double [64]
    940  dmul
    941  dastore

The resulting bytecode from my favorite Bytecode-Plugin for Eclipse:
    LINENUMBER 213 L69
    ALOAD 0
    GETFIELD de/nsa_gmbh/hs/sixin/SxKernel.gauss : [D
    ILOAD 39
    LDC 1.0E-10
    ALOAD 0
    LDC 1.0E10
    DLOAD 40
    DMUL
    DLOAD 44
    DMUL
    INVOKEVIRTUAL de/nsa_gmbh/hs/sixin/SxKernel.rint(D)D
    DMUL
    DASTORE

The resulting OpenCL code for the GPU from Aparapi:
      this->gauss[zz]  = NAN * rint(((1.0E10 * v0) * q));

Another example (without the bytecode):
Java:
      tryAgain = ( min( q, 1D - q ) < 1.0e-8 );

The resulting OpenCL code for the GPU from Aparapi:
      tryAgain = (fmin(q, (1.0 - q))<NAN)?1:0)

Or, even simpler:
Java:
         double th =  1.0e-8;
The resulting OpenCL code for the GPU from Aparapi:
      double th = NAN;

What is the expected output? What do you see instead?
Expected: 1.0E-10. Instead: NAN.

What version of the product are you using? On what operating system?
I work on a Dell Latitude E6520 running MS Windows 7 SP1 with
the Eclipse SDK (Version: 3.7.2, Build id: M20120208-0800).

The graphics card is an NVIDIA Quadro NVS 4200M with the most up-to-date driver
version and CUDA 4.2 installed.

Please provide any additional information below.

Original issue reported on code.google.com by helmut.s...@gmx.de on 16 May 2012 at 10:00

GoogleCodeExporter commented 8 years ago
Helmut, 

Thanks for submitting this.  Clearly we need to look at float and double a
little more closely.  This is probably related to issue #27; however, that issue
focused on dealing with a 'real NaN', whereas here either we (in Aparapi) or the
card are mangling the float/double representation.

Let me write some test cases for this and take a look. 

Gary

Original comment by frost.g...@gmail.com on 17 May 2012 at 2:32

GoogleCodeExporter commented 8 years ago
Helmut

I found the culprit and submitted a patch for this (#r467).

As Steve pointed out in his diagnosis of #27, it was something weird in
ByteBuffer.d8(), which reads double constants from the classfile into the
disassembler.

Actually, the fix was in ByteBuffer.u8() (we pull a long in and convert its
bits to a double):

   double d8(int _offset) {
      return (Double.longBitsToDouble(u8(_offset)));
   }

The previous version was:
   long u8(int _offset) {
       return (((long) u4(_offset)) << 32 | u4(_offset + 4));
   }

Sadly, for values where the lower 32 bits had the most significant bit set, the
int returned by the second u4() call was sign-extended when widened to long, so
the OR caused the top 32 bits to all be 1's.

Now we have:
   long u8(int _offset) {
      return ((u4(_offset)&0xffffffffL) << 32) |(u4(_offset + 4)&0xffffffffL);
   }

This stops the sign extension from taking place. 
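
To make the effect concrete, here is a minimal standalone sketch (not Aparapi code). The class SignExtensionDemo and the helpers joinBuggy, joinFixed and roundTrip are hypothetical names invented for illustration; they assume u4() returns an int, as the masking in the patched u8() suggests, and take the two 32-bit halves as plain arguments instead of reading them from a classfile buffer.

```java
// Illustrative sketch only: joinBuggy/joinFixed are hypothetical stand-ins for
// the two u8() versions, taking the two 32-bit halves as plain int arguments.
class SignExtensionDemo {
    // Mirrors the previous u8(): the low int is widened to long with sign extension.
    static long joinBuggy(int hi, int lo) {
        return ((long) hi) << 32 | lo;
    }

    // Mirrors the patched u8(): both halves are masked to unsigned 32-bit values first.
    static long joinFixed(int hi, int lo) {
        return ((hi & 0xffffffffL) << 32) | (lo & 0xffffffffL);
    }

    // Splits a double into its two 32-bit words, rejoins them, and decodes the result.
    static double roundTrip(double value, boolean buggy) {
        long bits = Double.doubleToLongBits(value);
        int hi = (int) (bits >>> 32); // upper word: sign, exponent, top of mantissa
        int lo = (int) bits;          // lower word: rest of the mantissa
        return Double.longBitsToDouble(buggy ? joinBuggy(hi, lo) : joinFixed(hi, lo));
    }

    public static void main(String[] args) {
        for (double c : new double[] {1.0e-10, 1.0e-8, 1.0e10}) {
            System.out.println(c + " buggy=" + roundTrip(c, true) + " fixed=" + roundTrip(c, false));
        }
    }
}
```

With the buggy join, 1.0e-10 and 1.0e-8 decode as NaN because their low words have the top bit set, so the exponent field ends up all 1's. 1.0e10 survives because its low word has a clear top bit, which matches the generated OpenCL in the report.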

This was my test case. 

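import com.amd.aparapi.Kernel;

public class NaNTest{
   public static void main(String[] args) {
      Kernel k = new Kernel(){
         @Override public void run() {
            // The low 32 bits of this constant have their top bit set, which triggered the bug.
            double gauss = (1.0e-10);
         }
      };
      k.execute(1024);
   }
}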

Original comment by frost.g...@gmail.com on 17 May 2012 at 4:29