iofcas / nativeclient-sdk

Automatically exported from code.google.com/p/nativeclient-sdk

Improve performance of pi_generator by better lock management #124

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
(This is a suggestion from jonovos at gmail.com to native-client-discuss)

As of SDK SVN r1108, the pi_generator example performs much more slowly than it 
needs to because of unnecessary pixel locking.  Jonovos suggests the following:

Here is a simple refactoring recipe that makes pi_generator run
about 15 times faster. The speed-up is striking.

====================================

void* PiGenerator::ComputePi(void* param)
{
 int count = 0;  // The number of points put inside the inscribed quadrant.
 unsigned int seed = 1;
 PiGenerator* pi_generator = static_cast<PiGenerator*>(param);
 srand(seed);
 // CHANGE: ++i ==> i += 16. Each outer iteration now handles a batch of
 // 16 points.
 for (int i = 1; i <= kMaxPointCount && !pi_generator->quit(); i += 16)
 {
   // Acquiring the pixel lock takes a slice of time. We acquire the pointer
   // to 'pixel_bits' ONCE here, and use it 16 times in the loop below.
   ScopedPixelLock scoped_pixel_lock(pi_generator);
   uint32_t* pixel_bits = scoped_pixel_lock.pixels();
   if (pixel_bits == NULL) {
     // Note that if the pixel buffer never gets initialized, this won't ever
     // paint anything.  Which is probably the right thing to do. Also, this
     // clause means that the image will not get the very first few Pi dots,
     // since it's possible that this thread starts before the pixel buffer is
     // initialized.
     continue;  // Note: each retry skips 16 points, because i still advances by 16.
   }

   // Wrap the random point generation inside a 16-iteration loop.
   for (int j = 0; j < 16; j++)
   {
       double x = static_cast<double>(rand_r(&seed)) / RAND_MAX;
       double y = static_cast<double>(rand_r(&seed)) / RAND_MAX;
       double distance = sqrt(x * x + y * y);
       int px = x * pi_generator->width();
       int py = (1.0 - y) * pi_generator->height();
       uint32_t color = pixel_bits[pi_generator->width() * py + px];

       if (distance < 1.0)
       {
           // Set color to blue.
           ++count;
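           // i + j is the number of the current point; dividing by i alone
           // would overestimate pi within each batch of 16.
           pi_generator->pi_ = 4.0 * count / (i + j);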
           color += 4 << kBlueShift;
           color &= kBlueMask;
       }
       else
       {
           // Set color to red.
           color += 4 << kRedShift;
           color &= kRedMask;
       }

       pixel_bits[pi_generator->width() * py + px] = color | kOpaqueColorMask;
   }

   // At the end of each outer-loop iteration, 'scoped_pixel_lock' goes out
   // of scope and its destructor releases the lock. When the loop resumes,
   // a new pixel lock is acquired for another sixteen random pixels.
 }
 return 0;
}

====================================

(*) WHY IS THIS FASTER? Because the inner loop re-uses the pointer
'pixel_bits' obtained from the scoped pixel lock, so the lock (the
bottleneck) is acquired once per 16 points instead of once per point.

This simple speed-up should motivate the reader to look for more efficient
ways of interfacing with the NaCl APIs.
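
The batching idea can be shown outside the SDK. The following is only a minimal sketch, assuming a plain `std::mutex` in place of `ScopedPixelLock`; `LockedBuffer` and `Fill` are hypothetical names, not part of the NaCl SDK. It counts how often the lock is taken for a given batch size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a pixel buffer guarded by a lock. This is NOT
// the SDK's ScopedPixelLock; it only illustrates batching work per lock.
class LockedBuffer {
 public:
  explicit LockedBuffer(std::size_t size) : pixels_(size, 0) {}

  // Increments every pixel once, taking the lock once per `batch` pixels.
  // Returns how many times the lock was acquired.
  std::size_t Fill(std::size_t batch) {
    assert(batch > 0);
    std::size_t acquisitions = 0;
    for (std::size_t start = 0; start < pixels_.size(); start += batch) {
      // One acquisition covers the whole batch; the guard releases the
      // lock when it goes out of scope at the end of this iteration.
      std::lock_guard<std::mutex> guard(mutex_);
      ++acquisitions;
      const std::size_t end = std::min(start + batch, pixels_.size());
      for (std::size_t i = start; i < end; ++i) {
        pixels_[i] += 1;
      }
    }
    return acquisitions;
  }

  // Reads one pixel under the lock.
  std::uint32_t pixel(std::size_t i) {
    std::lock_guard<std::mutex> guard(mutex_);
    return pixels_[i];
  }

 private:
  std::mutex mutex_;
  std::vector<std::uint32_t> pixels_;
};
```

With a 1024-pixel buffer, `Fill(1)` takes the lock 1024 times and `Fill(16)` takes it 64 times for the same writes. The trade-off is that a larger batch holds the lock longer, so any other thread waiting on it (for example, one that paints the buffer) waits longer per acquisition.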

Original issue reported on code.google.com by mball@google.com on 6 Sep 2011 at 1:42

GoogleCodeExporter commented 9 years ago
pi_generator is not meant to be a performance example, it's an example that 
shows how to use threading and 2D drawing.

Original comment by dsprin...@chromium.org on 8 Sep 2011 at 10:59