(This is a suggestion from jonovos at gmail.com to native-client-discuss)
As of SDK SVN r1108, the pi_generator example performs much more slowly than it
needs to because of unnecessary pixel locking. Jonovos suggests the following:
Here is a simple refactoring recipe that makes the pi_generator example run
about 15 times faster. The speed-up is striking.
====================================
void* PiGenerator::ComputePi(void* param)
{
  int count = 0;  // The number of points put inside the inscribed quadrant.
  unsigned int seed = 1;
  PiGenerator* pi_generator = static_cast<PiGenerator*>(param);
  srand(seed);
  // CHANGE: ++i ==> i+=16
  for (int i = 1; i <= kMaxPointCount && !pi_generator->quit(); i += 16)
  {
    // This takes a slice of time. We acquire the pointer to 'pixel_bits'
    // ONCE here, and use it 16 times in the loop below.
    ScopedPixelLock scoped_pixel_lock(pi_generator);
    uint32_t* pixel_bits = scoped_pixel_lock.pixels();
    if (pixel_bits == NULL) {
      // Note that if the pixel buffer never gets initialized, this won't ever
      // paint anything. Which is probably the right thing to do. Also, this
      // clause means that the image will not get the very first few Pi dots,
      // since it's possible that this thread starts before the pixel buffer is
      // initialized.
      continue;  // Note: This spin-back eats i+=16 each time it goes.
    }
    // Wrap the random point generation inside a 16-iteration loop.
    for (int j = 0; j < 16; j++)
    {
      double x = static_cast<double>(rand_r(&seed)) / RAND_MAX;
      double y = static_cast<double>(rand_r(&seed)) / RAND_MAX;
      double distance = sqrt(x * x + y * y);
      int px = x * pi_generator->width();
      int py = (1.0 - y) * pi_generator->height();
      uint32_t color = pixel_bits[pi_generator->width() * py + px];
      if (distance < 1.0)
      {
        // Set color to blue.
        ++count;
        // i counts points in steps of 16, so the number of points generated
        // so far is i + j, not i.
        pi_generator->pi_ = 4.0 * count / (i + j);
        color += 4 << kBlueShift;
        color &= kBlueMask;
      }
      else
      {
        // Set color to red.
        color += 4 << kRedShift;
        color &= kRedMask;
      }
      pixel_bits[pi_generator->width() * py + px] = color | kOpaqueColorMask;
    }
    // At the end of this iteration, 'scoped_pixel_lock' goes out of scope and
    // unlocks itself in the object's destructor. When the loop resumes, the
    // pixel lock is acquired again for another sixteen random pixels.
  }
  return 0;
}
====================================
Why is this faster? Because the inner loop reuses the 'pixel_bits' pointer
obtained from the scoped pixel lock. Acquiring that lock is the bottleneck,
and it now happens once per 16 points instead of once per point.
This simple speed-up should motivate the reader to look for more efficient
ways of interfacing with the NaCl APIs.
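The same idea can be shown outside of NaCl. The following is only a minimal sketch: `LockedBuffer` and `FillBatched` are hypothetical names invented for this illustration, and a plain `std::mutex` stands in for the SDK's pixel lock. It counts how many times the lock is taken when a fixed number of writes is done one at a time versus in batches of 16.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a pixel buffer guarded by a lock.
// Not part of the NaCl SDK.
struct LockedBuffer {
  std::mutex mutex;
  std::vector<uint32_t> pixels;
  long lock_count = 0;  // Number of times the lock has been acquired.
  explicit LockedBuffer(std::size_t n) : pixels(n, 0) {}
};

// Performs `total` writes, acquiring the lock once per `batch` writes.
// Returns the number of lock acquisitions.
long FillBatched(LockedBuffer& buf, int total, int batch) {
  for (int i = 0; i < total; i += batch) {
    // The lock is taken once here and released at the end of this iteration.
    std::lock_guard<std::mutex> guard(buf.mutex);
    ++buf.lock_count;
    uint32_t* bits = buf.pixels.data();
    // Reuse the pointer for up to `batch` writes under the same lock.
    for (int j = 0; j < batch && i + j < total; ++j) {
      bits[static_cast<std::size_t>(i + j) % buf.pixels.size()] += 1;
    }
  }
  return buf.lock_count;
}
```

With a batch size of 1 the lock is taken once per write; with a batch size of 16 it is taken one sixteenth as often while the same writes are performed. The trade-off is that the lock is held longer each time, so any other thread waiting on the buffer waits longer per acquisition.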
Original issue reported on code.google.com by mball@google.com on 6 Sep 2011 at 1:42