kriswiner / Dragonfly

Arduino sketches for use with the Dragonfly (STM32L4)

[example] a test to compare the dragonfly's fpu to the teensy 3.2 non-fpu #4

Closed 8bitbunny closed 7 years ago

8bitbunny commented 7 years ago

i ran a test on both the teensy 3.2 and the dragonfly, with good results.

test conditions:
- both mcus set to 72 MHz
- all code runs from internal SRAM
- no interrupts disabled
- both mcus blend a 24-bit image stored as 8-bit color values in separate arrays (red, green, blue)
- blending is done using floating point math
- all arrays are 128*128 px

test results:
- dragonfly: about 22 ms
- teensy 3.2: about 160 ms

over 7x the performance!

code below:

#define width 128
#define height 128
//uncomment line below for dragonfly
//#define FASTRUN __attribute__ ((section (".ramfunc")))

uint8_t red_[width][height];
uint8_t green_[width][height];
uint8_t blue_[width][height];

uint16_t rgb_to_u16(uint8_t x, uint8_t y) {
  return (((red_[x][y] & 0xFF) / 8) << 11) | (((green_[x][y] & 0xFF) / 4) << 5) | ((blue_[x][y]) / 8);
}

// the setup function runs once when you press reset or power the board
void setup() {
  delay(1500);
  Serial.begin(250000);
  delay(800);
  // initialize digital pin LED_BUILTIN as an output.
  pinMode(LED_BUILTIN, OUTPUT);
}

// the loop function runs over and over again forever
void loop() {
  digitalWrite(LED_BUILTIN, HIGH);
  uint32_t end = 0;
  uint32_t start = micros();
  blend_screen(0.2);
  end = micros();
  // print the result on a single line
  Serial.print("a blend took: ");
  Serial.print(end - start);
  Serial.println(" microseconds");

  digitalWrite(LED_BUILTIN, LOW);
  delay(500);
}

FASTRUN void blend_screen(float AA) {
  if ( AA <= 0 ) {
    AA = 0;
  }
  if ( AA >= 1) {
    AA = 1;
  }
  uint8_t bgbuffer[3];
  for (int16_t y = 0; y < height; y++) {
    for (int16_t x = 0; x < width; x++) {
      bgbuffer[0] = red_[x][y];
      bgbuffer[1] = green_[x][y];
      bgbuffer[2] = blue_[x][y];
      blendColor(red_[x][y],green_[x][y],blue_[x][y],bgbuffer[0],bgbuffer[1],bgbuffer[2], AA, red_[x][y], green_[x][y], blue_[x][y]);
    }
  }
}

FASTRUN void blendColor(uint8_t bg_red, uint8_t bg_green, uint8_t bg_blue, uint8_t fg_red, uint8_t fg_green, uint8_t fg_blue, float alpha, uint8_t &ret_red, uint8_t &ret_green, uint8_t &ret_blue) {
  ret_red = (bg_red * (1 - alpha) + (fg_red * alpha));
  ret_green = (bg_green * (1 - alpha) + (fg_green * alpha));
  ret_blue = (bg_blue * (1 - alpha) + (fg_blue * alpha));
}
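
For reference, the per-channel math in blendColor is a linear interpolation between the background and foreground values. Below is a minimal host-side sketch of that formula. It assumes a C++14 compiler and uses a hypothetical helper name, `blend_channel`, which is not part of the sketch above.

```cpp
#include <stdint.h>

// Hypothetical helper (not in the original sketch): blends one 8-bit channel
// the same way blendColor does, result = bg * (1 - alpha) + fg * alpha.
// The arithmetic is done in single-precision float and truncated to uint8_t.
constexpr uint8_t blend_channel(uint8_t bg, uint8_t fg, float alpha) {
  // Clamp alpha to [0, 1], mirroring the checks at the top of blend_screen.
  const float a = (alpha <= 0.0f) ? 0.0f : (alpha >= 1.0f) ? 1.0f : alpha;
  return static_cast<uint8_t>(bg * (1.0f - a) + fg * a);
}
```

Note that blend_screen passes a copy of the same pixel as both background and foreground, so the benchmark exercises the float math without producing a visible change in the arrays.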

kriswiner commented 7 years ago

Even more interesting would be a test of the power consumption differences...

8bitbunny commented 7 years ago

you mean as in "performance/milliampere"? for the floating point example i posted that would be a massive increase!

sadly i don't have measurement tools, and even if i did have the right tools, i'd barely have room to set them up properly. so i'll leave that up to someone else.

8bitbunny commented 7 years ago

just took the average run current (executing from flash) from the datasheets of both devices:
- teensy 3.2's mcu: about 300 uA per MHz
- dragonfly's mcu: about 100 uA per MHz

the dragonfly is basically 3x as efficient, and regarding floating point performance per watt consumed that's theoretically over 21x the performance (7x faster times 3x less current). neat!

kriswiner commented 7 years ago

In my measurements Dragonfly is about 5x more efficient, and Butterfly and Ladybug even more so since they work without an HSE crystal.

8bitbunny commented 7 years ago

oh wow, nice!

also, it seems i went for the worst-case scenario, so if we do a small calculation, this means the fpu performance per watt is roughly 35 times that of the teensy 3.2 (7x faster times 5x more efficient), right? impressive result! that, plus the dragonfly has a higher core clock by default, without overclocking. anyhow, if you're okay with it, you can close the issue :)
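
The estimates in this thread multiply two ratios: how much faster the blend runs, and how much less current the part draws at the same clock. Here is a small sketch of that arithmetic, using only the figures quoted above. The function and constant names are made up for illustration, and reading the result as performance per watt assumes both boards run at a similar supply voltage.

```cpp
// Timings from the first post; current figures are the rough numbers quoted
// in this thread, not new measurements.
constexpr double kTeensyBlendMs = 160.0;
constexpr double kDragonflyBlendMs = 22.0;

// Speedup of the FPU run over the software-float run.
constexpr double speedup(double slow_ms, double fast_ms) {
  return slow_ms / fast_ms;
}

// Work done per unit of current: the speedup multiplied by how many times
// less current the faster part draws at the same clock.
constexpr double perf_per_current(double speed_ratio, double current_ratio) {
  return speed_ratio * current_ratio;
}

constexpr double kSpeedup = speedup(kTeensyBlendMs, kDragonflyBlendMs);

// Datasheet estimate: about 300 uA/MHz versus about 100 uA/MHz.
constexpr double kDatasheetGain = perf_per_current(kSpeedup, 300.0 / 100.0);

// Using the roughly 5x efficiency figure reported from measurements.
constexpr double kMeasuredGain = perf_per_current(kSpeedup, 5.0);
```

With these inputs the speedup comes out at about 7.3x, the datasheet-based estimate at about 21.8x, and the measurement-based estimate at about 36x, in line with the "over 21x" and "roughly 35 times" figures above.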