lvandeve / lodepng

PNG encoder and decoder in C and C++.
zlib License

Settings to Optimize for Speed #92

Closed: Kris523 closed this issue 4 years ago

Kris523 commented 5 years ago

Thanks for all the work! I see that you have an example that iterates over settings to optimize for compression. I have adapted that demo to optimize for encoding speed instead, but I cannot seem to reach the speed I am aiming for.

Code:

// g++ lodepng.cpp test.cpp -pedantic -Wall -Wextra -O3 -std=c++11

#include "lodepng.h"
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char *argv[]) {
  std::vector<unsigned char> image;
  unsigned w, h;
  std::vector<unsigned char> buffer;
  unsigned error;

  //check if user gave a filename
  if(argc < 3) {
    std::cout << "please provide in and out filename" << std::endl;
    return 0;
  }

  error = lodepng::load_file(buffer, argv[1]); //nonzero if the file cannot be read
  if(!error) error = lodepng::decode(image, w, h, buffer);

  if(error) {
    std::cout << "decoding error " << error << ": " << lodepng_error_text(error) << std::endl;
    return 0;
  }

  size_t origsize = buffer.size();
  std::cout << "Original size: " << origsize << " (" << (origsize / 1024) << "K)" << std::endl;

  buffer.clear();
  auto start = std::chrono::system_clock::now();

  //Now encode with several filter strategies and zlib settings, keeping the fastest combination

  lodepng::State state;
  state.encoder.filter_palette_zero = 1; //Boolean: nonzero uses filter type zero for palette images
  state.encoder.add_id = false; //Don't add the LodePNG version chunk
  state.encoder.text_compression = 0; //Not needed because we don't add text chunks, but this demonstrates another setting
  state.encoder.zlibsettings.nicematch = 0; //Stop the match search early: faster, but hurts compression
  state.encoder.zlibsettings.lazymatching = 0; //Disable lazy matching: faster, slightly worse compression
  state.encoder.zlibsettings.windowsize = 1; //Smallest LZ77 window (must be a power of two, at most 32768): fast, poor compression

  double besttime = 0; //Overwritten on the first iteration, because inited starts out false
  bool inited = false;

  int beststrategy = 0;
  LodePNGFilterStrategy strategies[4] = { LFS_ZERO, LFS_MINSUM, LFS_ENTROPY, LFS_BRUTE_FORCE };
  std::string strategynames[4] = { "LFS_ZERO", "LFS_MINSUM", "LFS_ENTROPY", "LFS_BRUTE_FORCE" };

  // min match 3 allows all deflate lengths. min match 6 is similar to "Z_FILTERED" of zlib.
  int minmatches[5] = { 0, 1, 2, 3,6 };
  int bestminmatch = 0;

  int autoconverts[2] = { 0, 1 };
  std::string autoconvertnames[2] = { "0", "1" };
  int bestautoconvert = 0;

  int bestblocktype = 0;

  // Try out all combinations of everything
  for(int i = 0; i < 4; i++)    //filter strategy
  for(int j = 0; j < 2; j++)    //min match
  for(int k = 0; k < 2; k++)    //block type (for small images only)
  for(int l = 0; l < 2; l++) {  //color convert strategy
    std::vector<unsigned char> temp;
    state.encoder.filter_strategy = strategies[i];
    state.encoder.zlibsettings.minmatch = minmatches[j];
    state.encoder.zlibsettings.btype = k == 0 ? 2 : 1;
    state.encoder.auto_convert = autoconverts[l];

    auto start = std::chrono::system_clock::now();
    error = lodepng::encode(temp, image, w, h, state);
    auto end = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end - start;
    auto currentTime = elapsed_seconds.count();
    std::cout << currentTime << std::endl;
    if(error) {
      std::cout << "encoding error " << error << ": " << lodepng_error_text(error) << std::endl;
      return 0;
    }

    if(!inited || currentTime < besttime) {
      besttime = currentTime;
      beststrategy = i;
      bestminmatch = state.encoder.zlibsettings.minmatch;
      bestautoconvert = l;
      bestblocktype = state.encoder.zlibsettings.btype;
      temp.swap(buffer);
      inited = true;
    }
  }
  std::cout << "Chosen filter strategy: " << strategynames[beststrategy] << std::endl;
  std::cout << "Chosen min match: " << bestminmatch << std::endl;
  std::cout << "Chosen block type: " << bestblocktype << std::endl;
  std::cout << "Chosen auto convert: " << autoconvertnames[bestautoconvert] << std::endl;
  std::cout << "Best time: " << besttime << " s" << std::endl;

  std::cout << "rerunning with best settings" << std::endl;
  start = std::chrono::system_clock::now();
  std::vector<unsigned char> temp;
  state.encoder.filter_strategy = strategies[beststrategy];
  state.encoder.zlibsettings.minmatch = bestminmatch; //bestminmatch already holds the minmatch value, not an index
  state.encoder.zlibsettings.btype = bestblocktype;
  state.encoder.auto_convert = autoconverts[bestautoconvert];
  error = lodepng::encode(temp, image, w, h, state);
  if(error)
  {
      std::cout << "encoding error " << error << ": " << lodepng_error_text(error) << std::endl;
      return 0;
  }
  auto end = std::chrono::system_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  auto currentTime=elapsed_seconds.count();

  std::cout << "retry best time: " << currentTime << std::endl;
  lodepng::save_file(temp, argv[2]);
  std::cout << "New size: " << temp.size() << " (" << (temp.size() / 1024) << "K)" << std::endl;

}
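Each configuration above is timed with a single encode using system_clock, so the "best" choice can be decided by measurement noise. A minimal sketch of a steadier measurement, assuming C++11: run the work several times with std::chrono::steady_clock (which is monotonic) and keep the median. `time_median` is a hypothetical helper written for this illustration, not part of lodepng.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock time in seconds
// (the upper of the two middle values for an even count, 0.0 for zero runs).
// steady_clock never jumps backwards, unlike system_clock, so it suits intervals.
// Hypothetical helper for illustration; not part of lodepng.
template <typename Work>
double time_median(Work work, std::size_t runs) {
  std::vector<double> seconds;
  seconds.reserve(runs);
  for(std::size_t r = 0; r < runs; ++r) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    seconds.push_back(std::chrono::duration<double>(end - start).count());
  }
  if(seconds.empty()) return 0.0;
  std::sort(seconds.begin(), seconds.end());
  return seconds[seconds.size() / 2];
}
```

For example, `time_median([&] { std::vector<unsigned char> temp; lodepng::encode(temp, image, w, h, state); }, 10)` would time ten encodes with the current `state` from the program above. The median is less sensitive to one-off hiccups than a single run or the minimum.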

I am aiming to encode 1280x1024 RGB images at around 30 FPS or faster. With the best settings the code above found, I only reach about 8 FPS.

Do you have any tips for optimizing further for speed? The size of the resulting PNG isn't an issue. I have attached the test image I used for benchmarking.


lvandeve commented 4 years ago

With default settings, LodePNG currently encodes around 8 megapixels/second on my computer.

1280x1024 is 1,310,720 pixels, about 1.31 megapixels, so 30 FPS requires roughly 39 megapixels/second.
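The frame-rate arithmetic can be written out as a small sketch; the function names are illustrative only and not part of lodepng.

```cpp
// Megapixels per second needed to encode width x height frames at `fps`.
double required_megapixels_per_second(unsigned width, unsigned height, double fps) {
  return static_cast<double>(width) * height * fps / 1e6;
}

// Frames per second reachable at a given encoder throughput (megapixels/second).
double reachable_fps(unsigned width, unsigned height, double megapixels_per_second) {
  return megapixels_per_second * 1e6 / (static_cast<double>(width) * height);
}
```

For 1280x1024 frames, 8 megapixels/second gives about 6.1 FPS and 36 megapixels/second gives about 27.5 FPS.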

I managed to get 36 megapixels/second by disabling compression, but then, of course, the output isn't compressed at all:

state.encoder.zlibsettings.btype = 0
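With btype = 0 the deflate stage emits "stored" blocks. As a rough illustration of why that is fast, here is a sketch of stored-block framing following RFC 1951 (section 3.2.4). It was written for this explanation and is not lodepng's implementation: the data is only copied, with a 5-byte header per block of at most 65535 bytes. The PNG encoder still filters the scanlines and computes the zlib Adler-32 and chunk CRCs, so some work remains.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Frames `data` as DEFLATE stored blocks (BTYPE = 00), per RFC 1951.
// Assumes each block starts byte-aligned, which holds when every block is stored.
// Illustration only; not lodepng's own code.
std::vector<std::uint8_t> deflate_stored(const std::vector<std::uint8_t>& data) {
  const std::size_t kMaxBlock = 65535;  // LEN is a 16-bit field
  std::vector<std::uint8_t> out;
  std::size_t pos = 0;
  do {  // at least one block, even for empty input
    std::size_t len = data.size() - pos;
    if(len > kMaxBlock) len = kMaxBlock;
    bool final_block = (pos + len == data.size());
    // Header bits: BFINAL in bit 0, BTYPE = 00 in bits 1-2, then zero padding.
    out.push_back(final_block ? 1 : 0);
    // LEN and its one's complement NLEN, both 16-bit little-endian.
    std::uint16_t nlen = static_cast<std::uint16_t>(~len);
    out.push_back(static_cast<std::uint8_t>(len & 0xFF));
    out.push_back(static_cast<std::uint8_t>(len >> 8));
    out.push_back(static_cast<std::uint8_t>(nlen & 0xFF));
    out.push_back(static_cast<std::uint8_t>(nlen >> 8));
    // The payload is copied verbatim: no compression at all.
    auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(len));
    pos += len;
  } while(pos < data.size());
  return out;
}
```

For 100000 input bytes this produces two blocks and 100010 output bytes, which is why stored output is barely larger than the raw filtered data.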

I also measured 20 megapixels/second when keeping state.encoder.zlibsettings.btype = 2 (the default) but setting state.encoder.zlibsettings.use_lz77 = 0. This severely harms compression, but not as much as state.encoder.zlibsettings.btype = 0 does.
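With use_lz77 = 0, the deflate stage Huffman-codes the filtered bytes as literals only, so any size reduction comes from their byte distribution being skewed. A sketch, assuming that simplified view, estimates what literal-only coding can reach via the order-0 Shannon entropy of the bytes. The function name is hypothetical, and this is an estimate, not a model of lodepng's encoder.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Order-0 (byte histogram) Shannon entropy, in bits per byte.
// A single Huffman code over the whole input cannot average fewer bits per
// byte than this; real deflate output also has block headers and per-block codes.
double order0_entropy_bits_per_byte(const std::vector<std::uint8_t>& data) {
  if(data.empty()) return 0.0;
  std::size_t counts[256] = {};
  for(std::uint8_t b : data) ++counts[b];
  const double n = static_cast<double>(data.size());
  double bits = 0.0;
  for(std::size_t c : counts) {
    if(c == 0) continue;
    double p = static_cast<double>(c) / n;
    bits -= p * std::log2(p);
  }
  return bits;
}
```

Multiplying the result by the data size and dividing by 8 gives a rough byte estimate; actual output differs because of block headers, deflate's 15-bit code length limit, and the fact that each block can use its own code.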

I also found that keeping state.encoder.auto_convert = 1 (enabled) can make encoding faster than disabling it, depending on the image. This is probably because auto_convert tries to convert to a smaller color model (such as dropping the alpha channel when the image has no transparency).

When disabling (or limiting) compression, auto_convert is important anyway, because removing unneeded color channels or bit depth still gives some size reduction.
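The alpha-channel reduction mentioned here comes down to a check like the following. This is a minimal sketch for 8-bit RGBA buffers, not lodepng's auto_convert code (which considers more color models); `is_fully_opaque` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Returns true if every pixel of an 8-bit RGBA buffer has alpha 255,
// i.e. the alpha channel carries no information and RGB would suffice.
// Hypothetical helper; not lodepng's implementation.
bool is_fully_opaque(const std::vector<unsigned char>& rgba) {
  for(std::size_t i = 3; i < rgba.size(); i += 4) {
    if(rgba[i] != 255) return false;
  }
  return true;
}
```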

If you don't need an alpha channel, you can also consider using state.info_raw.colortype = LCT_RGB instead of the default LCT_RGBA, and giving your input images as RGB rather than RGBA (3 bytes per pixel instead of 4). This also requires passing LCT_RGB to the lodepng::decode call at the beginning, so that the decoded buffer has no alpha channel. If auto_convert is disabled, state.info_png.color.colortype should be set to LCT_RGB as well, since otherwise the output PNG keeps the default RGBA color type.
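If frames arrive as 8-bit RGBA from another source and cannot be decoded directly as LCT_RGB, a conversion like this sketch drops the alpha byte before encoding with state.info_raw.colortype = LCT_RGB. `rgba_to_rgb` is a hypothetical helper, and it assumes tightly packed pixels with no row padding.

```cpp
#include <cstddef>
#include <vector>

// Drops the alpha byte of each 8-bit RGBA pixel, producing packed RGB
// (3 bytes per pixel instead of 4). Hypothetical helper for illustration.
std::vector<unsigned char> rgba_to_rgb(const std::vector<unsigned char>& rgba) {
  std::vector<unsigned char> rgb;
  rgb.reserve(rgba.size() / 4 * 3);
  for(std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
    rgb.push_back(rgba[i]);      // R
    rgb.push_back(rgba[i + 1]);  // G
    rgb.push_back(rgba[i + 2]);  // B
  }
  return rgb;
}
```

This costs one extra pass over the pixels, so decoding straight to LCT_RGB, as suggested above, avoids the copy entirely.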

Do any of these work for you?

Kris523 commented 4 years ago

I got the biggest speed increase by setting: state.encoder.zlibsettings.btype = 0

Unfortunately, on my hardware I was not able to reach the 36 megapixels/second that you achieved. I will switch to storing my images in raw form, which takes much less time to generate.

Thanks for all the help!

zvezdochiot commented 4 years ago

@Kris523 say> I got the fastest speed increase by setting: state.encoder.zlibsettings.btype = 0

You shouldn't have closed such an interesting question; questions like this don't need to be closed.

On the substance of the question, see: https://github.com/Special-graphic-formats/imagezero

mclarsen commented 3 years ago

Just as a data point, I found the

state.encoder.zlibsettings.btype = 2
state.encoder.zlibsettings.use_lz77 = 0

to be a good combination. With the default settings, encoding and saving a 1024x1024 image (on an Intel Xeon E5-2695 v4) took 280 ms. With no compression, it took 82 ms and produced a 3.1M file. With the settings above, it took 84 ms and produced a 511K file, which is a good compromise.
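A quick sanity check of the 3.1M figure, assuming the 1024x1024 image was 8-bit RGB and non-interlaced (the post does not say): PNG stores one filter-type byte at the start of each row, and uncompressed deflate adds 5 bytes per stored block of up to 65535 bytes plus the 6-byte zlib wrapper. The function names are illustrative, and PNG chunk overhead is left out.

```cpp
#include <cstdint>

// Bytes of filtered scanline data in a non-interlaced PNG with whole-byte pixels:
// each row is one filter-type byte followed by the row's pixel bytes.
std::uint64_t filtered_data_size(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bytes_per_pixel) {
  return height * (1 + width * bytes_per_pixel);
}

// Size of that data as a zlib stream of DEFLATE stored blocks only:
// 2-byte zlib header, 5 header bytes per block of up to 65535 bytes,
// and a 4-byte Adler-32 trailer. Empty input still needs one block.
std::uint64_t stored_zlib_size(std::uint64_t raw_size) {
  std::uint64_t blocks = raw_size == 0 ? 1 : (raw_size + 65534) / 65535;
  return 2 + raw_size + 5 * blocks + 4;
}
```

For 1024x1024 RGB this gives 3,146,752 bytes of filtered data and a 3,147,003-byte zlib stream, about 3.1 MB and in line with the reported size; RGBA would be 4,195,328 bytes of filtered data, about 4.2 MB.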