veluca93 / fpnge

Demo of a fast PNG encoder.
Apache License 2.0
88 stars 8 forks source link

filter skip heuristics #17

Closed hohMiyazawa closed 2 years ago

hohMiyazawa commented 2 years ago

Code for the heuristics discussed in https://github.com/veluca93/fpnge/issues/16

google-cla[bot] commented 2 years ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

veluca93 commented 2 years ago

Do you have a more precise speed + density benchmark for this? Thank you!

hohMiyazawa commented 2 years ago

Here is a benchmark for my test set.

heuristics.ods

I also attached the 3 worst performing images. 0045 0054 0076

animetosho commented 2 years ago

I noticed that the spurious TryPredictor call can be avoided for a minor perf boost.

--- a/fpnge.cc
+++ b/fpnge.cc
@@ -1019,6 +1019,7 @@

   uint8_t predictor;
   size_t best_cost = ~0U;
+  bool skip_paeth = true;
   if (line_number < paeth_ff_threshold || *paeth_run < paeth_ff_threshold) {// skip full filter search if the paeth_ff_block starts with only paeth filters
     // individual filters are skipped if we are done with the stat-gathering part of each filter_skip_block, and the filter was not used enough
     if (line_number % filter_skip_block_size < filter_skip_stat_end || filter_counts[1] >= filter_skip_min_count) {
@@ -1047,13 +1048,12 @@
     }
   }
   else {
-    // current code assumes paeth calculations are stored, so we have to actually generate them even when no filter seach is performed
-    TryPredictor<4, /*store_pred=*/true>(bytes_per_line, current_row_buf, top_buf,
-                                       left_buf, topleft_buf, predicted_data,
-                                       table, best_cost, predictor, dist_nbits);
+    skip_paeth = false;
+    predictor = 4;
   }
 #else
   uint8_t predictor = FPNGE_FIXED_PREDICTOR;
+  bool skip_paeth = false;
 #endif

   writer->Write(table.first16_nbits[predictor], table.first16_bits[predictor]);
@@ -1195,14 +1195,13 @@
                   topleft_buf, encode_chunk_cb, adler_chunk_cb, encode_rle_cb);
   } else {
     assert(predictor == 4);
-#ifdef FPNGE_FIXED_PREDICTOR
-    ProcessRow<4>(bytes_per_line, current_row_buf, top_buf, left_buf,
-                  topleft_buf, encode_chunk_cb, adler_chunk_cb, encode_rle_cb);
-#else
-    // re-use predicted data from TryPredictor
-    ProcessRow<0>(bytes_per_line, predicted_data, nullptr, nullptr, nullptr,
-                  encode_chunk_cb, adler_chunk_cb, encode_rle_cb);
-#endif
+    if (skip_paeth)
+      // re-use predicted data from TryPredictor
+      ProcessRow<0>(bytes_per_line, predicted_data, nullptr, nullptr, nullptr,
+                    encode_chunk_cb, adler_chunk_cb, encode_rle_cb);
+    else
+      ProcessRow<4>(bytes_per_line, current_row_buf, top_buf, left_buf,
+                    topleft_buf, encode_chunk_cb, adler_chunk_cb, encode_rle_cb);
   }

   flush_adler();
hohMiyazawa commented 2 years ago

Good gains from that.

Updated benchmark

heuristics2.ods

hohMiyazawa commented 2 years ago

heuristics3.ods

animetosho commented 2 years ago

Out of interest, is your test script + sample images publicly available?

hohMiyazawa commented 2 years ago

@animetosho Not currently. The script is just a tallying script after running fpnge with a couple of hundred repetitions, but I can certainly out it out there.

The sample images can't, as some of them have licensing and privacy issues.

hohMiyazawa commented 2 years ago

the alternative route taken by animetosho turned out to be better. It gives a similar speedup, but after the more accurate cost estimation code was added, it has almost no density losses.