microsoft / lepton_jpeg_rust

Port of DropBox Lepton compression to Rust
Apache License 2.0
85 stars 9 forks source link

Wrong CPU time metric #76

Open Melirius opened 1 month ago

Melirius commented 1 month ago

CPU time measurements are definitely wrong. When I limit Lepton to a single core, reported times are typically nearly twice larger then time reported by perf and time, for example

ivan@ivan-5950:~/lepton_jpeg_rust$ sudo rm images/0.lep; sudo perf stat -B -e cache-references,cache-misses,cycles,stalled-cycles-backend,stalled-cycles-frontend,instructions,branch-instructions,branch-misses,ic_fetch_stall.ic_stall_any,l2_cache_misses_from_ic_miss,l2_latency.l2_cycles_waiting_on_fills,faults,migrations taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/0.jpg images/0.lep
2024-05-13T19:57:57.426Z INFO  [lepton_jpeg_util::structs::lepton_format] compressing to Lepton format
2024-05-13T19:57:57.892Z INFO  [lepton_jpeg_util::structs::lepton_format] Number of threads: 8
2024-05-13T19:57:59.434Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1541ms of CPU time in 1541ms of wall time
2024-05-13T19:57:59.454Z INFO  [lepton_jpeg_util::structs::lepton_format] decompressing to verify contents
2024-05-13T19:58:01.184Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1722ms of CPU time in 1729ms of wall time
2024-05-13T19:58:01.198Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324596 bytes (ratio = 28.0%)
2024-05-13T19:58:01.198Z INFO  [lepton_jpeg_util] Total CPU time consumed:7035ms

 Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/0.jpg images/0.lep':

       785 590 057      cache-references                                                        (45,30%)
        65 972 448      cache-misses                     #    8,40% of all cache refs           (45,46%)
    17 302 087 799      cycles                                                                  (45,63%)
       762 732 452      stalled-cycles-backend           #    4,41% backend cycles idle         (45,74%)
        37 246 027      stalled-cycles-frontend          #    0,22% frontend cycles idle        (45,84%)
    43 029 268 863      instructions                     #    2,49  insn per cycle            
                                                  #    0,02  stalled cycles per insn     (45,78%)
     4 572 852 634      branch-instructions                                                     (45,57%)
       163 246 621      branch-misses                    #    3,57% of all branches             (45,36%)
     6 397 381 526      ic_fetch_stall.ic_stall_any                                             (45,25%)
        55 026 687      l2_cache_misses_from_ic_miss                                            (45,26%)
     1 563 306 107      l2_latency.l2_cycles_waiting_on_fills                                        (45,20%)
           136 764      faults                                                                
                 1      migrations                                                            

       3,816304958 seconds time elapsed

       3,527629000 seconds user
       0,287969000 seconds sys

ivan@ivan-5950:~/lepton_jpeg_rust$ sudo rm images/0.lep; sudo target/release/lepton_jpeg_util images/0.jpg images/0.lep
2024-05-13T20:00:21.531Z INFO  [lepton_jpeg_util::structs::lepton_format] compressing to Lepton format
2024-05-13T20:00:21.986Z INFO  [lepton_jpeg_util::structs::lepton_format] Number of threads: 8
2024-05-13T20:00:22.234Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1591ms of CPU time in 246ms of wall time
2024-05-13T20:00:22.260Z INFO  [lepton_jpeg_util::structs::lepton_format] decompressing to verify contents
2024-05-13T20:00:22.541Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1819ms of CPU time in 280ms of wall time
2024-05-13T20:00:22.552Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324596 bytes (ratio = 28.0%)
2024-05-13T20:00:22.552Z INFO  [lepton_jpeg_util] Total CPU time consumed:4432ms

ivan@ivan-5950:~/lepton_jpeg_rust$ sudo rm images/0.lep; sudo time taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/0.jpg images/0.lep
2024-05-13T20:02:13.408Z INFO  [lepton_jpeg_util::structs::lepton_format] compressing to Lepton format
2024-05-13T20:02:13.860Z INFO  [lepton_jpeg_util::structs::lepton_format] Number of threads: 8
2024-05-13T20:02:15.394Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1533ms of CPU time in 1533ms of wall time
2024-05-13T20:02:15.414Z INFO  [lepton_jpeg_util::structs::lepton_format] decompressing to verify contents
2024-05-13T20:02:17.121Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1699ms of CPU time in 1706ms of wall time
2024-05-13T20:02:17.135Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324596 bytes (ratio = 28.0%)
2024-05-13T20:02:17.135Z INFO  [lepton_jpeg_util] Total CPU time consumed:6960ms
3.54user 0.23system 0:03.77elapsed 99%CPU (0avgtext+0avgdata 348024maxresident)k
0inputs+33840outputs (0major+136789minor)pagefaults 0swaps

ivan@ivan-5950:~/lepton_jpeg_rust$ sudo rm images/0.lep; sudo time target/release/lepton_jpeg_util images/0.jpg images/0.lep
2024-05-13T20:02:44.969Z INFO  [lepton_jpeg_util::structs::lepton_format] compressing to Lepton format
2024-05-13T20:02:45.417Z INFO  [lepton_jpeg_util::structs::lepton_format] Number of threads: 8
2024-05-13T20:02:45.668Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1595ms of CPU time in 249ms of wall time
2024-05-13T20:02:45.695Z INFO  [lepton_jpeg_util::structs::lepton_format] decompressing to verify contents
2024-05-13T20:02:45.984Z INFO  [lepton_jpeg_util::structs::lepton_format] worker threads 1910ms of CPU time in 289ms of wall time
2024-05-13T20:02:46.001Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324596 bytes (ratio = 28.0%)
2024-05-13T20:02:46.001Z INFO  [lepton_jpeg_util] Total CPU time consumed:4538ms
3.74user 0.31system 0:01.07elapsed 377%CPU (0avgtext+0avgdata 355572maxresident)k
0inputs+33840outputs (0major+181927minor)pagefaults 0swaps
RockPolish commented 1 month ago

I noticed this too, it seems to have been introduced in 6bbb0ef as the previous commit 2389487 does not have it yet. The worker threads ...ms of CPU time for compression and decompression are still accurate as can be seen in your example, only the Total CPU time consumed: ... isn't.