plasma-umass / coz

Coz: Causal Profiling
Other
4.06k stars 159 forks source link

Provide a Rust example for measuring latency #158

Open Shnatsel opened 4 years ago

Shnatsel commented 4 years ago

The naive adaptation of the provided Rust sample to measure latency instead of throughput produces a completely empty page when plotted:

const A: usize = 2_000_000_000;
const B: usize = (A as f64 * 1.2) as usize;

fn main() {
    coz::thread_init();

    let a = std::thread::spawn(move || {
        coz::thread_init();
        coz::scope!("A");
        let mut counter = 0;
        for i in 0..A {
            counter += i;
        }
    });
    let b = std::thread::spawn(move || {
        coz::thread_init();
        coz::scope!("B");
        let mut counter = 0;
        for i in 0..B {
            counter += i;
        }
    });
    a.join().unwrap();
    b.join().unwrap();
}

Please provide an example that shows correct use of coz for latency measurement in Rust.

FWIW I am struggling with this exact issue on a larger project as well: https://github.com/Shnatsel/jpeg-decoder/tree/coz

zicklag commented 3 years ago

It appears that if your progress points or scopes are not triggered enough times, then coz won't necessarily catch them. I'm still just trying out coz and figuring it out, but it seems like coz runs your program and picks lines to slow down ( and speed up ? ) and measure the impact that changing that line has on performance. If, in the course of running your program, it doesn't run into any of your progress/latency points while it is tweaking the speed of different lines throughout your codebase, it won't have any data to go off of.

In the example you've provided, scope A and scope B will only be entered and exited once when each thread is started/exited. That means it coz can't collect any samples of how speeding up/slowing down lines in the program will effect how often scope A and scope B are entered and exited. No matter what coz tweaks, it will have no effect on how often/how long it takes scope A or scope B to finish. This example works:

const A: usize = 2_000_000_000;
const B: usize = (A as f64 * 1.2) as usize;

fn main() {
    coz::thread_init();

    let a = std::thread::spawn(move || {
        coz::thread_init();
        let mut counter = 0;
        for i in 0..A {
            coz::scope!("A");
            counter += i;
        }
    });
    let b = std::thread::spawn(move || {
        coz::thread_init();
        let mut counter = 0;
        for i in 0..B {
            coz::scope!("B");
            counter += i;
        }
    });
    a.join().unwrap();
    b.join().unwrap();
}

image

By sticking the scope inside the scope of the operation we're measuring we can measure how long it takes to do counter += 1, but I can see that in this case you are probably wanting to see how long it takes to do counter += 1 for 0..B times. In this case, if you want coz to be able to profile this use-case, I think you have to do your operation over and over again to give it a chance to experiment with your program.

I tested this with your JPEG decoder and I got it to work by running decode() for at least 100k iterations and I got this report:

image

According to this report, coz didn't find any outstanding functions that appear to speed up your decoding when optimized. The more you loop the more time coz seems to have to run experiments which can give you more results. I think you can also run coz multiple times on the same program and it will just keep appending to the report any new samples that it takes.

Here's the modified code:

use coz;
use jpeg_decoder as jpeg;
use std::env;
use std::fs::File;
use std::io::{self, BufReader, Read, Write};

fn usage() -> ! {
    write!(io::stderr(), "usage: jpeg-coz image.jpg").unwrap();
    std::process::exit(1)
}

fn main() {
    coz::thread_init();
    let mut args = env::args().skip(1);
    let input_path = args.next().unwrap_or_else(|| usage());
    let mut input_file = File::open(input_path).expect("The specified input file could not be opened");
    let mut bytes: Vec<u8> = vec![];
    input_file.read_to_end(&mut bytes);

    // Decode the image 100,000 times to give `coz` some time to analyze the effect of changes
    for _ in 0..100_000 {
        let mut decoder = jpeg::Decoder::new(bytes.as_slice());
        coz::begin!("decode");
        let data = decoder.decode().expect("Decoding failed. If other software can successfully decode the specified JPEG image, then it's likely that there is a bug in jpeg-decoder");
        coz::end!("decode");
    }
}

I'm still kind of having the same issue where I get empty reports for my program, but testing out your example helped me to get a better idea of how this works. Maybe now I'll be able to fix mine. :smile:

Coz seems pretty cool. I can't wait to see if it actually helps me find a place to optimize my program and get some extra performance!

zicklag commented 3 years ago

OK, so I just realized that I'm supposed to restrict the source files that coz experiments with to my own source files so that the results are actually useful to me as a developer!

coz run -s /path/to/my/project/src/% --- target/release/coz-jpeg

So @Shnatsel, when profiling your jpeg-decoder now, I get much more useful stats, showing where in your code there might be potential for optimizations:

image

Profile.coz:

``` startup time=1602699582349123902 runtime time=3059257542 startup time=1602699594164028412 runtime time=30551736020 startup time=1602699734782960817 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:140 speedup=0.95 duration=497024921 selected-samples=3 latency-point name=append_row_immediate arrivals=20795 departures=20795 difference=1 latency-point name=decode arrivals=165 departures=165 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 speedup=0.50 duration=497058052 selected-samples=6 latency-point name=append_row_immediate arrivals=20145 departures=20145 difference=0 latency-point name=decode arrivals=160 departures=160 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.75 duration=488808526 selected-samples=15 latency-point name=append_row_immediate arrivals=20862 departures=20862 difference=0 latency-point name=decode arrivals=166 departures=166 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:487 speedup=0.45 duration=498707860 selected-samples=3 latency-point name=append_row_immediate arrivals=20743 departures=20742 difference=1 latency-point name=decode arrivals=164 departures=164 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.35 duration=494105200 selected-samples=17 latency-point name=append_row_immediate arrivals=20433 departures=20434 difference=0 latency-point name=decode arrivals=162 departures=162 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=1.00 duration=491127183 selected-samples=9 latency-point name=append_row_immediate arrivals=19546 departures=19547 difference=0 latency-point name=decode arrivals=155 departures=156 difference=17364172535296355315 runtime time=3111372884 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:0 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 count=18 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 count=74 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:215 count=21 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 count=74 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 count=40 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:252 count=8 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 count=23 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:420 count=8 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:935 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:946 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:0 count=7 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 count=49 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:129 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:132 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:140 count=23 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:152 count=4 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:153 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:199 count=7 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 count=4 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:291 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:307 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:309 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:355 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 count=20 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:166 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:487 count=18 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:493 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:0 count=1 startup time=1602699748447961690 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.00 duration=500064844 selected-samples=10 latency-point name=append_row_immediate arrivals=19067 departures=19068 difference=0 latency-point name=decode arrivals=151 departures=151 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 speedup=0.45 duration=499613059 selected-samples=1 latency-point name=append_row_immediate arrivals=18893 departures=18893 difference=1 latency-point name=decode arrivals=150 departures=150 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:252 speedup=0.60 duration=500060881 selected-samples=0 latency-point name=append_row_immediate arrivals=18524 departures=18525 difference=0 latency-point name=decode arrivals=147 departures=147 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 speedup=0.05 duration=500015840 selected-samples=1 latency-point name=append_row_immediate arrivals=17829 departures=17828 difference=1 latency-point name=decode arrivals=142 departures=142 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.15 duration=498562372 selected-samples=10 latency-point name=append_row_immediate arrivals=18416 departures=18415 difference=1 latency-point name=decode arrivals=146 departures=146 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 speedup=0.00 duration=500057528 selected-samples=6 latency-point name=append_row_immediate arrivals=20091 departures=20092 difference=0 latency-point name=decode arrivals=160 departures=160 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.00 duration=500057118 selected-samples=8 latency-point name=append_row_immediate arrivals=20570 departures=20570 difference=1 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 speedup=0.05 duration=500037153 selected-samples=1 latency-point name=append_row_immediate arrivals=20576 departures=20576 difference=1 latency-point name=decode arrivals=164 departures=164 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:252 speedup=0.10 duration=499887915 selected-samples=2 latency-point name=append_row_immediate arrivals=20629 departures=20630 difference=0 latency-point name=decode arrivals=164 departures=164 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.65 duration=491606985 selected-samples=13 latency-point name=append_row_immediate arrivals=20767 departures=20768 difference=0 latency-point name=decode arrivals=165 departures=165 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 speedup=0.10 duration=498562294 selected-samples=15 latency-point name=append_row_immediate arrivals=16897 departures=16896 difference=1 latency-point name=decode arrivals=134 departures=134 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 speedup=0.65 duration=492909468 selected-samples=11 latency-point name=append_row_immediate arrivals=18118 departures=18119 difference=0 latency-point name=decode arrivals=144 departures=144 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:215 speedup=0.35 duration=498307196 selected-samples=5 latency-point name=append_row_immediate arrivals=19641 departures=19641 difference=1 latency-point name=decode arrivals=155 departures=155 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 speedup=0.00 duration=500056849 selected-samples=9 latency-point name=append_row_immediate arrivals=20526 departures=20526 difference=0 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.65 duration=494212318 selected-samples=9 latency-point name=append_row_immediate arrivals=17298 departures=17298 difference=0 latency-point name=decode arrivals=137 departures=137 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 speedup=0.70 duration=495858805 selected-samples=6 latency-point name=append_row_immediate arrivals=18600 departures=18600 difference=0 latency-point name=decode arrivals=148 departures=148 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 speedup=0.60 duration=499456629 selected-samples=1 latency-point name=append_row_immediate arrivals=20527 departures=20527 difference=1 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 speedup=0.60 duration=495256252 selected-samples=8 latency-point name=append_row_immediate arrivals=20535 departures=20535 difference=0 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.00 duration=500055068 selected-samples=13 latency-point name=append_row_immediate arrivals=20631 departures=20631 difference=0 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.65 duration=490956528 selected-samples=14 latency-point name=append_row_immediate arrivals=20338 departures=20337 difference=1 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 speedup=0.00 duration=500057279 selected-samples=2 latency-point name=append_row_immediate arrivals=20179 departures=20179 difference=1 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:215 speedup=0.00 duration=500058526 selected-samples=2 latency-point name=append_row_immediate arrivals=18994 departures=18995 difference=0 latency-point name=decode arrivals=151 departures=151 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:0 speedup=0.35 duration=500058903 selected-samples=0 latency-point name=append_row_immediate arrivals=19155 departures=19155 difference=0 latency-point name=decode arrivals=152 departures=152 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 speedup=0.20 duration=497827002 selected-samples=9 latency-point name=append_row_immediate arrivals=18746 departures=18745 difference=1 latency-point name=decode arrivals=148 departures=148 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 speedup=0.70 duration=493305124 selected-samples=9 latency-point name=append_row_immediate arrivals=19805 departures=19806 difference=0 latency-point name=decode arrivals=157 departures=157 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 speedup=0.55 duration=498442753 selected-samples=3 latency-point name=append_row_immediate arrivals=20650 departures=20650 difference=1 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:152 speedup=0.00 duration=500056881 selected-samples=0 latency-point name=append_row_immediate arrivals=20277 departures=20277 difference=0 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 speedup=0.30 duration=497358521 selected-samples=9 latency-point name=append_row_immediate arrivals=17046 departures=17046 difference=1 latency-point name=decode arrivals=135 departures=135 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.05 duration=499508912 selected-samples=11 latency-point name=append_row_immediate arrivals=18849 departures=18850 difference=0 latency-point name=decode arrivals=149 departures=149 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 speedup=0.35 duration=498082267 selected-samples=5 latency-point name=append_row_immediate arrivals=18639 departures=18638 difference=1 latency-point name=decode arrivals=147 departures=147 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 speedup=0.65 duration=497937768 selected-samples=3 latency-point name=append_row_immediate arrivals=19542 departures=19543 difference=0 latency-point name=decode arrivals=156 departures=156 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.35 duration=495508246 selected-samples=13 latency-point name=append_row_immediate arrivals=19720 departures=19720 difference=1 latency-point name=decode arrivals=157 departures=157 difference=17364172535296355316 runtime time=500082655 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:0 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 count=121 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 count=393 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:215 count=78 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 count=371 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 count=231 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:252 count=37 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:266 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:285 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 count=119 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:420 count=70 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:421 count=11 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:505 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:826 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:846 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:899 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:935 count=16 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:947 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:948 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:950 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:971 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:977 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:0 count=26 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:53 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:109 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:121 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 count=188 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:129 count=22 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:132 count=10 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:140 count=100 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:145 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:152 count=28 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:153 count=7 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:199 count=25 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:200 count=11 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 count=47 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:0 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:291 count=6 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:307 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:308 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:309 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:310 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:355 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/marker.rs:2 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 count=73 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:64 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:166 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:167 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:173 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:177 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:179 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:183 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:298 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:375 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:387 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:487 count=72 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:488 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:491 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:493 count=10 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:0 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:18 count=6 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:19 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:28 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/worker/threaded.rs:20 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/worker/threaded.rs:22 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/worker/threaded.rs:46 count=5 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.10 duration=498855737 selected-samples=12 latency-point name=append_row_immediate arrivals=20180 departures=20181 difference=0 latency-point name=decode arrivals=160 departures=160 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.75 duration=492556818 selected-samples=10 latency-point name=append_row_immediate arrivals=19764 departures=19763 difference=1 latency-point name=decode arrivals=157 departures=157 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.50 duration=494557377 selected-samples=11 latency-point name=append_row_immediate arrivals=20340 departures=20340 difference=0 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:935 speedup=0.00 duration=500057580 selected-samples=1 latency-point name=append_row_immediate arrivals=20197 departures=20196 difference=1 latency-point name=decode arrivals=160 departures=160 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:935 speedup=0.65 duration=499412577 selected-samples=1 latency-point name=append_row_immediate arrivals=19776 departures=19776 difference=0 latency-point name=decode arrivals=157 departures=157 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 speedup=0.70 duration=490258010 selected-samples=14 latency-point name=append_row_immediate arrivals=19713 departures=19713 difference=1 latency-point name=decode arrivals=156 departures=156 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 speedup=0.45 duration=498256985 selected-samples=4 latency-point name=append_row_immediate arrivals=20208 departures=20208 difference=1 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:0 speedup=0.60 duration=500060295 selected-samples=0 latency-point name=append_row_immediate arrivals=19957 departures=19958 difference=0 latency-point name=decode arrivals=158 departures=158 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 speedup=0.20 duration=499043765 selected-samples=4 latency-point name=append_row_immediate arrivals=20095 departures=20096 difference=0 latency-point name=decode arrivals=160 departures=160 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 speedup=0.00 duration=500056643 selected-samples=4 latency-point name=append_row_immediate arrivals=20057 departures=20058 difference=0 latency-point name=decode arrivals=159 departures=159 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 speedup=0.35 duration=500066155 selected-samples=0 latency-point name=append_row_immediate arrivals=20086 departures=20086 difference=1 latency-point name=decode arrivals=159 departures=159 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 speedup=0.40 duration=496056811 selected-samples=10 latency-point name=append_row_immediate arrivals=20368 departures=20369 difference=0 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.35 duration=496557523 selected-samples=10 latency-point name=append_row_immediate arrivals=20244 departures=20244 difference=0 latency-point name=decode arrivals=161 departures=161 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 speedup=0.15 duration=499905380 selected-samples=1 latency-point name=append_row_immediate arrivals=20493 departures=20493 difference=0 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 speedup=0.00 duration=500069418 selected-samples=2 latency-point name=append_row_immediate arrivals=17953 departures=17953 difference=1 latency-point name=decode arrivals=142 departures=142 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.35 duration=496215261 selected-samples=11 latency-point name=append_row_immediate arrivals=18580 departures=18580 difference=1 latency-point name=decode arrivals=148 departures=148 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:0 speedup=0.45 duration=498561005 selected-samples=3 latency-point name=append_row_immediate arrivals=19110 departures=19110 difference=0 latency-point name=decode arrivals=152 departures=152 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:140 speedup=0.70 duration=498550135 selected-samples=2 latency-point name=append_row_immediate arrivals=19934 departures=19933 difference=1 latency-point name=decode arrivals=159 departures=159 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:152 speedup=0.30 duration=499707642 selected-samples=1 latency-point name=append_row_immediate arrivals=18281 departures=18280 difference=1 latency-point name=decode arrivals=145 departures=145 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.70 duration=493063726 selected-samples=10 latency-point name=append_row_immediate arrivals=18390 departures=18389 difference=1 latency-point name=decode arrivals=146 departures=146 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.40 duration=494855860 selected-samples=13 latency-point name=append_row_immediate arrivals=20016 departures=20016 difference=0 latency-point name=decode arrivals=159 departures=159 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:0 speedup=0.00 duration=500055265 selected-samples=0 latency-point name=append_row_immediate arrivals=20530 departures=20529 difference=1 latency-point name=decode arrivals=163 departures=163 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 speedup=0.00 duration=500055917 selected-samples=3 latency-point name=append_row_immediate arrivals=20523 departures=20522 difference=1 latency-point name=decode arrivals=162 departures=162 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 speedup=0.65 duration=493557656 selected-samples=10 latency-point name=append_row_immediate arrivals=19620 departures=19620 difference=0 latency-point name=decode arrivals=155 departures=155 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 speedup=0.25 duration=499555268 selected-samples=2 latency-point name=append_row_immediate arrivals=20412 departures=20412 difference=0 latency-point name=decode arrivals=162 departures=162 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.65 duration=494207956 selected-samples=9 latency-point name=append_row_immediate arrivals=20487 departures=20487 difference=0 latency-point name=decode arrivals=162 departures=162 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:421 speedup=0.70 duration=500057232 selected-samples=0 latency-point name=append_row_immediate arrivals=19512 departures=19512 difference=0 latency-point name=decode arrivals=154 departures=154 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:129 speedup=0.40 duration=500060028 selected-samples=0 latency-point name=append_row_immediate arrivals=19090 departures=19089 difference=1 latency-point name=decode arrivals=151 departures=151 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 speedup=0.55 duration=498960126 selected-samples=2 latency-point name=append_row_immediate arrivals=18660 departures=18660 difference=0 latency-point name=decode arrivals=149 departures=149 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 speedup=0.20 duration=499458274 selected-samples=3 latency-point name=append_row_immediate arrivals=20246 departures=20246 difference=1 latency-point name=decode arrivals=160 departures=160 difference=17364172535296355316 experiment selected=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 speedup=0.00 duration=500125364 selected-samples=8 latency-point name=append_row_immediate arrivals=15120 departures=15120 difference=0 latency-point name=decode arrivals=120 departures=121 difference=17364172535296355315 runtime time=32361686096 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:0 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:207 count=224 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:209 count=760 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:215 count=153 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:231 count=715 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:250 count=459 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:252 count=67 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:266 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:285 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:309 count=226 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:420 count=147 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:421 count=19 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:483 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:505 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:648 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:826 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:830 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:846 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:899 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:914 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:935 count=38 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:947 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:948 count=6 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:950 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:971 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/decoder.rs:977 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:0 count=65 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:53 count=4 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:109 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:121 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:125 count=375 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:129 count=43 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:132 count=16 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:140 count=217 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:145 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:152 count=51 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:153 count=10 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:157 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:199 count=49 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:200 count=12 samples location=/home/zicklag/git/other/jpeg-decoder/src/huffman.rs:202 count=100 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:0 count=9 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:291 count=8 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:307 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:308 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:309 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:310 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:337 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/idct.rs:355 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/marker.rs:2 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:0 count=163 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:58 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:64 count=11 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:156 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:166 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:167 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:173 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:177 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:179 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:183 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:289 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:298 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:348 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:375 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:387 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:402 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:408 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:422 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:458 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:487 count=154 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:488 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:491 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:493 count=20 samples location=/home/zicklag/git/other/jpeg-decoder/src/parser.rs:514 count=1 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:0 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:18 count=12 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:19 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/upsampler.rs:28 count=2 samples location=/home/zicklag/git/other/jpeg-decoder/src/worker/threaded.rs:20 count=3 samples location=/home/zicklag/git/other/jpeg-decoder/src/worker/threaded.rs:22 count=5 samples location=/home/zicklag/git/other/jpeg-decoder/src/worker/threaded.rs:46 count=9 ```
VitalyArtemiev commented 2 years ago

I am also getting empty profile files no matter what I do. My steps:

git clone https://github.com/plasma-umass/coz.git
cd coz/rust/
cargo build --release --package coz --example toy
sudo coz run --- ./target/release/examples/toy

Resulting file contains only two lines:

startup time=1635633363771401558
runtime time=19187025993

I tried replacing toy.rs with version provided by @zicklag above, but results are the same. After each run only more startup and runtime lines get added, no data points.

Am I doing something wrong? Are the examples just out of date? I'm running Linux Mint 20.

lquerel commented 2 years ago

Same here. Even when I run coz with the -s parameter I still end up with a two-lines profile.coz file. Any help will be appreciated.

Edit: turns out that it works after replacing debug=true by debug=1 in the profile.release section of Cargo.toml.

alxiong commented 1 year ago

I'm able to reproduce the toy example (for throughput profiling)

but when I'm trying to do simple latency profiling using coz::scope!, the profiler just runs and get stuck for hours:

$ coz run -s ~/jellyfish/primitives/src/% --- ./target/release/examples/rs_coz
[libcoz.cpp:100] bootstrapping coz
[libcoz.cpp:128] Including MAIN, which is /home/alxiong/jellyfish/target/release/examples/rs_coz
[inspect.cpp:509] Included source file /home/alxiong/jellyfish/primitives/src/reed_solomon_code/mod.rs
....
[inspect.cpp:316] Including lines from executable /home/alxiong/jellyfish/target/release/examples/rs_coz
[profiler.cpp:75] Starting profiler thread
☝️ stuck here for hours

my code is structurally similar to the single-thread example for the jpeg decoder by @zicklag in https://github.com/plasma-umass/coz/issues/158#issuecomment-708507510.

I'm not sure where it got stuck in, sadly coz doesn't provide coz --verbose mode to print its internal progress.

kormang commented 1 year ago

I'm able to reproduce the toy example (for throughput profiling)

but when I'm trying to do simple latency profiling using coz::scope!, the profiler just runs and get stuck for hours:

I'm not sure, but sounds like problem that can be solved usin method from this PR: https://github.com/plasma-umass/coz/pull/191

Can you try to compile and run with --with-alloc-shims?