Shnatsel opened this issue 4 years ago.
It appears that if your progress points or scopes are not triggered enough times, coz won't necessarily catch them. I'm still just trying out coz and figuring it out, but it seems that coz runs your program, picks lines to slow down (and speed up?), and measures the impact that changing each line has on performance. If, in the course of running your program, it doesn't hit any of your progress/latency points while it is adjusting the speed of different lines throughout your codebase, it won't have any data to work with.
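On the "(and speed up?)" question: as I understand the Coz paper, coz never makes a line faster. It creates a *virtual speedup* by pausing the other threads each time a sample lands in the selected line, then subtracts the inserted pauses from the measured runtime. The sketch below is a simplified arithmetic model of that idea; the function names, the 1 ms sampling period and all the counts are hypothetical illustrations, not coz's implementation or defaults.

```rust
use std::time::Duration;

/// Simplified model of one virtual-speedup experiment (not coz's code):
/// to "speed up" the selected line by `speedup_percent`, the other threads
/// are paused for that fraction of the sampling period each time a sample
/// lands in the selected line.
fn delay_per_sample(speedup_percent: u32, sampling_period: Duration) -> Duration {
    sampling_period * speedup_percent / 100
}

/// Measured runtime minus all of the pauses that were inserted.
fn effective_runtime(measured: Duration, samples_in_line: u32, per_sample: Duration) -> Duration {
    measured.saturating_sub(per_sample * samples_in_line)
}

/// Progress-point visits per second, computed over the effective runtime.
fn throughput(visits: u64, effective: Duration) -> f64 {
    visits as f64 / effective.as_secs_f64()
}

fn main() {
    // Hypothetical numbers: 1 ms sampling period, 25% virtual speedup,
    // 400 samples in the selected line, 1.1 s measured, 500 progress visits.
    let per_sample = delay_per_sample(25, Duration::from_millis(1)); // = 250 µs
    let effective = effective_runtime(Duration::from_millis(1100), 400, per_sample); // = 1 s
    println!("pause per sample: {:?}", per_sample);
    println!("effective runtime: {:?}", effective);
    println!("throughput: {:.1} visits/s", throughput(500, effective));
}
```

The point for this thread is the last function: if no progress point is visited during an experiment, `visits` is zero and the experiment tells coz nothing about the selected line.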
In the example you've provided, scope A and scope B are each entered and exited only once, when their thread starts and exits. That means coz can't collect any samples of how speeding up or slowing down lines in the program affects how often scope A and scope B are entered and exited. No matter what coz tweaks, it has no effect on how often scope A or scope B finishes, or how long it takes. This example works:
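```rust
const A: usize = 2_000_000_000;
const B: usize = (A as f64 * 1.2) as usize;

fn main() {
    coz::thread_init();

    let a = std::thread::spawn(move || {
        coz::thread_init();
        let mut counter = 0;
        for i in 0..A {
            // Entered and exited on every iteration, so coz sees many events.
            coz::scope!("A");
            counter += i;
        }
        // Use `counter` so the sum is neither warned about nor optimized away.
        std::hint::black_box(counter);
    });

    let b = std::thread::spawn(move || {
        coz::thread_init();
        let mut counter = 0;
        for i in 0..B {
            coz::scope!("B");
            counter += i;
        }
        std::hint::black_box(counter);
    });

    a.join().unwrap();
    b.join().unwrap();
}
```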
By putting the scope inside the loop body, we measure how long a single `counter += i` takes, but I can see that in this case you probably want to measure how long it takes to do `counter += i` for all of `0..B`. If you want coz to be able to profile that use case, I think you have to perform the whole operation over and over again to give it a chance to experiment with your program.
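The reasoning above can be illustrated with a toy counting model, assuming coz splits a run into many short experiments and can only learn something from an experiment during which a progress point was actually visited. `windows_with_progress`, the 50 ms experiment length and the one-visit-per-millisecond rate are hypothetical; this is not how coz is implemented, just the counting argument.

```rust
/// Toy model (not coz's code): count how many fixed-length experiment
/// windows contain at least one progress-point visit.
/// Returns (windows with a visit, total windows).
fn windows_with_progress(visit_times_ms: &[u64], total_ms: u64, window_ms: u64) -> (u64, u64) {
    let windows = total_ms / window_ms;
    let mut hit = vec![false; windows as usize];
    for &t in visit_times_ms {
        let w = t / window_ms;
        if w < windows {
            hit[w as usize] = true;
        }
    }
    (hit.iter().filter(|&&h| h).count() as u64, windows)
}

fn main() {
    let total_ms = 10_000;
    let window_ms = 50; // hypothetical experiment length

    // Scope around the whole thread body: one event at start, one at the end.
    let once_per_thread = [0, total_ms - 1];
    // Scope inside the loop body: assume one visit every millisecond.
    let per_iteration: Vec<u64> = (0..total_ms).collect();

    let (a, n) = windows_with_progress(&once_per_thread, total_ms, window_ms);
    let (b, _) = windows_with_progress(&per_iteration, total_ms, window_ms);
    println!("scope per thread:    {a}/{n} experiments saw progress");
    println!("scope per iteration: {b}/{n} experiments saw progress");
}
```

With the scope around the thread body only 2 of the 200 windows see any progress, while the per-iteration scope gives data in all of them, which is why repeating the measured operation helps.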
I tested this with your JPEG decoder and got it to work by running `decode()` for at least 100k iterations, which produced this report:
According to this report, coz didn't find any lines that would noticeably speed up decoding if optimized. The more iterations you run, the more time coz has to run experiments, which gives you more results. I think you can also run coz multiple times on the same program; it keeps appending any new samples it takes to the existing report.
Here's the modified code:
```rust
use jpeg_decoder as jpeg;
use std::env;
use std::fs::File;
use std::io::Read;

fn usage() -> ! {
    eprintln!("usage: jpeg-coz image.jpg");
    std::process::exit(1)
}

fn main() {
    coz::thread_init();

    let mut args = env::args().skip(1);
    let input_path = args.next().unwrap_or_else(|| usage());
    let mut input_file = File::open(input_path).expect("The specified input file could not be opened");
    let mut bytes: Vec<u8> = vec![];
    input_file
        .read_to_end(&mut bytes)
        .expect("The specified input file could not be read");

    // Decode the image 100,000 times to give `coz` time to analyze the effect of changes
    for _ in 0..100_000 {
        let mut decoder = jpeg::Decoder::new(bytes.as_slice());
        // Latency region: one begin/end pair per decode.
        coz::begin!("decode");
        let _data = decoder.decode().expect("Decoding failed. If other software can successfully decode the specified JPEG image, then it's likely that there is a bug in jpeg-decoder");
        coz::end!("decode");
    }
}
```
I'm still kind of having the same issue where I get empty reports for my program, but testing out your example helped me to get a better idea of how this works. Maybe now I'll be able to fix mine. :smile:
Coz seems pretty cool. I can't wait to see if it actually helps me find a place to optimize my program and get some extra performance!
OK, so I just realized that I'm supposed to restrict the source files that coz experiments with to my own source files (with `-s`), so that the results are actually useful to me as a developer!

```shell
coz run -s /path/to/my/project/src/% --- target/release/coz-jpeg
```
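To see what that `-s` pattern selects: the command uses `%` as a wildcard. Below is an illustrative matcher, assuming `%` matches any run of characters (including none) and the rest of the pattern is literal. `scope_matches` is a hypothetical helper, not part of coz, and the example paths are made up.

```rust
/// Illustrative matcher for a coz-style source scope such as
/// "/path/to/my/project/src/%", assuming `%` matches any (possibly empty)
/// sequence of characters. A sketch of the semantics, not coz's own code.
fn scope_matches(pattern: &str, path: &str) -> bool {
    let parts: Vec<&str> = pattern.split('%').collect();
    if parts.len() == 1 {
        return pattern == path; // no wildcard: exact match
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if path.len() < first.len() + last.len() || !path.starts_with(first) || !path.ends_with(last) {
        return false;
    }
    // Match the literal pieces between wildcards left to right.
    let mut rest = &path[first.len()..path.len() - last.len()];
    for piece in &parts[1..parts.len() - 1] {
        match rest.find(piece) {
            Some(i) => rest = &rest[i + piece.len()..],
            None => return false,
        }
    }
    true
}

fn main() {
    let scope = "/path/to/my/project/src/%";
    for path in [
        "/path/to/my/project/src/main.rs",
        "/path/to/my/project/src/decoder/huffman.rs",
        "/home/me/.cargo/registry/src/some-crate/src/lib.rs",
        "/rustc/library/core/src/iter/range.rs",
    ] {
        println!("{:5} {}", scope_matches(scope, path), path);
    }
}
```

Under that assumption, dependency and standard-library files fall outside the scope, so coz only experiments with lines you can actually change.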
So @Shnatsel, when profiling your jpeg-decoder now, I get much more useful stats, showing where in your code there might be potential for optimizations:
I am also getting empty profile files no matter what I do. My steps:
```shell
git clone https://github.com/plasma-umass/coz.git
cd coz/rust/
cargo build --release --package coz --example toy
sudo coz run --- ./target/release/examples/toy
```
The resulting file contains only two lines:

```
startup time=1635633363771401558
runtime time=19187025993
```
I tried replacing `toy.rs` with the version provided by @zicklag above, but the results are the same. After each run, only more `startup` and `runtime` lines get added, with no data points.
Am I doing something wrong? Are the examples just out of date? I'm running Linux Mint 20.
Same here. Even when I run coz with the `-s` parameter, I still end up with a two-line `profile.coz` file. Any help would be appreciated.

Edit: it turns out that it works after replacing `debug = true` with `debug = 1` in the `[profile.release]` section of `Cargo.toml`.
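To tell a two-line profile like the ones described above apart from one that contains experiment data, a small checker can help. This is a sketch assuming every `profile.coz` line starts with its record type followed by whitespace-separated `key=value` fields, as in the `startup`/`runtime` lines quoted above; anything other than those two record types is treated as data. `record_counts` and `has_only_bookkeeping` are hypothetical helpers, not part of coz.

```rust
use std::collections::BTreeMap;

/// Count record types in a profile.coz file, assuming each line begins with
/// its record type (e.g. "startup", "runtime").
fn record_counts(profile: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for line in profile.lines() {
        if let Some(kind) = line.split_whitespace().next() {
            *counts.entry(kind).or_insert(0) += 1;
        }
    }
    counts
}

/// True if the profile holds nothing besides startup/runtime bookkeeping.
fn has_only_bookkeeping(profile: &str) -> bool {
    record_counts(profile)
        .keys()
        .all(|k| *k == "startup" || *k == "runtime")
}

fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).unwrap_or_else(|| "profile.coz".to_string());
    let text = std::fs::read_to_string(&path)?;
    for (kind, n) in record_counts(&text) {
        println!("{kind:>20}: {n}");
    }
    if has_only_bookkeeping(&text) {
        println!("{path}: no experiment data, only startup/runtime records");
    }
    Ok(())
}
```

Running it after each attempt (for example, before and after changing the debug-info setting) shows whether a run added anything beyond the `startup`/`runtime` lines.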
I'm able to reproduce the toy example (for throughput profiling), but when I try to do simple latency profiling using `coz::scope!`, the profiler just runs and gets stuck for hours:
```
$ coz run -s ~/jellyfish/primitives/src/% --- ./target/release/examples/rs_coz
[libcoz.cpp:100] bootstrapping coz
[libcoz.cpp:128] Including MAIN, which is /home/alxiong/jellyfish/target/release/examples/rs_coz
[inspect.cpp:509] Included source file /home/alxiong/jellyfish/primitives/src/reed_solomon_code/mod.rs
....
[inspect.cpp:316] Including lines from executable /home/alxiong/jellyfish/target/release/examples/rs_coz
[profiler.cpp:75] Starting profiler thread
```

☝️ stuck here for hours
My code is structurally similar to the single-threaded example for the JPEG decoder by @zicklag in https://github.com/plasma-umass/coz/issues/158#issuecomment-708507510.
I'm not sure where it gets stuck; sadly, coz doesn't provide a `--verbose` mode to print its internal progress.
> I'm able to reproduce the toy example (for throughput profiling), but when I try to do simple latency profiling using `coz::scope!`, the profiler just runs and gets stuck for hours:
I'm not sure, but it sounds like a problem that could be solved using the approach from this PR: https://github.com/plasma-umass/coz/pull/191
Can you try compiling and running with `--with-alloc-shims`?
The naive adaptation of the provided Rust sample to measure latency instead of throughput produces a completely empty page when plotted:
Please provide an example that shows correct use of coz for latency measurement in Rust.
FWIW I am struggling with this exact issue on a larger project as well: https://github.com/Shnatsel/jpeg-decoder/tree/coz
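As background for the latency question: the Coz paper describes measuring latency with Little's law, using one progress point where a transaction begins and one where it ends (as I understand it, that is what `coz::begin!`/`coz::end!` mark, and `coz::scope!` places both around a block). The arithmetic sketch below uses hypothetical function names and made-up numbers; it shows why a scope entered once per thread gives coz nothing to work with, since a short window then contains no arrivals and the latency estimate is undefined.

```rust
/// Little's law: average requests in flight (L) = arrival rate (λ) × mean
/// latency (W), so W = L / λ. `arrivals` counts begin events observed in a
/// window of `window_secs` seconds.
fn mean_latency_secs(avg_in_flight: f64, arrivals: u64, window_secs: f64) -> Option<f64> {
    if arrivals == 0 || window_secs <= 0.0 {
        // Nothing started during this window: latency is undefined.
        return None;
    }
    let arrival_rate = arrivals as f64 / window_secs;
    Some(avg_in_flight / arrival_rate)
}

/// Average in-flight count from sampled (begin_count, end_count) pairs.
fn average_in_flight(samples: &[(u64, u64)]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let total: u64 = samples.iter().map(|&(b, e)| b.saturating_sub(e)).sum();
    total as f64 / samples.len() as f64
}

fn main() {
    // Hypothetical 10 s window with a scope entered on every operation:
    // 1,000 begins, and about 5 operations in flight whenever we sample.
    let samples = [(300, 295), (600, 595), (1000, 995)];
    let l = average_in_flight(&samples);
    match mean_latency_secs(l, 1000, 10.0) {
        Some(w) => println!("per-operation scope: mean latency = {:.3} s", w),
        None => println!("per-operation scope: no data"),
    }

    // Scope entered once per thread: a short window sees no begins at all.
    println!("per-thread scope: {:?}", mean_latency_secs(1.0, 0, 0.05));
}
```

In this model, a latency measurement only becomes meaningful when begin/end pairs occur many times within each experiment, which matches the advice earlier in the thread to wrap each repeated operation rather than the whole thread.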