What's the issue, what's expected?:
I'm trying to write a result summary YAML for the resnet101 raw data generated from the tutorial, but I get the following warning:
RuleBase: get metrics failed - model-benchmarks
Here's the summary_rule.yaml:
version: v0.11
superbench:
  rules:
    resnet:
      statistics:
        - mean
        - p90
        - min
        - max
      aggregate: False
      categories: Models
      metrics:
        - model-benchmarks/pytorch-resnet101/float16_train_step_time
How to reproduce it?:
sb result summary --data-file results-summary.jsonl --rule-file summary_rule.yaml --output-file-format md --output-dir ${something}
Log message or snapshot?:
[rule_base.py:75][WARNING] RuleBase: get metrics failed - model-benchmarks
OK, it turns out the metric name in the rule file was wrong. The correct metric is resnet_models/pytorch-resnet101/fp16_train_step_time.
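
For anyone hitting the same warning, one way to check which metric names actually exist in the raw data is to list the keys in the JSONL file before writing the rule. The following is a minimal sketch, not part of SuperBench: it assumes each line of results-summary.jsonl is a flat JSON object whose keys are the metric names, and the helper names `list_metric_names` and `list_metric_names_in_file` are hypothetical.

```python
import json
from typing import Iterable, List


def list_metric_names(jsonl_lines: Iterable[str], keyword: str = "") -> List[str]:
    """Return the sorted, distinct keys that contain `keyword`.

    Assumption: every non-empty line is a JSON object keyed by metric name.
    """
    names = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if isinstance(record, dict):
            names.update(key for key in record if keyword in key)
    return sorted(names)


def list_metric_names_in_file(path: str, keyword: str = "") -> List[str]:
    """Same as list_metric_names, but reads the lines from a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return list_metric_names(handle, keyword)
```

Calling `list_metric_names_in_file("results-summary.jsonl", "resnet101")` should then print candidates to copy into the `metrics` list of the rule file, provided the file layout matches the assumption above.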
Can I get a result summary for each individual node and each individual GPU if I have multiple nodes and GPUs?