Open mo-tenstorrent opened 1 year ago
Add -d and -m for device-only and host-only runs.
The spreadsheet inside the above zip shows that, before and after the changes to profile_this, device-only and host-only runs produce durations within acceptable ranges.
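A minimal sketch of how the two flags could be wired up, assuming profile_this parses its options with argparse. Only -c, -d and -m come from this issue; the long option names, the help strings, and the mapping of -d to device-only and -m to host-only (taken from the order of the sentence above) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical option parser; not the actual profile_this implementation."""
    parser = argparse.ArgumentParser(description="Sketch of profile_this options")
    # -c is used in the commands in this issue; the long name is an assumption.
    parser.add_argument("-c", "--command", required=True,
                        help="test command to profile")
    # A run is device-only, host-only, or both (neither flag given),
    # so the two flags are made mutually exclusive.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-d", "--device-only", action="store_true",
                      help="collect device data only (assumed meaning)")
    mode.add_argument("-m", "--host-only", action="store_true",
                      help="collect host data only (assumed meaning)")
    return parser
```

With this layout, passing both -d and -m makes argparse exit with a usage error, which avoids an ambiguous run mode.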
Bring the device dump into the tracy python module, using the TT_METAL_DEVICE_PROFILER env variable to decide whether or not to do device profiling.
Add a portion on top of the python -m tracy module to run a test to confirm device data.
Skewed timers on cores due to tensix reset are one of the main causes of corruption in device profile data. We already detect that case. We should just run a sample test at the beginning of profile_this and see if skew is detected; if so, we should error out and not move ahead.
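The gating and the pre-flight check described above could look roughly like this. It is a sketch under assumptions: the variable is treated as enabled when set to "1", and run_sample_test and skew_detected are hypothetical callables standing in for the sample test and for the skew detection that already exists.

```python
import os


class SkewedTimersError(RuntimeError):
    """Raised when the sample run shows skewed core timers."""


def device_profiling_enabled(env=None):
    # Assumption: device profiling is requested by setting the variable to "1".
    env = os.environ if env is None else env
    return env.get("TT_METAL_DEVICE_PROFILER") == "1"


def preflight(run_sample_test, skew_detected, env=None):
    """Run a sample test first and refuse to continue on skewed timers.

    Both callables are hypothetical placeholders for existing functionality.
    Returns False when device profiling is off, True when the check passed.
    """
    if not device_profiling_enabled(env):
        return False  # host-only run, nothing to verify on the device
    run_sample_test()
    if skew_detected():
        raise SkewedTimersError(
            "skewed core timers detected; device profile data would be corrupt"
        )
    return True
```

Raising an exception rather than logging a warning matches the intent stated above: error out and do not move ahead.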
source build/python_env/bin/activate
./tt_metal/tools/profiler/profile_this.py -c "pytest tests/python_api_testing/unit_testing/test_resnet50_first_conv.py"
python tt_metal/tools/profiler/process_ops_logs.py -i tt_metal/tools/profiler/logs/ops
rm -rf tt_metal/tools/profiler/logs/
cat output/ops/profile_log_ops.csv | awk -F, '{print $13}'
Those are all the lines needed to get the device duration column.
Turn that into this:
source build/python_env/bin/activate
./tt_metal/tools/profiler/profile_this.py -D -c "pytest tests/python_api_testing/unit_testing/test_resnet50_first_conv.py"
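For the last step of the long version, the awk call that prints field 13, a Python equivalent might look like the sketch below. What -D does internally is not specified in this issue; this only illustrates pulling one column out of the ops CSV, and the helper name is hypothetical. It assumes the device duration stays in the 13th column (index 12).

```python
import csv


def read_column(lines, index=12):
    """Return one column (0-based index) from CSV lines, header included.

    Mirrors awk -F, '{print $13}' for index=12, except that rows too short
    to have that column are skipped instead of printed as empty lines.
    """
    return [row[index] for row in csv.reader(lines) if len(row) > index]


def read_column_from_file(path, index=12):
    # newline="" is the recommended way to open files for the csv module.
    with open(path, newline="") as handle:
        return read_column(handle, index)
```

Unlike splitting on every comma, csv.reader keeps quoted fields that contain commas intact, so the column index does not shift on such rows.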