intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily
MIT License

oneprof -q fails with error "ZE_RESULT_SUCCESS' failed" #40

Open mumar-intel opened 1 year ago

mumar-intel commented 1 year ago

I am using oneprof on an HPC+AI application with a large number of kernels (~30). When I run:

 oneprof -q -o test.txt $APP_EXE

it fails with the error:

 oneprof/metric_query_collector.h:307: void MetricQueryCollector::ProcessQuery(const ZeQueryInfo&): Assertion `status == ZE_RESULT_SUCCESS' failed

It generates the output files (result.data and test.txt), but test.txt contains only the application's total runtime and no information about the individual kernels.

I have tested it on one tile and on one GPU. The application does not use MPI; it is Python-based code.

jfedorov commented 11 months ago

@mumar-intel Sorry for the delayed response. Several fixes have recently gone into oneprof. Could you please retry the collection with the latest oneprof and let us know whether the issue still reproduces? Thank you.

Wanzizhu commented 10 months ago

Hi @jfedorov, I am also running into this issue, even after updating to the latest commit (9ee0e46cafa145856eaeeefe5f26ec046462300f). The error output is below. Is this expected?

 pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: spr [Genuine Intel(R) CPU 0000%@]
Registry and code: 13 MB
Command: python test_linear.py
Uptime: 7.938176 s
Aborted (core dumped)
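
The two assertion failures quoted in this thread come from different source locations (metric_query_collector.h:307 and metric_query_cache.h:69). A minimal sketch of how the location could be pulled out of such messages for side-by-side comparison is shown below. It assumes the messages follow the usual glibc `assert` layout (`file:line: function: Assertion ... failed`); the helper name `parse_assert` is hypothetical and is not part of oneprof.

```python
import re

# Assumed glibc assert layout:
#   [program: ]file:line: function: Assertion `expr' failed.
ASSERT_RE = re.compile(
    r"(?P<file>[^\s:]+):(?P<line>\d+): (?P<func>.+): "
    r"Assertion `(?P<expr>.+)' failed"
)


def parse_assert(message):
    """Return file, line, function and expression of an assert message, or None."""
    match = ASSERT_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    return info


# The two messages reported in this thread, copied from the comments above.
first = parse_assert(
    "oneprof/metric_query_collector.h:307: void MetricQueryCollector::"
    "ProcessQuery(const ZeQueryInfo&): "
    "Assertion `status == ZE_RESULT_SUCCESS' failed"
)
second = parse_assert(
    "pti-gpu/tools/oneprof/metric_query_cache.h:69: "
    "_zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): "
    "Assertion `status == ZE_RESULT_SUCCESS' failed."
)

for info in (first, second):
    print(info["file"], info["line"], info["func"])
```

Both messages check the same expression, but the failing function differs (`ProcessQuery` in the original report, `GetQuery` on the newer commit), which may be worth noting when reporting whether the issue still reproduces.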