Open xunsongh opened 6 months ago
Are you able to run L0 applications successfully? Most probably UniTracer::Create is failing because an L0 call fails. This happens during the tool's initialization, where it interacts with L0, so it does not really matter what your app is doing :). A few things I would suggest trying:
- Check whether any SYCL or L0 app that exercises the L0 APIs runs fine in the same environment.
- Try to build Unitrace in the environment where you want to run it. In the past I have seen people build the tool in one environment and then run it in a different one, which caused the tool to fail.
- Try to find out which L0 API is failing from the assert and collect the error number.
BTW, any chance you could try a different machine to verify the behavior?
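One quick way to act on the first suggestion, without building a separate L0 app, is to call zeInit directly from the same environment. The following is only a minimal sketch: it assumes the Level Zero loader is installed under the name libze_loader.so.1 (this may differ on your system), and the helper name check_level_zero_init is hypothetical, not part of Unitrace.

```python
import ctypes


def check_level_zero_init(lib_name="libze_loader.so.1"):
    """Try to load the Level Zero loader and call zeInit(0).

    Returns (loaded, result). `loaded` is False if the library could not be
    loaded or does not export zeInit; `result` is the raw ze_result_t value
    (0 means ZE_RESULT_SUCCESS).
    """
    try:
        # lib_name is an assumption; adjust it to match the installed loader.
        loader = ctypes.CDLL(lib_name)
        ze_init = loader.zeInit
    except (OSError, AttributeError):
        return False, None
    ze_init.argtypes = [ctypes.c_uint32]  # ze_init_flags_t is a 32-bit flag set
    ze_init.restype = ctypes.c_uint32     # ze_result_t is an enum
    return True, int(ze_init(0))
```

If the loader is found, printing the result in hex (for example with format(result, "#010x")) gives a value that can be compared against the ze_result_t codes in ze_api.h. A non-zero value here would suggest the problem is in the L0 setup rather than in the traced app.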
Thank you for your guidance. Here are my replies to your suggestions:
- I can use the unitrace tool to trace all those C++ executable programs; it only fails on this simple Python case.
- Of course, I built, ran, and tested many cases within a clean environment set up by conda.
- Sorry, I don't have the knowledge to track down the failing L0 API. In gdb's backtrace the top frames show as '??' without any useful information.
I had only one available PVC machine, which is where I found this issue, and unfortunately that machine broke several days ago.
Regarding your response to "Item 1", I doubt that this is related to the Python app. Judging by the failure point, it happens at the very beginning. Let's connect internally to look at the setup and the failure.
@xunsongh Please check the version of libstdc++.so in your conda env. If it is lower than 6.0.30, you need to upgrade it to at least 6.0.30.
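The version check can be sketched in Python as below. This is a rough illustration, not an official procedure: it assumes the library lives at $CONDA_PREFIX/lib/libstdc++.so.6 and that this is a symlink to a file named like libstdc++.so.6.0.30. The helper names libstdcxx_version and is_new_enough are hypothetical.

```python
import os
import re


def libstdcxx_version(path):
    """Parse (major, minor, patch) from a file name like libstdc++.so.6.0.30.

    Returns None if the resolved file name carries no three-part version.
    """
    real = os.path.realpath(path)  # follow the libstdc++.so.6 symlink if present
    match = re.search(r"libstdc\+\+\.so\.(\d+)\.(\d+)\.(\d+)$", real)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=(6, 0, 30)):
    """True if the parsed version is at least the minimum mentioned above."""
    return version is not None and version >= minimum
```

With the conda env activated, libstdcxx_version(os.path.join(os.environ["CONDA_PREFIX"], "lib", "libstdc++.so.6")) returns the version tuple to pass to is_new_enough. The path layout is an assumption and may differ in your env.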
I built the unitrace tool on a PVC machine with driver agama-ci-devel-hotfix-821.36, using the default configuration without MPI support, and then tried to run it on a simple Python script, but it is always aborted by an assertion error in UniTracer::Create.
Here is my command to run the successfully built unitrace tool:
I also tried running with other options, but all of them failed with the same assertion error:
My test case is as simple as it could be:
Would you please help check why the unitrace tool crashes on such a simple case, which is not even related to SYCL or L0?