We currently tolerate errors in kernel execution (e.g. a crash during
execution, or incorrect results), but an error during the IR -> PTX
compilation process immediately aborts the search.
This makes experimentation hard, because those compilation failures are
sometimes due to corner cases which are expensive to handle and
shouldn't prevent us from exploring the other interesting cases. One
current example (which will be fixed separately) is dimension sizes
which have distinct prime factors, where we can end up with an invalid
tiling factor; for instance we could try to tile 15 by 9, which does
not divide it. This causes assertion failures in the IR -> PTX
compiler, which in turn stop the search process.
The "proper" fix for this would be to update the code to return Result
in the appropriate places, but that requires more manpower than is
available right now. Instead we use std::panic::catch_unwind as a
best-effort recovery mechanism.
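
As a rough illustration of the recovery pattern, here is a minimal
sketch. The names compile_candidate and try_compile are hypothetical
stand-ins, not the functions touched by this change, and the assertion
only mimics the kind of check that fails for an invalid tiling factor.

```rust
use std::panic;

/// Hypothetical stand-in for the IR -> PTX compilation step: it asserts
/// that the tiling factor divides the dimension size.
fn compile_candidate(size: u32, tile: u32) -> String {
    assert!(size % tile == 0, "invalid tiling factor {} for size {}", tile, size);
    format!("// ptx for size={} tile={}", size, tile)
}

/// Runs the compilation and turns a panic into `None`, so the caller can
/// skip the candidate instead of aborting the whole search.
fn try_compile(size: u32, tile: u32) -> Option<String> {
    panic::catch_unwind(|| compile_candidate(size, tile)).ok()
}

fn main() {
    // (15, 9) triggers the assertion; the other candidates compile.
    for (size, tile) in [(16, 4), (15, 9), (15, 5)] {
        match try_compile(size, tile) {
            Some(ptx) => println!("compiled: {}", ptx),
            None => println!("skipping candidate size={} tile={}", size, tile),
        }
    }
}
```

This is best-effort only: catch_unwind catches unwinding panics, so it
does nothing when the crate is built with panic = "abort", and the
default panic hook still prints the assertion message to stderr.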