nod-ai / SHARK-TestSuite

Temporary home of a test suite we are evaluating
Apache License 2.0

Update benchmarking to check benchmarking results and generate artifacts #252

Closed · saienduri closed this 3 weeks ago

saienduri commented 1 month ago

This commit adds support for actually making use of the benchmark results. We check against a golden time to make sure there are no regressions, both end-to-end and for the individual submodels (unet, clip, vae). I also made the golden time a command-line argument so that we can easily update it in the workflow file in either SHARK-TestSuite or IREE whenever a patch gives a performance boost, without any annoying cross-repo dependency.
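The golden-time gate described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name, flag names, and the 10% tolerance are all assumptions.

```python
import argparse

def check_regression(name: str, measured_ms: float, golden_ms: float,
                     tolerance: float = 0.1) -> bool:
    """Pass if the measured time is within `tolerance` of the golden time.

    `tolerance` is a hypothetical slack factor to absorb run-to-run noise.
    """
    limit = golden_ms * (1 + tolerance)
    ok = measured_ms <= limit
    print(f"{name}: {measured_ms:.1f} ms (golden {golden_ms:.1f} ms, "
          f"limit {limit:.1f} ms) -> {'PASS' if ok else 'FAIL'}")
    return ok

if __name__ == "__main__":
    # Golden times arrive as CLI flags so the calling workflow file (in
    # either repo) can bump them without a cross-repo code change.
    # Flag names here are illustrative assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--goldentime-ms", type=float, required=True)
    parser.add_argument("--measured-ms", type=float, required=True)
    args = parser.parse_args()
    ok = check_regression("sdxl-e2e", args.measured_ms, args.goldentime_ms)
    raise SystemExit(0 if ok else 1)
```

A nonzero exit code is what lets the CI job fail on a regression; the same check would be repeated per submodel.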

There is also a job summary posted to the CI run's summary page, giving an overview of the whole SDXL benchmarking run: https://github.com/nod-ai/SHARK-TestSuite/actions/runs/9455228567

saienduri commented 4 weeks ago

The w7900 runner (a different one from the iree repo's 2 runners) has gone down. I'll get it back up, but this PR doesn't change any of the CPU testing code, so it should be good. Update: got the runner back up :)