@dbarbuzzi pointed out during QA testing that the sparse quantized example does not make clear that the code snippets are meant to be run one by one in the same Python instance. This PR updates the README to make that explicit. It also updates the quantization config for w8a8 to use the "tensor" strategy, since that is the strategy we have been testing with for this example.