GPU Inferencing in QKeras

Hi there! I'm interested in running the QKeras MNIST CNN example given in the examples section - Link. This example quantizes the weights and activations to 4 bits (INT4) using the quantized_bits(4,0,1) quantizer for the Conv kernels and activations. Is there any way to run GPU inference on such a model, for example by converting it into a TensorRT (TRT) engine? That workflow is widely used with packages like NVIDIA-QAT, so I suppose there should be a way to do it for QKeras as well.
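
For reference, here is roughly what I understand quantized_bits(4,0,1) to do to a single value. This is only a plain-Python sketch of the fixed-point scheme as I read it (4 bits in total, 0 integer bits, symmetric range, no extra scaling factor), not QKeras's actual code; the function name `quantize_fixed_point` and its parameter names are my own.

```python
def quantize_fixed_point(x, bits=4, integer=0, symmetric=True):
    """Emulate a signed fixed-point quantizer on one float (assumed scheme)."""
    levels = 2 ** (bits - 1)        # 8 steps per unit for bits=4 (one bit is the sign)
    scale = 2 ** integer            # 1 when there are no integer bits
    # A symmetric range drops the most negative code, giving [-7, 7] instead of [-8, 7].
    lo = -levels + (1 if symmetric else 0)
    hi = levels - 1
    # Round to the nearest integer code, then clip into the representable range.
    code = max(lo, min(hi, round(x * levels / scale)))
    # Return the integer code and the float value it represents.
    return code, scale * code / levels


print(quantize_fixed_point(0.3))    # -> (2, 0.25)
print(quantize_fixed_point(1.0))    # -> (7, 0.875)
print(quantize_fixed_point(-1.0))   # -> (-7, -0.875)
```

If that reading is right, every kernel weight and activation is one of 15 values between -0.875 and 0.875 in steps of 0.125, and those integer codes are what I would hope a GPU engine could consume directly.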

Thanks, Yoga

google / qkeras

GPU Inferencing in QKeras #97