Closed: apoorvkh closed this issue 1 year ago.
The pre-trained 512 model takes roughly 20 seconds per inference on CPU. For comparison, do you know how long this takes on GPUs? Do you have any advice for optimizing inference time on CPUs? I would ideally like to run the model in ~1 second.

Hi, I tested the code on NVIDIA V100 GPUs; the inference time for the 512 model is around 8 ms to 10 ms. Note that this is only the inference time and does not include the time spent loading and pre-processing the data. I don't have a good way to make it run fast on CPUs; you could probably try some quantization method.
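For reference, here is a minimal sketch of one such quantization approach, assuming the 512 model is a PyTorch `nn.Module`; the checkpoint path, input shape, and layer types below are assumptions for illustration, not the repo's actual API:

```python
# Post-training dynamic quantization sketch (PyTorch assumed).
# "model_512.pth" and the 1x3x512x512 input shape are hypothetical placeholders.
import time
import torch

model = torch.load("model_512.pth", map_location="cpu")  # hypothetical checkpoint path
model.eval()

# Dynamically quantize Linear layers to int8. Conv layers are not covered by
# dynamic quantization and would need static quantization with calibration.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

dummy = torch.randn(1, 3, 512, 512)  # assumed input shape
with torch.no_grad():
    start = time.time()
    _ = quantized(dummy)
    print(f"quantized CPU inference: {time.time() - start:.2f} s")
```

If most of the compute is in convolutions, dynamic quantization alone may not help much; exporting the model to an optimized CPU runtime (e.g., ONNX Runtime) or increasing `torch.set_num_threads` are other options worth trying.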