irshadbhat opened this issue 1 year ago
I tried to run the code as-is for training, and at the end of each epoch it runs inference on the test set. I found that inference was taking too long and GPU utilization was maxed out on a p4dn.24x instance with 8x A100 40GB GPUs.
+1 to the above. Inference takes longer than the actual training.
Hi,
I trained a flan-t5-xxl model on a custom NER dataset following the steps from your blog. The training went well without any issues. I also put a `print(labels)` line in the `postprocess_text` function to check the predicted labels during evaluation, and the results were pretty good. I used 4x A10G 24GB GPUs for training with the `ds_flan_t5_z3_offload_bf16.json` DeepSpeed config file.
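The debug hook is just something like this (a rough sketch; the actual `postprocess_text` is the one from the training script in the blog, with only the print added):

```python
def postprocess_text(preds, labels):
    # strip whitespace from the decoded predictions and reference labels
    preds = [pred.strip() for pred in preds]
    labels = [label.strip() for label in labels]
    print(labels)  # debug: inspect the labels produced during evaluation
    return preds, labels
```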
Now I want to run the model for inference, and I used the DeepSpeed inference code below to do the prediction.
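The setup looks roughly like this (a minimal sketch; the checkpoint path, prompt, and generation parameters are placeholders, not the exact values from my script):

```python
import torch
import deepspeed
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# placeholder path for illustration; in my script this points at the fine-tuned checkpoint
checkpoint = "path/to/flan-t5-xxl-ner-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

# wrap the model with DeepSpeed inference (one GPU per process, bf16 as in training)
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.bfloat16,
    replace_with_kernel_inject=True,
)

text = "extract entities: John works at Acme Corp in Berlin."  # example input
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = ds_model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```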
I used the same parameters from the config file to initialize the model for inference, but I am getting a CUDA OutOfMemory error. I don't understand how the model fit fine during training but needs more memory for inference.
I believe I am doing something wrong. Please suggest any changes I need to make so I can use the trained model for inference.
Thanks