Open yishusong opened 2 months ago
Hello @yishusong, using the Pandas apply method is slow. I suppose you want to run the model on a specific column and store the output in another one.
Transform that column into a list and call model.batch_predict_entities(your_list, labels).
Create a dictionary from that output and join it back with the dataframe.
You would probably run out of memory (OOM), so make sure you run this in batches (split your data) and call torch.cuda.empty_cache().
As for increasing GPU utilization, I am not sure how to increase it, or even how to confirm that the GPU is being used during inference. I hope someone can help with that.
You can create batches like this:
# Sample text data
all_text = ["sample text 1", "sample text 2", …, "sample text n"]
# Entity labels to extract (example values, replace with your own)
labels = ["person", "award", "date"]
# Define the batch size
batch_size = 10
# Generator function that yields consecutive slices of the data
def create_batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]
# Example usage of the generator function
all_predictions = []
for batch in create_batches(all_text, batch_size):
    # batch_predict_entities is the batch method named earlier in this thread
    predictions = model.batch_predict_entities(batch, labels)
    all_predictions.extend(predictions)
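Putting the suggestions above together (column to list, batched prediction, optional cache clearing), a minimal sketch could look like the following. The names `predict_in_batches` and `after_batch` are hypothetical helpers made up for illustration, and the code only assumes that the model exposes `batch_predict_entities(texts, labels)` as mentioned earlier in the thread and returns one result per input text.

```python
from typing import Any, Callable, Iterator, List, Optional, Sequence


def create_batches(data: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `data` with at most `batch_size` items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def predict_in_batches(
    model: Any,
    texts: Sequence[str],
    labels: List[str],
    batch_size: int = 32,
    after_batch: Optional[Callable[[], None]] = None,
) -> List[Any]:
    """Return the predictions for `texts`, collected batch by batch in input order.

    Assumption: `model` exposes batch_predict_entities(texts, labels),
    the method named earlier in this thread.
    """
    results: List[Any] = []
    for batch in create_batches(texts, batch_size):
        results.extend(model.batch_predict_entities(list(batch), labels))
        if after_batch is not None:
            # Optional hook, e.g. torch.cuda.empty_cache on a GPU machine
            after_batch()
    return results
```

With a dataframe `df` and a text column (hypothetically named `text`), this could be used as `df["entities"] = predict_in_batches(model, df["text"].tolist(), labels, after_batch=torch.cuda.empty_cache)`. As long as the model returns one result per text, the output list has the same length and order as the column, so it can be assigned directly without an explicit join.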
Thank you very much for the replies! I'll try it out shortly.
Re: @Marwen-Bhj's comment about the GPU: I haven't looked into the source code yet, but is it possible to use the model with Hugging Face? I was thinking of something like device_map = 'auto' to use all GPUs, or setting the data type to float16 to make the data smaller. Does the code base offer configurations like this?
If not, maybe a memory-optimized instance would perform better?
You can try the automatic mixed precision (AMP) module in PyTorch for inference. For me it helps speed up training, but I have not tried it for inference.
import torch
from torch.cuda.amp import autocast
# Run the forward pass in float16 where PyTorch considers it safe
with autocast(dtype=torch.float16):
    predictions = model.batch_predict_entities(batch, labels)
@urchade I tried AMP; it did not increase the inference speed. Heads-up @yishusong: surprisingly, running inference on a CPU cluster is at least 3 times faster than on a GPU.
CPU cluster: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
GPU instance: Nvidia V100
Thanks! With CPU there is joblib so there will be more speedup.
Ok, that's weird but ok 😅
Did you try calling model.to('cuda') instead of model.cuda()?
@urchade that fixed it! Thank you :)
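For anyone wondering how to confirm which device will be used, here is a small sketch. `pick_device` is a hypothetical helper name; it relies only on `torch.cuda.is_available()` and falls back to the CPU when PyTorch is missing or no GPU is visible.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running inference on: {device}")
# With a loaded GLiNER model, the call discussed above would then be:
# model.to(device)
```

Assuming the loaded model is a regular torch.nn.Module, `next(model.parameters()).device` should report where its weights actually live after the move.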
Thanks a lot! This indeed sped up inference a lot.
However, model.to('cuda') seems to utilize only 1 GPU. From what I found online, nn.DataParallel(model) won't extend to GLiNER batch prediction...
I'm also interested in how to boost performance using multiple GPUs.
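The thread does not settle the multi-GPU question. One common workaround for inference in general, offered here only as an assumption and not as something GLiNER provides, is to split the data into one shard per GPU and run a separate process per shard, each with its own model copy. The sketch below covers only the sharding step; `split_into_shards` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_into_shards(texts: Sequence[str], n_shards: int) -> List[Tuple[int, List[str]]]:
    """Split `texts` into at most `n_shards` contiguous chunks.

    Each entry is (start_index, chunk), so per-shard results can be
    stitched back together in the original order.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    shard_size = -(-len(texts) // n_shards)  # ceiling division
    shards = []
    for start in range(0, len(texts), max(shard_size, 1)):
        shards.append((start, list(texts[start:start + shard_size])))
    return shards


# Sketch of the intended use (not executed here): worker i would load its own
# model copy, move it with model.to(f"cuda:{i}"), and process shard i.
for start, chunk in split_into_shards(["t1", "t2", "t3", "t4", "t5"], 2):
    print(start, len(chunk))
```

Keeping the start index with each chunk makes it straightforward to sort the per-process outputs back into row order before writing them to the dataframe.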
Hi, would it also be possible to speed up using AWS Inferentia / Optimum Neuron? (see article)
I don't think Inferentia works, because it only supports a very limited list of HF models. Also, it might not be compatible with CUDA, so there might be other dependency issues.
@yishusong @Marwen-Bhj How were you able to achieve inference within seconds on CPU? It takes close to 18-20 minutes for me with gliner_medium-v2.1.
from gliner import GLiNER
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
model.to('cpu')
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""
labels = ["person", "award", "date", "competitions", "teams"]
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
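To narrow down where the time goes in a script like this, it may help to time model loading and prediction separately, since the first `from_pretrained` call also downloads the weights. This is only a diagnostic sketch; `timed` is a hypothetical helper, and whether loading accounts for the 18-20 minutes reported here is not known.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Intended use with the script above (not executed here):
# model, load_s = timed(GLiNER.from_pretrained, "urchade/gliner_medium-v2.1")
# entities, predict_s = timed(model.predict_entities, text, labels)
# print(f"load: {load_s:.1f}s, predict: {predict_s:.1f}s")
```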
Hi team,
I'm running inference on a g5.24xlarge GPU instance. The data is currently structured in a Pandas dataframe. I use Pandas apply method to apply the predict_entities function. When the df gets fairly large (~1.5M rows), it takes days to run the inference.
I'm wondering if there is a way to increase GPU utilization? I suppose Pandas df is not the most efficient data structure... or maybe there is a parameter I missed that can boost GPU utilization?
Any advice is much appreciated!