Closed — aflah02 closed this 2 weeks ago
@aflah02 I can check on this later, but the format may just be wrong for these log entries. Decode/tokenizer run in a separate process from GPU inference, so they have no tie to any GPU. Perhaps the code/log is forcing a gpu_id into the output even when the log site is a CPU-only task.
Thanks @Qubitium
@aflah02 The code is actually in the TP worker process where the GPU work is done, but the log print is gated on tp_rank == 0, so even if you have gpu == 2 and tp == 2, the log only ever prints the id of the first worker. Effectively, for these stat logs, you should not use gpu_id to judge whether multi-GPU TP is working.
Thank You! This clears it up
Hi, I was recently running Llama3-70B on a 2xH100 server. I noticed that all the messages in the logs only mentioned
[gpu_id=0]
and was wondering whether this means GPU 1 isn't being used to serve requests at all. When I check GPU memory usage, both GPUs are filled to the brim and show high utilization, which seems to imply both are being used, so I'm not sure why there are no lines in the log for GPU 1.