Open qianwangn opened 6 days ago
When I use a 34B LLM, a single GPU reports OOM, so I set `device_map='auto'`. But that doesn't seem to work with `torchrun`, and inference takes too long. How can I solve this problem?
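For context, `device_map='auto'` shards the model's layers across available GPUs by memory budget instead of replicating it per process, which is why it conflicts with a `torchrun` launch (each spawned process would try to claim all GPUs). The toy allocator below is only a sketch of the idea, with hypothetical layer names and sizes; it is not Accelerate's actual algorithm.

```python
# Toy sketch of the idea behind device_map="auto": greedily assign
# ordered layer shards to GPUs, moving to the next device once the
# current one runs out of its memory budget. Sizes are in GB and
# purely illustrative.

def infer_device_map(layer_sizes, max_memory):
    """layer_sizes: ordered {layer_name: size}; max_memory: {device: budget}."""
    device_map = {}
    devices = list(max_memory)
    free = dict(max_memory)
    d = 0
    for name, size in layer_sizes.items():
        # Advance to the next device once the current one is full.
        while d < len(devices) - 1 and size > free[devices[d]]:
            d += 1
        device_map[name] = devices[d]
        free[devices[d]] -= size
    return device_map

# Six hypothetical 2 GB layers spread over two GPUs:
layers = {f"model.layers.{i}": 2 for i in range(6)}
print(infer_device_map(layers, {"cuda:0": 6, "cuda:1": 8}))
```

With a real model you would instead pass `device_map="auto"` (optionally with `max_memory`) to `from_pretrained` and launch with plain `python`, not `torchrun`, since the sharded model already spans all GPUs in one process.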
Hi, which VLM are you using?