sunyuhan19981208 opened 1 year ago
Same question. On an A100 80G machine, I set up the vicuna-13b model running as the openai_api_server. I then wrote a script to send 10 requests in a batch to the API server. However, the API server seems to process the requests one by one. I also submitted a curl request from another terminal; it blocked until all previous requests were processed.
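A minimal sketch of the kind of client script described above, dispatching requests concurrently so that any server-side batching could kick in. The endpoint and payload are not shown here; `time.sleep` stands in for the actual HTTP round-trip so the sketch is self-contained:

```python
# Sketch: send 10 requests concurrently instead of sequentially.
# send_request is a stand-in; in practice it would POST to the
# OpenAI-compatible endpoint (URL and payload are assumptions).
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(i):
    # e.g. requests.post("http://localhost:8000/v1/chat/completions", json=...)
    time.sleep(0.2)  # simulate one API round-trip
    return i

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(send_request, range(10)))
elapsed = time.time() - start

# With concurrent dispatch, total wall time is roughly one round-trip
# (~0.2 s here), not ten round-trips; if the server still answers one
# request at a time, the measured latency reveals the serial processing.
print(results, round(elapsed, 1))
```

If this pattern still takes ten round-trips end to end, the serialization is happening on the server side, which matches the behavior reported above.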
You can use Fully Sharded Data Parallel (FSDP for short).
@zenetio but that's not model parallelism. Do you have any hints on that one?
I would like to ask about the possibility of combining data parallelism and model parallelism when training an LLM. I found that model parallelism only supports a batch size of 1, while data parallelism cannot distribute one model across many cards. If I have 1000 1080 Ti cards and I want to train a 65B model with a large batch size, what should I do?
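The usual answer is to combine the two: split the model over a small group of GPUs (model parallel), then replicate that group many times (data parallel), as frameworks like Megatron-LM and DeepSpeed do. A minimal sketch of the bookkeeping, with all concrete numbers being illustrative assumptions (8 GPUs per model replica, micro-batch 4 per replica):

```python
# Sketch: partition 1000 GPUs into model-parallel groups of 8,
# then replicate those groups for data parallelism.
# model_parallel_size and micro_batch_per_replica are assumed values.
total_gpus = 1000
model_parallel_size = 8       # GPUs holding one sharded copy of the model
micro_batch_per_replica = 4   # batch size each replica processes

data_parallel_size = total_gpus // model_parallel_size   # number of replicas
global_batch = data_parallel_size * micro_batch_per_replica

def group_of(rank, mp_size=model_parallel_size):
    """Map a global GPU rank to (data-parallel index, model-parallel index)."""
    return rank // mp_size, rank % mp_size

print(data_parallel_size, global_batch)  # 125 replicas, global batch 500
print(group_of(0), group_of(9))          # (0, 0) and (1, 1)
```

Gradients are then all-reduced across ranks that share the same model-parallel index, while activations and weights are exchanged only within each model-parallel group. Whether 1080 Ti cards (11 GB, no NVLink) can hold a 65B shard and sustain the communication is a separate practical question.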