Open · aliozts opened this issue 6 months ago
Hi, we trained it with an `n_inner` value that is not divisible, due to the SwiGLU layer. We will experiment with some padding hacks, evaluate to confirm there is no drop in performance, and update you in due time. Thanks.
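For anyone who wants to experiment before an official fix lands, here is a minimal sketch of the zero-padding idea. It assumes GPT-2-style SwiGLU parameter names (`mlp.c_fc`, `mlp.c_fc2`, `mlp.c_proj`) and `nn.Linear`-style `[out_features, in_features]` weight shapes; the real checkpoint layout may differ, so treat every name here as an assumption. Zero-padding is mathematically exact for SwiGLU, because the padded gate/up slots produce zero activations and the padded down-projection columns consume them:

```python
import torch

def pad_swiglu_layer(state_dict, layer_idx, n_inner, tp_size):
    """Zero-pad one SwiGLU MLP so n_inner becomes divisible by tp_size.

    Hypothetical JAIS-like parameter names and Linear-style shapes are
    assumed throughout; adapt to the actual checkpoint.
    """
    pad = (-n_inner) % tp_size  # extra rows/cols needed for divisibility
    if pad == 0:
        return n_inner
    prefix = f"transformer.h.{layer_idx}.mlp"
    for name in ("c_fc", "c_fc2"):  # gate and up projections
        w = state_dict[f"{prefix}.{name}.weight"]
        zeros = torch.zeros(pad, w.shape[1], dtype=w.dtype, device=w.device)
        state_dict[f"{prefix}.{name}.weight"] = torch.cat([w, zeros], dim=0)
        b = state_dict.get(f"{prefix}.{name}.bias")
        if b is not None:
            zb = torch.zeros(pad, dtype=b.dtype, device=b.device)
            state_dict[f"{prefix}.{name}.bias"] = torch.cat([b, zb])
    # Padded activations are silu(0) * 0 == 0, and the down projection
    # gets matching zero input columns, so outputs are unchanged.
    w = state_dict[f"{prefix}.c_proj.weight"]
    zeros = torch.zeros(w.shape[0], pad, dtype=w.dtype, device=w.device)
    state_dict[f"{prefix}.c_proj.weight"] = torch.cat([w, zeros], dim=1)
    return n_inner + pad  # write this back to config.json as n_inner
```

After padding every layer, `n_inner` in `config.json` has to be updated to the padded value so the shapes agree at load time.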
Hi @grandiose-pizza, were you able to experiment with padding?
Hi. Please watch for new models to be released soon on our Hugging Face; we will be fixing these there.
Your current environment
I was using the latest Docker image (0.4.0) with 4-8 L4 GPUs when I hit the problem described below. I also tested this by installing from source, as well as with a custom Docker image.
🐛 Describe the bug
Hello, first of all, thank you for the grand work!
I was trying to use the recently supported JAIS models. When I tried jais-30b-chat-v3 on 8x L4 GPUs, I got an error about a dimension not being divisible across the GPUs. I then wanted to test the jais-13b-chat model for the same purpose, to see whether I could deploy it on 4x L4 GPUs, and I hit the same kind of error.
The commands I was using can be generalized along the lines of the sketch below:
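As an illustration (the Hugging Face model id, parallel size, and flags here are assumptions, not the reporter's actual command), an equivalent invocation through vLLM's offline Python API, which exercises the same tensor-parallel loading path, looks like this:

```python
from vllm import LLM

# Illustrative only: model id and tensor_parallel_size are assumptions.
# tensor_parallel_size=8 shards each weight matrix across 8 L4 GPUs,
# which is what triggers the divisibility requirement on n_inner.
llm = LLM(
    model="core42/jais-30b-chat-v3",  # assumed Hugging Face repo id
    tensor_parallel_size=8,
    trust_remote_code=True,           # JAIS ships custom model code
)
print(llm.generate("Hello")[0].outputs[0].text)
```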
After checking the `config.json` files for each model, I saw that the relevant parameter is `n_inner`. I suppose it has to be divisible by the number of GPUs I want to parallelize across. May I ask if this is the intended behaviour, or can I just modify the `n_inner` parameter to my liking as a hacky workaround?
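For context on why this fails: vLLM's tensor parallelism shards the MLP intermediate dimension (`n_inner`) evenly across GPUs, so loading stops whenever `n_inner % tensor_parallel_size != 0`. Below is a minimal sketch of that constraint; the concrete `n_inner` values are made-up illustrations, not numbers taken from the actual JAIS configs:

```python
def mlp_shard_size(n_inner: int, tp_size: int) -> int:
    # Each tensor-parallel rank owns an equal slice of the MLP's
    # intermediate dimension, so the split must be exact.
    if n_inner % tp_size != 0:
        raise ValueError(f"{n_inner} is not divisible by {tp_size}")
    return n_inner // tp_size

# Hypothetical values illustrating the failure mode:
for n_inner, tp in [(13653, 4), (13656, 4)]:
    try:
        print(f"n_inner={n_inner}, tp={tp} -> shard={mlp_shard_size(n_inner, tp)}")
    except ValueError as err:
        print(f"n_inner={n_inner}, tp={tp} -> {err}")
```

Note that editing `n_inner` in `config.json` alone would not work: the checkpoint tensors keep their original shapes, so the load would fail or silently mismatch. Any workaround has to change the weights and the config together, as in the padding sketch above.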