tiwargau opened 1 year ago
Have you solved the problem? My situation is exactly the same as yours.
Hi @alexwangmac I haven't really solved this problem, just worked around it by setting "stage3_prefetch_bucket_size": 0. This is not an ideal solution, as you lose the prefetching efficiency. Hoping the DeepSpeed team can help with this soon.
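For reference, the workaround goes in the zero_optimization section of the DeepSpeed config JSON. A minimal sketch (the surrounding stage-3 keys shown here are illustrative, not taken from this thread):

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_prefetch_bucket_size": 0
  }
}
```

Setting the bucket size to 0 disables parameter prefetching, so nothing is fetched ahead of time that could later be left unused.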
Same
I ran into the same problem and your fix worked! Indeed the problem arises if not all model params are used during inference.
Any update on this? Running into the same issue when I have unused parameters for a given forward pass!
In the config json, set "stage3_prefetch_bucket_size": 0, that should work.
While this might "work", it still does not solve the problem for models like Mixtral, since this kind of MoE does not work properly with DeepSpeed. I also tried to run Mixtral on a multi-GPU setup and, instead of getting this error message, the process just hangs indefinitely, most likely because parameters are fetched but never used and thus never released, even with prefetch_bucket_size=0.
I have exactly the same issue. When will Mixtral support be added to DeepSpeed?
(I posted a similar comment on #4808) I will investigate this issue, but in the meantime you can use DeepSpeed-FastGen (DeepSpeed-MII) for text generation. The example is available here. I verified that Mixtral works just by changing the model name. "Non-persistent" mode is easier to use for testing purposes, but "persistent" mode will give you the best performance. Please refer to DeepSpeed-MII for more details.
Hi everyone,
The PR was already merged into master. Please feel free to try, but I still recommend using DeepSpeed-FastGen for text generation.
Hi, I also ran into this problem in my experiments. It seems that during generation some parameters are not used. Besides the PR, a simple workaround is to pass a dummy input that invokes the otherwise-unused parameters during inference. Warnings like "Invalidate trace cache @ step 1: expected module 1704, but got module 1703" still appear, but training and generation seem to be fine.
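The dummy-input idea can be illustrated with a toy model. This is a pure-Python sketch, not DeepSpeed code; the modality routing, branch names, and generate helper are all hypothetical:

```python
class ToyMultiModalModel:
    """Toy stand-in for a multi-modal model whose forward pass only
    touches the branch matching the input modality (hypothetical)."""

    def __init__(self):
        self.used_branches = set()

    def forward(self, modality, data):
        # Only the branch for this modality participates in the step.
        self.used_branches.add(modality)
        return f"{modality}:{data}"


def generate(model, inputs):
    # Workaround sketch: first run a dummy input through every modality
    # branch so no parameter group is left untouched during generation.
    for modality in ("text", "vision"):
        model.forward(modality, "<dummy>")
    return [model.forward(m, d) for m, d in inputs]


model = ToyMultiModalModel()
outputs = generate(model, [("text", "hello")])
assert model.used_branches == {"text", "vision"}  # every branch was touched
```

In the real setting, the dummy forward pass costs one extra step but ensures every prefetched parameter is used, so nothing is left in flight.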
Bug description
Context: Running inference on a multi-modal LLM, where the parts of the network used at each decoding step depend on the input modality. In my second step, DeepSpeed goes ahead and prefetches a part of the network that ends up not being used. The code does assume that this can happen and correctly invalidates the trace. However, the params that were prefetched but never used are detected as in-flight at the end of the step and result in the
RuntimeError(f"still have inflight params").
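A minimal sketch of the bookkeeping that triggers this error. This is a pure-Python simulation of the prefetch/use/release cycle, assuming nothing about DeepSpeed's actual internals:

```python
class PrefetchSimulator:
    """Simulates ZeRO-3-style prefetching: every param fetched ahead of
    use must be used (and thereby released) before the step ends."""

    def __init__(self):
        self.inflight = set()

    def prefetch(self, params):
        # Params fetched based on the previous step's trace.
        self.inflight.update(params)

    def use(self, param):
        # A used param is released and is no longer in flight.
        self.inflight.discard(param)

    def end_step(self):
        # Any prefetched-but-unused param is still "in flight" here.
        if self.inflight:
            raise RuntimeError("still have inflight params")


sim = PrefetchSimulator()
sim.prefetch({"text_branch", "vision_branch"})  # trace from step 1
sim.use("text_branch")                          # step 2 only uses text
try:
    sim.end_step()                              # vision_branch never used
except RuntimeError as e:
    print(e)  # still have inflight params
```

With prefetching disabled (or with every branch exercised each step), the inflight set is empty at end_step and no error is raised.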
To Reproduce My setup is a bit involved, and I believe the issue is clear from the description above. However, if the team would benefit from a simple reproduction, I can work on creating one. Please let me know.
Expected behavior I would have expected that when we notice the order of params isn't the same as before, it would be reasonable to also not demand that all of them be used. Right now, we tolerate a different ordering but still require that every param previously used (and hence prefetched) is used at some point in the step.
ds_report output
System info (please complete the following information):
AL2 (Amazon Linux) 5.10.149-133.644.amzn2.x86_64 #1 SMP Tue Oct 18 16:52:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
p3.16xlarge instance from aws, 8 V100 with 16 GB per device
deepspeed 0.10.0
transformers 4.29.1
accelerate 0.21.0
Python 3.9.15