qspang opened this issue 8 months ago
Quick fix: at https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L511, set DIST_WORKERS=0. I will do a refactor later to fix this thoroughly.
It is now running normally. Thanks to the author for the timely reply!
By the way, do you have code for testing accuracy, like the Medusa project? https://github.com/FasterDecoding/Medusa/blob/v1.0-prerelease/medusa/eval/heads_accuracy.py
I did not implement such a function, but it should not be too hard to compute: divide the number of accepted tokens per step by the number of speculated tokens per step.
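The suggested computation could be sketched as follows. This is a hypothetical helper, not code from the LookaheadDecoding repo; the function name and inputs (per-step counts of accepted and speculated tokens) are assumptions for illustration.

```python
def acceptance_rate(accepted_per_step, speculated_per_step):
    """Fraction of speculated tokens that were accepted, aggregated over all steps.

    accepted_per_step:   list of accepted-token counts, one entry per decoding step
    speculated_per_step: list of speculated-token counts, one entry per decoding step
    """
    total_accepted = sum(accepted_per_step)
    total_speculated = sum(speculated_per_step)
    if total_speculated == 0:
        return 0.0  # avoid division by zero when no tokens were speculated
    return total_accepted / total_speculated

# Example: over 3 decoding steps, 2+1+3 of 4+4+4 speculated tokens accepted
print(acceptance_rate([2, 1, 3], [4, 4, 4]))  # → 0.5
```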
It runs normally when USE_LADE is set to 1. But when I tested on multiple GPUs without LADE, it ran out of memory. Is something wrong somewhere? The following is my execution command: The error is reported as follows:
When USE_LADE is set to 0, please set --use-pp=1 to use Hugging Face's pipeline parallelism, or you can use DeepSpeed for tensor parallelism as in the script I provided: https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L33
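For concreteness, an invocation along these lines might be expected. Only USE_LADE, --use-pp=1, and the script path come from this thread; the model argument below is a placeholder, and the script's actual flag name for the model may differ.

```shell
# Sketch only: disable lookahead decoding and fall back to Hugging Face
# pipeline parallelism across the visible GPUs.
USE_LADE=0 python applications/eval_mtbench.py --use-pp=1 \
    --model-path meta-llama/Llama-2-70b-chat-hf   # placeholder model id
```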
Fixed it! Thank you for your patient reply! If I use a smaller model such as llama2-7b-chat and set USE_LADE to 0, what is the impact of also setting --use-pp=0?
I guess it will be placed on a single GPU. Check https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L214 and https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L250 for the model placement configuration.
OK, got it!
When I tested llama2-70b on an A800 GPU, I ran out of GPU memory. How should I write the command if I want to test on two A800 GPUs? I tried this command: But an error was reported: