Open yuchen2580 opened 1 year ago
Here is my benchmark for several models I tested. Experiments were done on a V100 with deepspeed 4.30:
| Model | HF (PyTorch) ppl | DeepSpeed ppl |
| --- | --- | --- |
| BLOOM 1.7b | 18.992 | 18.979 |
| BLOOM 560M | 26.012 | 26.043 |
| GPT-Neo 1.3B | 15.402 | 238.091 |
| llama 1.7b | 8.995 | 8.895 |
| OPT 1.3B | 29.760 | 9181.997 |

The left column is perplexity measured with plain Hugging Face (PyTorch); the right column is perplexity measured with DeepSpeed. It seems to me that BLOOM and llama are fine, while GPT-Neo and OPT are not. Is it possible that some value is hitting a boundary condition in the kernel implementation?
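For context, the ppl numbers above are the exponential of the mean per-token negative log-likelihood over the evaluation set. A minimal sketch of that computation (hypothetical helper, not code from run_clm.py):

```python
import math

def perplexity(total_nll, n_tokens):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(total_nll / n_tokens)

# Toy example: a model that assigns probability 1/4 to every correct token
# should have perplexity 4 regardless of sequence length.
n = 10
total_nll = n * math.log(4)
print(perplexity(total_nll, n))  # ≈ 4.0
```

Because perplexity exponentiates the average loss, even a modest shift in per-token log-likelihoods blows up into the large ppl gaps seen in the table.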
Describe the bug
I'm running experiments with opt-1.3B and gpt-neo-2.7b on wikitext2, using the official examples from Hugging Face and DeepSpeed. What I observed is that accuracy and ppl drop significantly under DeepSpeed. BUT the generated tokens are almost the same, which is very strange. So far I haven't had the time or resources to test other models.
Without DeepSpeed I get ppl = 29.76; with DeepSpeed I get ppl = 9190.3371.
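The "same generated tokens, much worse ppl" combination is consistent with logits that keep the same argmax but are numerically off: greedy decoding then matches while the loss diverges. A toy illustration with made-up probabilities (not actual model outputs):

```python
import math

# Hypothetical next-token distributions from two backends over a 3-token vocab.
p_ref = [0.70, 0.20, 0.10]   # baseline (e.g. plain PyTorch)
p_alt = [0.40, 0.35, 0.25]   # perturbed (e.g. a faulty fused kernel)

def argmax(p):
    return max(range(len(p)), key=p.__getitem__)

# Greedy decoding picks the same token from both distributions...
assert argmax(p_ref) == argmax(p_alt)

# ...but the loss on the correct token (index 0) differs a lot,
# so perplexity diverges even though the generations match.
print(-math.log(p_ref[0]))  # ≈ 0.357
print(-math.log(p_alt[0]))  # ≈ 0.916
```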
To Reproduce
Steps to reproduce the behavior:
and add the DeepSpeed init in place:

```python
ds_engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float,
    max_out_tokens=4096,
)
model = ds_engine.module
```
The output is the ppl and accuracy summary, so the discrepancy should be easy to spot.
Required packages and their versions: transformers==4.28.1, datasets==2.11.0, evaluate==0.4.0, pytorch==1.12.0. No accelerator is used when running.
How to run the script:

```shell
python run_clm.py \
  --model_name_or_path ../resource_opt13b \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --per_device_eval_batch_size 1 \
  --do_eval \
  --output_dir ./tmp
```
"../resource_opt13b" can be replaced by a Hugging Face model name (e.g. opt-1.3b); I downloaded the model and loaded it offline.
Expected behavior
I expect the accuracy and ppl to be the same, or at least similar.
ds_report output

DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
JIT compiled ops require ninja.

```
ninja .................. [OKAY]
op name ................ installed .. compatible
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
```
DeepSpeed general environment info:

```
torch install path ............... ['/xxxxx/deepspeed/lib/python3.7/site-packages/torch']
torch version .................... 1.12.0
deepspeed install path ........... ['/xxxxx/deepspeed/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.9.1, unknown, unknown
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.2
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6
```
Docker context: None.
Additional context: None.