Closed eran-sefirot closed 1 year ago
When running `python mii-sd.py` I get:
a_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'
[2022-11-27 11:35:16,846] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6581
[2022-11-27 11:35:16,846] [ERROR] [launch.py:324:sigkill_handler] ['/opt/conda/bin/python', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-to-image', '--model', 'CompVis/stable-diffusion-v1-4', '--model-path', '/tmp/mii_models', '--port', '50050', '--ds-optimize', '--provider', 'diffusers', '--config', 'eyJ0ZW5zb3JfcGFyYWxsZWwiOiAxLCAicG9ydF9udW1iZXIiOiA1MDA1MCwgImR0eXBlIjogImZwMTYiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IG51bGwsICJkZXBsb3lfcmFuayI6IFswXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiaGZfYXV0aF90b2tlbiI6ICJoZl9Xc0NwVWFFYVhMbGtEZEtLTkVtS2NxZk9vTHBjcWxXWHF5IiwgInJlcGxhY2Vfd2l0aF9rZXJuZWxfaW5qZWN0IjogdHJ1ZSwgInByb2ZpbGVfbW9kZWxfdGltZSI6IGZhbHNlLCAic2tpcF9tb2RlbF9jaGVjayI6IGZhbHNlfQ=='] exits with return code = 1
[2022-11-27 11:35:18,791] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
File "/home/ec2-user/DeepSpeed-MII/examples/benchmark/txt2img/mii-sd.py", line 15, in
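A side note for anyone debugging a failed launch like this: the `--config` value that MII passes to `mii.launch.multi_gpu_server` is just base64-encoded JSON, so you can decode it to inspect exactly which settings the server was started with. A minimal sketch (the example payload below is made up for illustration; do not paste the real string from the log above, since it embeds an auth token):

```python
import base64
import json

def decode_mii_config(b64_config: str) -> dict:
    """Decode the base64-encoded JSON blob MII passes via --config."""
    return json.loads(base64.b64decode(b64_config).decode("utf-8"))

# Hypothetical example config (NOT the one from the log above).
example = {"tensor_parallel": 1, "port_number": 50050, "dtype": "fp16"}
encoded = base64.b64encode(json.dumps(example).encode("utf-8")).decode("ascii")

print(decode_mii_config(encoded))
```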
OK, I've installed the latest Deep Learning AMI with CUDA 11.7. Now I get the following when running `python mii-sd.py`:
/opt/conda/envs/pytorch/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu:8:10: fatal error: cuda_profiler_api.h: No such file or directory
^~~~~~~~~~~~~~~~~~~~~
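A missing `cuda_profiler_api.h` usually means the environment has only the CUDA runtime libraries, not the full CUDA toolkit headers that DeepSpeed's JIT build needs. A quick way to check whether a given CUDA install actually ships the header (the `find_cuda_header` helper is my own sketch, not part of DeepSpeed):

```python
import os

def find_cuda_header(cuda_home: str, header: str = "cuda_profiler_api.h"):
    """Return the header's path if it exists under <cuda_home>/include, else None."""
    candidate = os.path.join(cuda_home, "include", header)
    return candidate if os.path.isfile(candidate) else None

# Check the toolkit the JIT build will actually use.
cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
if find_cuda_header(cuda_home) is None:
    print(f"{cuda_home} is missing cuda_profiler_api.h; install the full CUDA "
          "toolkit or point CUDA_HOME at an install that includes it.")
```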
I've switched to a different AMI with PyTorch 1.12 and CUDA 11.6, and now I get the following error:
Time to load spatial_inference op: 17.237044095993042 seconds
**** found and replaced unet w. <class 'deepspeed.model_implementations.diffusers.unet.DSUNet'>
About to start server
Started
[2022-11-27 13:35:10,519] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2022-11-27 13:35:15,524] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2022-11-27 13:35:15,524] [INFO] [server_client.py:118:_wait_until_server_is_live] server has started on 50050
Traceback (most recent call last):
File "/home/ec2-user/DeepSpeed-MII/examples/benchmark/txt2img/mii-sd.py", line 23, in
This was resolved recently. Please see https://github.com/microsoft/DeepSpeed-MII/issues/112#issuecomment-1334475650
Please reopen if this issue is still not resolved.
Hello! Thanks for this great optimization. We're using a fresh EC2 G5XL instance.
After installing everything and running `python baseline-sd.py`, I see the following error:
I've installed the environment using: `pip install deepspeed[sd] deepspeed-mii`
When running `ds_report` I see the following output:
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
spatial_inference ...... [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.9/site-packages/torch']
torch version .................... 1.13.0+cu117
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.5
deepspeed install path ........... ['/opt/conda/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.7.5, unknown, unknown
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
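Note the mismatch in this report: torch was built against CUDA 11.7 but the installed `nvcc` is 11.5. DeepSpeed checks that the toolkit version matches the one torch was compiled with before JIT-compiling ops, so a mismatch like this is a plausible cause of the `transformer_inference` build failure above. A minimal sketch of that comparison (my own helper, not DeepSpeed's actual code):

```python
def cuda_versions_compatible(torch_cuda: str, nvcc_cuda: str) -> bool:
    """Compare major.minor of torch's CUDA build vs. the installed nvcc."""
    def major_minor(version: str):
        parts = version.split(".")
        return (int(parts[0]), int(parts[1]))
    return major_minor(torch_cuda) == major_minor(nvcc_cuda)

# Values from the ds_report above: torch cuda 11.7 vs. nvcc 11.5.
print(cuda_versions_compatible("11.7", "11.5"))  # False -> JIT build may fail
```

If the versions disagree, either install a matching CUDA toolkit or reinstall a torch wheel built for the toolkit you have.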