microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

ImportError: dynamic module does not define module export function (PyInit_transformer_op) #949

Open gongjingcs opened 3 years ago

gongjingcs commented 3 years ago

Hi, I installed DeepSpeed using the following commands:

DS_BUILD_OPS=1 python setup.py build_ext -j8 bdist_wheel
pip install dist/deepspeed-0.3.13+69bca4a-cp38-cp38-linux_x86_64.whl

I ran a demo with the transformer kernel and got the following error: image

However, if I install DeepSpeed from source, the demo works.
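For context on the error in the title: when CPython loads a compiled extension, it looks up an init function named PyInit_ followed by the module name, and raises this ImportError when the shared object does not export it. The sketch below derives the expected symbol from a file name and does a rough byte scan for it. The example file name is hypothetical, and the scan is only a heuristic, not a real symbol-table lookup.

```python
import os
import sys


def expected_init_symbol(so_path):
    """Return the init symbol CPython looks up for an extension file.

    The module name is the file name up to the first dot, e.g.
    'transformer_op.cpython-38-x86_64-linux-gnu.so' -> 'PyInit_transformer_op'.
    (Assumes an ASCII module name.)
    """
    module_name = os.path.basename(so_path).split(".", 1)[0]
    return "PyInit_" + module_name


def mentions_symbol(so_path, symbol):
    """Heuristic: True if the symbol name appears anywhere in the file's bytes.

    This is not a symbol-table lookup; a tool such as `nm -D` is more reliable.
    """
    with open(so_path, "rb") as handle:
        return symbol.encode("ascii") in handle.read()


def main(argv):
    # argv[1] is the path to the compiled .so you want to inspect (your choice).
    if len(argv) < 2 or not os.path.isfile(argv[1]):
        return
    symbol = expected_init_symbol(argv[1])
    status = "found" if mentions_symbol(argv[1], symbol) else "NOT found"
    print(symbol, status)


if __name__ == "__main__":
    main(sys.argv)
```

Running this against the transformer op's shared object in both the wheel install and the source install would show whether the two builds differ in what they export.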

gongjingcs commented 3 years ago

image

tjruwase commented 3 years ago

@gongjingcs, can you please share the results of running ds_report in both the working and failing installations?

gongjingcs commented 3 years ago

@tjruwase working installation: image

failing installation: image

The ds_report results for the working and failing installations are the same.
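When two environments report the same thing, one extra check that may help is confirming which copy of the package each one would import, for example a local source checkout on the path versus the installed wheel. This is only a suggested diagnostic, not something the thread confirms as the cause. A minimal sketch using the standard library:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # 'deepspeed' is the package from this thread; None means it is not importable here.
    print("deepspeed ->", module_origin("deepspeed"))
```

Run it from the same working directory as the failing demo, since the current directory can affect which copy is found first.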

gongjingcs commented 3 years ago

@tjruwase, could you please provide some tips for solving this problem?

tjruwase commented 3 years ago

@gongjingcs, apologies for the delay on this. I will take a closer look today.

RezaYazdaniAminabadi commented 3 years ago

Hi @gongjingcs

I think you might have an incompatibility between the torch you installed and the one you are using. Also, the CUDA version you are using with Torch 1.8 is lower than what torch supports according to their website. Could you please try resolving these and rerun the experiment? image image
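One reading of this suggestion is to compare the torch version the extension was built against with the torch version present at runtime, along with the CUDA version torch itself was built with. The sketch below assumes you copy the build-time version string by hand (for example from the ds_report output). The helper names are illustrative and not part of DeepSpeed or torch.

```python
import importlib.util


def major_minor(version):
    """Parse '1.8.1+cu102' into (1, 8), ignoring any local '+...' suffix."""
    core = version.split("+")[0]
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def same_major_minor(built_with, running):
    """True when two version strings agree on major and minor numbers."""
    return major_minor(built_with) == major_minor(running)


def runtime_versions():
    """Return the runtime torch and CUDA versions, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return {
        "torch": torch.__version__,
        "cuda_built": torch.version.cuda,  # CUDA version torch was compiled with
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(runtime_versions())
```

For example, same_major_minor("1.8.1", "1.7.1") returns False, which would indicate the kind of mismatch described above.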

tjruwase commented 3 years ago

@gongjingcs, did you try resolving the torch incompatibility?

gongjingcs commented 3 years ago

Hi, I tried resolving the torch incompatibility by downgrading torch to 1.7.1, but it reports the same error.

image

gongjingcs commented 3 years ago

@tjruwase

tjruwase commented 3 years ago

@gongjingcs, got it, thanks for confirming that it is not a torch incompatibility issue. Will take a look.

tjruwase commented 3 years ago

@gongjingcs, can you provide the repro steps with the transformer kernel that triggers the problem?

gongjingcs commented 3 years ago

@tjruwase, of course.

step1: DS_BUILD_OPS=1 python setup.py build_ext -j8 bdist_wheel
step2: pip install dist/deepspeed-0.3.13+69bca4a-cp38-cp38-linux_x86_64.whl
step3: at this point DeepSpeed is installed successfully. ds_report shows: image
step4: run the bing_bert demo you provide with the transformer kernel: https://github.com/microsoft/DeepSpeedExamples/blob/bdf8e59aede8c8e0577e8d4d557298ca8515268f/bing_bert/ds_train_bert_bsz64k_seq128.sh