Error building extension 'cpu_adam'

Hey guys, I'm having a problem getting DeepSpeed working with XLM-Roberta. I'm trying to run it on an Amazon Linux machine, which is based on Red Hat. Here are the versions of the packages and dependencies I'm using:

cuda version: 10.2
transformers: 4.4.2
pytorch: 1.7.1
deepspeed: 0.3.13
gcc/c++/g++: (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)

I must admit I had some issues upgrading CUDA from the instance's default 10.0 to 10.2, and GCC from 4.8.5 to 7.2.1. But since I no longer get the errors about the torch and installed CUDA versions differing, or about GCC being older than version 5, I assume I'm in the clear.

Here's the essential part of the code I'm running (from a notebook):

import os
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '9994' # modify if RuntimeError: Address already in use
os.environ['RANK'] = "0"
os.environ['LOCAL_RANK'] = "0"
os.environ['WORLD_SIZE'] = "1"

from transformers import Trainer, TrainingArguments, XLMRobertaForSequenceClassification, XLMRobertaTokenizer

model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base')

training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    save_steps=500,
    save_total_limit=2,
    deepspeed="my_ds_config.json"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

Here's the content of my config file:

{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },

    "zero_optimization": {
        "stage": 2,
       "allgather_partitions": true,
       "allgather_bucket_size": 2e8,
       "reduce_scatter": true,
       "reduce_bucket_size": 2e8,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "cpu_offload": true
    },

    "optimizer": {
        "type": "Adam",
        "params": {
            "adam_w_mode": true,
            "lr": 3e-5,
            "betas": [ 0.9, 0.999 ],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    },

    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,
            "warmup_num_steps": 500
        }
    }
}
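
A note for readers on newer DeepSpeed releases: a log further down this thread (DeepSpeed 0.7.7) warns that cpu_offload is deprecated in favor of offload_optimizer. The sketch below shows one way such a config could be rewritten in Python. The helper name migrate_cpu_offload is hypothetical, and the {"device": "cpu"} shape is an assumption about the newer schema, so check it against the DeepSpeed config documentation for your version.

```python
import copy
import json


def migrate_cpu_offload(config):
    """Return a copy of a DeepSpeed config dict in which the legacy
    zero_optimization.cpu_offload flag is rewritten as offload_optimizer.

    Hypothetical helper, not part of DeepSpeed itself.
    """
    new_config = copy.deepcopy(config)
    zero = new_config.get("zero_optimization", {})
    # pop() drops the legacy key whether it was true or false.
    if zero.pop("cpu_offload", False):
        # Assumed newer form; an explicit offload_optimizer entry is kept as-is.
        zero.setdefault("offload_optimizer", {"device": "cpu"})
    return new_config


legacy = {"zero_optimization": {"stage": 2, "cpu_offload": True}}
migrated = migrate_cpu_offload(legacy)
print(json.dumps(migrated, indent=4))
```

The original dict is left untouched, so the legacy file can be kept around while testing the migrated one.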

Here's the output of ds_report:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
/bin/sh: line 0: type: llvm-config: not found
/bin/sh: line 0: type: llvm-config-9: not found
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 10.2
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.3.13+22d5a1f, 22d5a1f, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.2

And finally, here's the stack trace:

[2021-03-24 15:29:36,478] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.13, git-hash=unknown, git-branch=unknown
[2021-03-24 15:29:36,494] [INFO] [engine.py:77:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module cpu_adam, skipping build step...
Loading extension module cpu_adam...
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-131-cc14ac05ecbb> in <module>
     30 )
     31 
---> 32 trainer.train()

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
    901         delay_optimizer_creation = self.sharded_ddp is not None and self.sharded_ddp != ShardedDDPOption.SIMPLE
    902         if self.args.deepspeed:
--> 903             model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
    904             self.model = model.module
    905             self.model_wrapped = model  # will get further wrapped in DDP

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/integrations.py in init_deepspeed(trainer, num_training_steps)
    416         model=model,
    417         model_parameters=model_parameters,
--> 418         config_params=config,
    419     )
    420 

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/__init__.py in initialize(args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config_params)
    123                                  dist_init_required=dist_init_required,
    124                                  collate_fn=collate_fn,
--> 125                                  config_params=config_params)
    126     else:
    127         assert mpu is None, "mpu must be None with pipeline parallelism"

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in __init__(self, args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config_params, dont_change_device)
    181         self.lr_scheduler = None
    182         if model_parameters or optimizer:
--> 183             self._configure_optimizer(optimizer, model_parameters)
    184             self._configure_lr_scheduler(lr_scheduler)
    185             self._report_progress(0)

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_optimizer(self, client_optimizer, model_parameters)
    596                 logger.info('Using client Optimizer as basic optimizer')
    597         else:
--> 598             basic_optimizer = self._configure_basic_optimizer(model_parameters)
    599             if self.global_rank == 0:
    600                 logger.info(

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_basic_optimizer(self, model_parameters)
    665                     optimizer = DeepSpeedCPUAdam(model_parameters,
    666                                                  **optimizer_parameters,
--> 667                                                  adamw_mode=effective_adam_w_mode)
    668                 else:
    669                     from deepspeed.ops.adam import FusedAdam

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/adam/cpu_adam.py in __init__(self, model_params, lr, bias_correction, betas, eps, weight_decay, amsgrad, adamw_mode)
     76         DeepSpeedCPUAdam.optimizer_id = DeepSpeedCPUAdam.optimizer_id + 1
     77         self.adam_w_mode = adamw_mode
---> 78         self.ds_opt_adam = CPUAdamBuilder().load()
     79 
     80         self.ds_opt_adam.create_adam(self.opt_id,

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in load(self, verbose)
    213             return importlib.import_module(self.absolute_name())
    214         else:
--> 215             return self.jit_load(verbose)
    216 
    217     def jit_load(self, verbose=True):

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in jit_load(self, verbose)
    250             extra_cuda_cflags=self.nvcc_args(),
    251             extra_ldflags=self.extra_ldflags(),
--> 252             verbose=verbose)
    253         build_duration = time.time() - start_build
    254         if verbose:

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1089     if isinstance(cuda_sources, str):
   1090         cuda_sources = [cuda_sources]
-> 1091 
   1092     cpp_sources.insert(0, '#include <torch/extension.h>')
   1093 

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1315 
   1316 
-> 1317 def verify_ninja_availability():
   1318     r'''
   1319     Raises ``RuntimeError`` if `ninja <https://ninja-build.org/>`_ build system is not

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
   1697                       sources,
   1698                       objects,
-> 1699                       ldflags,
   1700                       library_target,
   1701                       with_cuda) -> None:

~/anaconda3/envs/python3/lib/python3.6/imp.py in find_module(name, path)
    295         break  # Break out of outer loop when breaking out of inner loop.
    296     else:
--> 297         raise ImportError(_ERR_MSG.format(name), name=name)
    298 
    299     encoding = None

ImportError: No module named 'cpu_adam'

Thanks in advance for your help!

Hi @arthur-morgan-712

I am not sure what exactly is happening here: the trace log says it is trying to load the extension module cpu_adam, but ds_report says it is not installed! Maybe the system is caching this module somehow. Can you remove the /tmp/torch_extensions folder (rm -rf /tmp/torch_extensions/*) and retry?

Thanks. Reza
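
Note that the trace above reports /home/ec2-user/.cache/torch_extensions as the PyTorch extensions root, so the stale build may not live under /tmp at all. A minimal sketch of clearing a cached op from the likely locations follows. The function names and the list of candidate roots are assumptions made for illustration; only TORCH_EXTENSIONS_DIR, ~/.cache/torch_extensions and /tmp/torch_extensions appear in this thread.

```python
import os
import shutil
from pathlib import Path


def candidate_roots():
    """Directories where JIT-built torch extensions may be cached (assumed list)."""
    roots = []
    env_root = os.environ.get("TORCH_EXTENSIONS_DIR")
    if env_root:
        roots.append(Path(env_root))
    roots.append(Path.home() / ".cache" / "torch_extensions")
    roots.append(Path("/tmp/torch_extensions"))
    return roots


def clear_cached_extension(name, roots=None):
    """Delete cached build directories for one extension; return what was removed."""
    removed = []
    for root in (roots if roots is not None else candidate_roots()):
        root = Path(root)
        if not root.is_dir():
            continue
        # Some setups nest builds one level deeper, e.g. py38_cu117/cpu_adam.
        for build_dir in [root / name, *root.glob(f"*/{name}")]:
            if build_dir.is_dir():
                shutil.rmtree(build_dir)
                removed.append(str(build_dir))
    return removed


# Example (deletes files, so run it deliberately):
# print(clear_cached_extension("cpu_adam"))
```

After clearing, the next run should print "Building extension module cpu_adam..." rather than "No modifications detected", which makes any compile error visible.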

Hi Reza,

Thanks for your reply! I tried removing it (it did have cpu_adam as a subfolder), but the issue still persists.

Funnily enough, I tried running this on Colab and it seemed to have loaded it:

Loading extension module cpu_adam...
Time to load cpu_adam op: 33.38163924217224 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000030, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1

And Colab's ds_report also states that it hasn't been installed:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.7.1+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.13+22d5a1f, 22d5a1f, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0

Albeit, I did get another error on Colab (tcmalloc: large alloc 1200709632 bytes == .... Killing subprocess 346) but I assume it's because the 11 GB video card was still rather small for this task. So maybe cpu_adam not showing up as installed isn't out of the ordinary?

Quick update: I ran the command from the notebook (running run_seq2seq.py) on the Amazon Linux instance, and it produced a more detailed stack trace. I'm pasting it here:

Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda-10.2/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -c /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
FAILED: cpu_adam.o 
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda-10.2/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -c /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:4:10: fatal error: omp.h: No such file or directory
 #include <omp.h>
          ^~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1539, in _run_ninja_build
    env=env)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "examples/seq2seq/run_seq2seq.py", line 650, in <module>
    main()
  File "examples/seq2seq/run_seq2seq.py", line 590, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/ec2-user/SageMaker/checkstep-research/t5/transformers/src/transformers/trainer.py", line 892, in train
    model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
  File "/home/ec2-user/SageMaker/checkstep-research/t5/transformers/src/transformers/integrations.py", line 417, in init_deepspeed
    config_params=config,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/__init__.py", line 125, in initialize
    config_params=config_params)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 183, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 598, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 667, in _configure_basic_optimizer
    adamw_mode=effective_adam_w_mode)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/adam/cpu_adam.py", line 78, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 215, in load
    return self.jit_load(verbose)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 252, in jit_load
    verbose=verbose)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1202, in _jit_compile
    with_cuda=with_cuda)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1300, in _write_ninja_file_and_build_library
    error_prefix="Error building extension '{}'".format(name))
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Killing subprocess 6428
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
    main()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 161, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ec2-user/anaconda3/envs/python3/bin/python3.6', '-u', 'examples/seq2seq/run_seq2seq.py', '--local_rank=0', '--model_name_or_path', 'google/mt5-small', '--output_dir', 'output_dir', '--adam_eps', '1e-06', '--evaluation_strategy=steps', '--do_train', '--label_smoothing', '0.1', '--learning_rate', '3e-5', '--logging_first_step', '--logging_steps', '1000', '--max_source_length', '128', '--max_target_length', '128', '--num_train_epochs', '1', '--overwrite_output_dir', '--per_device_train_batch_size', '16', '--predict_with_generate', '--sortish_sampler', '--val_max_target_length', '128', '--warmup_steps', '500', '--max_train_samples', '2000', '--max_val_samples', '500', '--task', 'translation_en_to_ro', '--dataset_name', 'wmt16', '--dataset_config', 'ro-en', '--source_prefix', 'translate English to Romanian: ', '--deepspeed', 'ds_config.json', '--fp16']' returned non-zero exit status 1.

I was able to get the example from the notebook going after I downgraded the DeepSpeed version to 0.3.10. I do have a follow-up question though: Correct me if I'm wrong, but the only way to use DeepSpeed would be to use the HuggingFace Trainer class? At least that's what I can find on HuggingFace (also trying to implement a custom script without using Trainer resulted in an unrecognized arguments: --local_rank=0 error, even though I wasn't passing that argument). If that's the case, does this mean I cannot use DeepSpeed to train a classification model passing different class weights to each class, because there's no such parameter to pass to Trainer?

We already have examples for running some transformer networks. For this argument, I think you can just add local_rank to your parser arguments, the same as here.

I was able to get the example from the notebook going after I downgraded the DeepSpeed version to 0.3.10.

So is this the issue with 0.3.13 of DeepSpeed? Because I'm facing the same issue as well with 0.3.13.

Also, are you able to run HuggingFace-4.4.2 with DeepSpeed-0.3.10? I think you should have downgraded to HuggingFace-4.3.x.

Hi @arthur-morgan-712 ,

Could you try building deepspeed ops while installing as suggested here?

We already have examples for running some transformer networks. For this argument, I think you can just add local_rank to your parser arguments, the same as here.

This is no longer needed in deepspeed since https://github.com/microsoft/DeepSpeed/pull/825 and transformers master has been adjusted accordingly. You just need to have env LOCAL_RANK to be set.
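
For a custom script that has to work with both behaviors, here is a minimal sketch, assuming the launcher either passes --local_rank=N on the command line (the older behavior that triggered the "unrecognized arguments" error) or sets the LOCAL_RANK environment variable. The build_parser name and the -1 fallback are illustrative choices, not something DeepSpeed prescribes.

```python
import argparse
import os


def build_parser():
    """Argument parser for a custom training script (illustrative)."""
    parser = argparse.ArgumentParser(description="custom DeepSpeed training script")
    parser.add_argument(
        "--local_rank",
        type=int,
        # Fall back to the environment variable when the flag is not passed;
        # -1 is used here to mean "not launched in distributed mode".
        default=int(os.environ.get("LOCAL_RANK", -1)),
        help="local rank passed by the distributed launcher",
    )
    return parser


# Simulates what an older launcher appends to the command line.
args = build_parser().parse_args(["--local_rank=0"])
print(args.local_rank)  # 0
```

Declaring the argument is harmless when the launcher no longer passes it, since the environment variable then supplies the value.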

I do have a follow-up question though: Correct me if I'm wrong, but the only way to use DeepSpeed would be to use the HuggingFace Trainer class?

Not at all. You can do your own integration and not rely on the HF Trainer.

If you do use the transformers Trainer for the time being, while this is all new, you must use the transformers master branch, as frequent deepspeed-related updates are being made.

If you have build problems please make sure you read: https://huggingface.co/transformers/main_classes/trainer.html#installation-notes though looking at OP I think you have all the right components. Just check that PATH/LD_LIBRARY_PATH are good.

Perhaps try to pre-build deepspeed: https://github.com/microsoft/DeepSpeed/issues/885#issuecomment-808339237

Thanks @stas00 for clarifying this : )

And the error is right there in your report: https://github.com/microsoft/DeepSpeed/issues/889#issuecomment-806526657

c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda-10.2/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -c /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:4:10: fatal error: omp.h: No such file or directory
 #include <omp.h>

You're missing the right build tools. omp.h is missing. It should be in your gcc dev package. e.g. on my machine it's under:

/usr/lib/gcc/x86_64-linux-gnu/6/include/omp.h
/usr/lib/gcc/x86_64-linux-gnu/7/include/omp.h
/usr/lib/gcc/x86_64-linux-gnu/9/include/omp.h

@RezaYazdaniAminabadi, perhaps ds_report could somehow check that all the build components are there? e.g. it could attempt to build some dummy very basic extension when run? not sure on the details. Clearly here gcc-dev tools are either missing or misconfigured.
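
As a rough illustration of such a check (not what ds_report actually does), the sketch below asks the compiler to preprocess a one-line file that includes omp.h. The function name compiler_has_header is hypothetical, and the default compiler name c++ and the -fopenmp flag are taken from the build command in the log above.

```python
import shutil
import subprocess


def compiler_has_header(header, compiler="c++", flags=("-fopenmp",)):
    """Return True if `compiler` can preprocess a file that includes `header`."""
    if shutil.which(compiler) is None:
        return False  # no such compiler on PATH
    result = subprocess.run(
        # "-x c++ -E -" means: treat stdin as C++ and only run the preprocessor.
        [compiler, *flags, "-x", "c++", "-E", "-"],
        input=f"#include <{header}>\n",
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # works on Python 3.6, as used in this thread
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("omp.h found:", compiler_has_header("omp.h"))
```

If this reports False for omp.h, the JIT build of cpu_adam can be expected to fail the same way it did above, and the fix is on the toolchain side rather than in DeepSpeed.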

Still running into: ImportError: No module named 'cpu_adam'

CUDA 10.2, deepspeed 0.4.3, pytorch 1.8.1

I tried downgrading deepspeed to 0.3.10 and ran into: "deepspeed>=0.4.3 is required for a normal functioning of this module, but found deepspeed==0.3.10."

Are there any other potential solutions to date, or is this still open?

I ran into this as well. Before the AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam', the output showed that it could not create a directory in /tmp/torch_extensions/build for cpu_adam. So I changed DEFAULT_TORCH_EXTENSION_PATH in the file /anaconda3/envs/XXXXX/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py, and then it worked.

This sounds like a permission issue. Try to set TMPDIR to another dir that is writable by you?

e.g.:

mkdir ~/tmp
export TMPDIR=~/tmp
... do the build here ...

Try to set TMPDIR to another dir that is writable by you?

That won't work because deepspeed hardcodes the default extension path to be /tmp/torch_extensions

The default is not used if TORCH_EXTENSIONS_DIR is set in the environment, but it would certainly be an improvement for it to follow TMPDIR (especially as TORCH is not in the whitelist of environment variable prefixes that the deepspeed distributed launcher automatically plumbs through to worker processes).
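
A minimal sketch of the TORCH_EXTENSIONS_DIR route, assuming the variable is set inside the training script itself before DeepSpeed triggers any JIT build (which also sidesteps the launcher not forwarding it). The helper name use_extensions_dir and the example path are hypothetical.

```python
import os
from pathlib import Path


def use_extensions_dir(path):
    """Point JIT extension builds at a directory the current user can write to."""
    target = Path(path).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    if not os.access(str(target), os.W_OK):
        raise PermissionError(f"{target} is not writable")
    # Must be set before the first extension is built in this process.
    os.environ["TORCH_EXTENSIONS_DIR"] = str(target)
    return target


# Example, at the top of the training script:
# use_extensions_dir("~/tmp/torch_extensions")
```

This avoids editing builder.py inside site-packages, so the change survives a reinstall of DeepSpeed.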

@stas00 @RezaYazdaniAminabadi

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using tokenizers before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:414] 2023-01-09 19:06:58,180 >> Using amp fp16 backend
[2023-01-09 19:06:58,187] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown
[2023-01-09 19:06:58,191] [WARNING] [config_utils.py:67:process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2023-01-09 19:07:05,242] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py38_cu117/cpu_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64
-lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0, from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16, from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:11, from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:1: /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>

^~~~~ compilation terminated. [2/3] /root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -D_CUDA_NO_HALF_CONVERSIONS -D_CUDA_NO_BFLOAT16_CONVERSIONS -D_CUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U_CUDA_NO_HALF_OPERATORS -U_CUDA_NO_HALFCONVERSIONS -U_CUDA_NO_HALF2OPERATORS -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o FAILED: custom_cuda_kernel.cuda.o /root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem 
/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -D_CUDA_NO_HALF_CONVERSIONS -D_CUDA_NO_BFLOAT16CONVERSIONS -D_CUDA_NO_HALF2OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U_CUDA_NO_HALFOPERATORS -U_CUDA_NO_HALFCONVERSIONS -U_CUDA_NO_HALF2OPERATORS -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0, from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16, from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu:1: /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory

#include <cuda_profiler_api.h>

^~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/root/anaconda3/envs/bitten/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 441, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/trainer.py", line 1112, in train
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/deepspeed.py", line 355, in deepspeed_init
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 330, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1195, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1266, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 460, in load
    return self.jit_load(verbose)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 495, in jit_load
    op_module = load(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7efc99bbd1f0>
Traceback (most recent call last):
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-01-09 19:07:12,824] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 7728
[2023-01-09 19:07:12,826] [ERROR] [launch.py:324:sigkill_handler] AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

@arthur-morgan-712, you have a problem with your cuda environment:

/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10:
 fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>

Properly install the CUDA environment, including all the dev header files, then run again and it should work.

On Ubuntu I usually recommend NVIDIA's pre-packaged CUDA .deb files, but be careful: the latest CUDA is already at 12.x, so make sure you install the same major version as your PyTorch build uses. That is most likely cuda-11.x, and I think 11.8 is the latest in that line.

e.g. on my system the missing file is here:

/usr/local/cuda-11.8/targets/x86_64-linux/include/cuda_profiler_api.h
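
To check whether the header exists anywhere on a machine, something like the following sketch may help. It only walks the directories it is given; the function name `find_cuda_header` and the candidate roots in the usage note are assumptions for illustration, not part of DeepSpeed or PyTorch.

```python
import os


def find_cuda_header(header, roots):
    """Return sorted paths of every file named `header` under the given roots.

    `roots` may contain None or non-existent directories; those are skipped.
    """
    hits = set()
    for root in roots:
        if not root or not os.path.isdir(root):
            continue
        # os.walk silently skips directories it cannot read.
        for dirpath, _dirnames, filenames in os.walk(root):
            if header in filenames:
                hits.add(os.path.join(dirpath, header))
    return sorted(hits)
```

For example, `find_cuda_header("cuda_profiler_api.h", [os.environ.get("CUDA_HOME"), os.environ.get("CONDA_PREFIX"), "/usr/local"])` would cover the usual suspects. An empty result suggests the CUDA dev headers are not installed in any of those locations.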

In my case, this bug was caused by a ninja compile error. You can change directory to ~/.cache/torch_extensions/cpu_adam, then run ninja -v to see the error details.
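
The same check can be scripted. Note that in the logs in this thread the build directory sits one level deeper (e.g. py38_cu117/cpu_adam), so the sketch below searches for it rather than hard-coding the path. The helper names are made up for illustration; `TORCH_EXTENSIONS_DIR` is the environment variable PyTorch consults for a non-default extensions root.

```python
import os
import subprocess
from pathlib import Path


def find_extension_dirs(name="cpu_adam", root=None):
    """Return build directories called `name` that contain a build.ninja file."""
    if root is None:
        root = os.environ.get("TORCH_EXTENSIONS_DIR") or (
            Path.home() / ".cache" / "torch_extensions"
        )
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p.parent for p in root.rglob("build.ninja") if p.parent.name == name
    )


def rerun_build(build_dir):
    """Run `ninja -v` in build_dir; return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(
        ["ninja", "-v"],
        cwd=build_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout
```

Calling `rerun_build(d)` for each `d` in `find_extension_dirs()` should print the real compiler or linker error instead of the generic "Error building extension" message, assuming ninja is on the PATH.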

Closing this issue as the original issue was resolved. If anyone is having issues with this, please open a new issue and link this one and we would be happy to take a look.

Hi @Chiang, I followed your advice and ran ninja -v under /root/.cache/torch_extensions/py310_cu118/cpu_adam, and got the error below. Please check and tell me how to fix this problem.

[1/1] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so 
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

/usr/bin/ld: cannot find -lcurand

It can't find your CUDA install. If you have it installed, search for where libcurand.so is on your filesystem.

E.g. on my machine it's /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcurand.so.10.3.2.106, so in that case adding export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.1/targets/x86_64-linux/lib should help with finding that shared object. You need to edit this path to point to wherever your CUDA is installed, if it is installed at all.

But most likely you don't have CUDA installed. You can install it via apt (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation) or, if you don't have sudo access, even inside a conda environment via https://anaconda.org/conda-forge/cudatoolkit (though I see it only goes up to 11.8, and I don't know whether you need cuda-12.x or not).

I haven't tried it, but https://anaconda.org/nvidia/cuda-libraries is probably the right package if you want to install CUDA into your conda environment, and it has all the latest versions as well. Use the same version as your PyTorch. To check which CUDA version your PyTorch uses, run:

python -c 'import torch; print(f"pt={torch.__version__}, cuda={torch.version.cuda}")'
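
To compare that against the toolkit actually on the PATH, a rough sketch like the one below could be used. It assumes the usual "release X.Y" wording in the output of `nvcc --version`; both function names are hypothetical, and it only checks the major version, following the advice earlier in this thread.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_cuda_major(torch_cuda, nvcc_output):
    """True if PyTorch's CUDA version and the nvcc toolkit share a major version.

    `torch_cuda` is the string from torch.version.cuda (None on CPU-only builds).
    """
    toolkit = parse_nvcc_release(nvcc_output)
    if torch_cuda is None or toolkit is None:
        return False
    return int(torch_cuda.split(".")[0]) == toolkit[0]
```

In practice `torch_cuda` would be `torch.version.cuda` and `nvcc_output` the captured stdout of `nvcc --version`. A False result points at either a missing toolkit or a major-version mismatch.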

Hi @stas00, it's not working. I changed the LD_LIBRARY_PATH env var but the error is the same as before.

@stas00 please see the details below.

[1/1] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so 
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

I exported LD_LIBRARY_PATH first and then ran ninja -v in the folder in question, but the linker error did not change. Why?

I see you have used /root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib for LD_LIBRARY_PATH

Did you check that you have libcurand.so under /root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib?

and which conda package did you install?

I linked libcurand.so into that folder and still no change. I did find that it works if I add a link dir -L/usr/local/cuda/lib64 to ldflags in the build.ninja file.

It seems DeepSpeed generates a wrong ninja config, which causes the build to fail. I think this is a bug and requires a fix.

@stas00 for details, please refer to issue 5813 that I mentioned above.

In which case you need to set:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

To automate this see: https://askubuntu.com/questions/210884/setting-ld-library-path-for-cuda

The other solution that often helps is to set CUDA_HOME

export CUDA_HOME=/usr/local/cuda

Nope, LD_LIBRARY_PATH takes effect at run time, so it is only used when the program is loaded. Right now cpu_adam fails at compile time, so LD_LIBRARY_PATH doesn't help. If I manually modify build.ninja and run ninja -v under ~/.cache/torch_extensions/py310_cu118/cpu_adam, the build succeeds. However, if I run the whole training program from the start, DeepSpeed overwrites the build.ninja file with the wrong settings again and it still crashes at compile time.
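
One thing that might be worth trying here, since build.ninja gets regenerated: GCC's link step consults the LIBRARY_PATH environment variable when resolving -l flags, whereas LD_LIBRARY_PATH is only read by the dynamic loader at run time. The sketch below sets both before training starts. The helper `prepend_path` is made up for illustration, /usr/local/cuda/lib64 is simply the directory from the -L flag reported to work above, and whether this resolves this particular setup is untested.

```python
import os


def prepend_path(env, var, directory):
    """Put `directory` first in the os.pathsep-separated list stored in env[var]."""
    parts = [p for p in env.get(var, "").split(os.pathsep) if p and p != directory]
    env[var] = os.pathsep.join([directory] + parts)
    return env


# Assumed location, taken from the -L flag that worked in this thread.
CUDA_LIB_DIR = "/usr/local/cuda/lib64"

# Link time: searched by gcc when it resolves -lcurand.
prepend_path(os.environ, "LIBRARY_PATH", CUDA_LIB_DIR)
# Run time: searched by the loader when cpu_adam.so is imported.
prepend_path(os.environ, "LD_LIBRARY_PATH", CUDA_LIB_DIR)
```

This would need to run in the training script before DeepSpeed triggers the JIT build, so that the ninja subprocess inherits the variables. The equivalent export lines in the shell that launches training should have the same effect.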

What I shared works for me. That's what I use on my desktop to build deepspeed.

But it doesn't work for me. To be more specific, I ran into this problem when using LLaMA Factory integrated with DeepSpeed to train a model. You can follow the link below for more info (skip the Chinese and focus on the output and logs if you have difficulty reading Chinese).

https://github.com/hiyouga/LLaMA-Factory/issues/5020

microsoft / DeepSpeed

Error building extension 'cpu_adam' #889
