hclhkbu / dlbench

Benchmarking State-of-the-Art Deep Learning Software Tools
http://dlbench.comp.hkbu.edu.hk/
MIT License

Benchmarking Systems without GPU #26

Open jerrin92 opened 6 years ago

jerrin92 commented 6 years ago

Hi Team,

I am trying to benchmark a system without a GPU. However, when I run the benchmark script, it looks for nvidia-smi:

CalledProcessError: Command 'nvidia-smi' returned non-zero exit status 127

I get the same error with fcn5, alexnet, resnet, and lstm.

In addition, we plan to run the benchmark on mxnet, tensorflow and caffe. From the documentation, I understand that we need to copy the zip files to $HOME/data. However, we also need to use the configuration file associated with each framework for it to work. Is that correct?

shyhuai commented 6 years ago

Hi, to run without a GPU, please remove lines 116 and 117 of benchmark.py, which are used for GPU power collection.
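As an alternative to deleting those lines, the GPU power collection could be skipped automatically when nvidia-smi is missing. Exit status 127 is the shell's "command not found" code, so checking PATH first avoids the CalledProcessError. The following is only a minimal sketch, assuming Python 3; the function names are hypothetical and are not taken from benchmark.py.

```python
import shutil
import subprocess


def gpu_power_monitor_available(tool="nvidia-smi"):
    """Return True if the monitoring tool can be found on PATH."""
    return shutil.which(tool) is not None


def query_gpu_power(tool="nvidia-smi"):
    """Return the tool's text output, or None on machines without it.

    Hypothetical helper: the real benchmark.py may collect power differently.
    """
    if not gpu_power_monitor_available(tool):
        # No GPU tooling installed: skip power collection instead of failing.
        return None
    try:
        return subprocess.check_output([tool], universal_newlines=True)
    except (subprocess.CalledProcessError, OSError):
        # The tool exists but could not run (for example, no driver loaded).
        return None
```

On a CPU-only machine, query_gpu_power() returns None and the caller can simply skip the power log.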

Your understanding is correct: you need to write your own configuration file for your benchmarks. For data preparation, you also need to unzip the data files that you downloaded and put them in $HOME/data.

jerrin92 commented 6 years ago

Yup, now the errors related to GPUs are gone. However, I get the following error message:

[bt] (9) /N/dc2/scratch/jerkatta/mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorSimpleBind+0x2069) [0x7fe9f010d609]

Traceback (most recent call last):
  File "train_cifar10.py", line 54, in <module>
    fit.fit(args, sym, data.get_rec_iter, init)
  File "/gpfs/home/j/e/jerkatta/Carbonate/benchmarking/dlbench/tools/mxnet/common/fit.py", line 187, in fit
    monitor            = monitor)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/base_module.py", line 460, in fit
    for_training=True, force_rebind=force_rebind)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/module.py", line 417, in bind
    state_names=self._state_names)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 231, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 327, in bind_exec
    shared_group))
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/symbol.py", line 1479, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (128, 3L, 32L, 32L)
softmax_label: (128,)
[12:31:58] src/storage/storage.cc:113: Compile with USE_CUDA=1 to enable GPU usage

In addition, these are the log files currently generated:

./mxnet-cnn-alexnet--devId0,1,2,3-c4-b256-Tue_Jan_16_12:31:50_2018-xx.log:Total time: 1.67378902435

./mxnet-cnn-alexnet--devId0-c1-b1024-Tue_Jan_16_12:31:55_2018-e1.xx.log:Total time: 1.23220992088

./mxnet-cnn-resnet--devId0,1,2,3-c4-b128-Tue_Jan_16_12:31:52_2018-xx.log:Total time: 1.2443048954

./mxnet-cnn-resnet--devId0-c1-b128-Tue_Jan_16_12:31:56_2018-xx.log:Total time: 1.14851498604

./mxnet-fc-fcn5--devId0,1,2,3-c4-b1024-Tue_Jan_16_12:31:47_2018-xx.log:Total time: 2.8816781044

./mxnet-fc-fcn5--devId0-c1-b4096-Tue_Jan_16_12:31:54_2018-xx.log:Total time: 1.44347310066

./mxnet-rnn-lstm--devId0-c1-b1024-Tue_Jan_16_12:31:58_2018-xx.log:Total time: 1.68027210236

and we are running it without GPUs
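The last line of the traceback ("Compile with USE_CUDA=1 to enable GPU usage") suggests that the script requested a GPU context from an MXNet build without CUDA, and the devId values in the log names appear to be GPU device IDs. One way to think about the fix is to fall back to the CPU whenever the build has no GPU support. This is a sketch only: pick_devices is a hypothetical helper, not part of dlbench or MXNet, and the interpretation of devId is an assumption.

```python
def pick_devices(requested_gpu_ids, gpu_build):
    """Return (device_type, device_id) pairs for a training run.

    requested_gpu_ids: GPU IDs asked for by the benchmark config, e.g. [0, 1, 2, 3].
    gpu_build: True only if the framework was compiled with CUDA support.
    """
    if gpu_build and requested_gpu_ids:
        return [("gpu", i) for i in requested_gpu_ids]
    # CPU-only build or no GPUs requested: use a single CPU device.
    return [("cpu", 0)]


# With MXNet, each pair would map to a context object:
# ("gpu", i) -> mx.gpu(i), ("cpu", 0) -> mx.cpu()
print(pick_devices([0, 1, 2, 3], gpu_build=False))
```

The design choice here is to decide the device list in one place, so a CPU-only machine never reaches a code path that constructs a GPU context.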

FreemanX commented 6 years ago

You can narrow down the cause by testing MXNet on its own. Go to dlbench/tools/mxnet and run testbm.sh. You may need to modify the script and comment out the lines for GPU tests. You can also append the -debug flag to each test line to get more information for debugging.

jerrin92 commented 6 years ago

Okay, I will try that. Some of the testbm.sh scripts do not have test statements for CPUs. I am assuming that passing -cpuCount 20 instead of -gpuCount 1 would solve the problem.