undertherain / benchmarker

modular framework for [not only] deep learning performance benchmarking
http://blackbird.pw/performance
Mozilla Public License 2.0

Deleted old ssd #175

Closed by vatai 3 years ago

vatai commented 3 years ago

Some funky stuff is happening with ssd300 on Fugaku.

vatai commented 3 years ago

This is the error:

a04082@l31-4209c /vol0004/ra000012/a04082/code/benchmarker (master) $ LD_PRELOAD=libtcmalloc.so OMP_NUM_THREADS=12 run_on_cmg python3 -m benchmarker --problem=ssd300 --framework=pytorch --problem_size=48 --batch_size=12 --nb_epoch=2 --mode=training --backend=DNNL --tensor_layout=DNNL
Downloading: "https://github.com/nvidia/DeepLearningExamples/archive/torchhub.zip" to /home/ra000012/a04082/.cache/torch/hub/torchhub.zip
Downloading: "https://api.ngc.nvidia.com/v2/models/nvidia/ssd_pyt_ckpt_amp/versions/19.09.0/files/nvidia_ssdpyt_fp16_190826.pt" to /home/ra000012/a04082/.cache/torch/hub/checkpoints/nvidia_ssdpyt_fp16_190826.pt
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 175M/175M [00:39<00:00, 4.60MB/s]
Traceback (most recent call last):
  File "/home/apps/oss/PyTorch-1.7.0/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/apps/oss/PyTorch-1.7.0/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/vol0004/ra000012/a04082/code/benchmarker/benchmarker/__main__.py", line 95, in <module>
    main()
  File "/vol0004/ra000012/a04082/code/benchmarker/benchmarker/__main__.py", line 53, in main
    result = benchmarker.benchmarker.run(unknown_args)
  File "/vol0004/ra000012/a04082/code/benchmarker/benchmarker/benchmarker.py", line 138, in run
    benchmark.measure_power_and_run()
  File "/vol0004/ra000012/a04082/code/benchmarker/benchmarker/frameworks/i_benchmark.py", line 9, in measure_power_and_run
    results = self.run()
  File "/vol0004/ra000012/a04082/code/benchmarker/benchmarker/frameworks/do_pytorch.py", line 137, in run
    self.train(model, optimizer, epoch)
  File "/vol0004/ra000012/a04082/code/benchmarker/benchmarker/frameworks/do_pytorch.py", line 94, in train
    loss = model(data, target)
  File "/home/apps/oss/PyTorch-1.7.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given
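
Reading the traceback: the training loop calls `model(data, target)`, and the module's `__call__` passes both arguments on to `forward()`. The downloaded SSD model's `forward()` apparently accepts only the input batch (`self` plus one argument), hence "takes 2 positional arguments but 3 were given". Below is a minimal sketch of that mismatch and of one possible workaround, a wrapper that computes the loss itself. It does not use PyTorch: `Module` is a tiny stand-in that only imitates how `torch.nn.Module` forwards call arguments, and `Detector`, `WithLoss`, and `squared_error` are hypothetical names for illustration. This is not necessarily how the issue was resolved in this PR.

```python
class Module:
    """Stand-in for torch.nn.Module: __call__ forwards every argument to forward()."""

    def __call__(self, *inputs, **kwargs):
        return self.forward(*inputs, **kwargs)


class Detector(Module):
    """Hypothetical inference-style model: forward() accepts only the input batch."""

    def forward(self, x):
        return [2 * v for v in x]


class WithLoss(Module):
    """Hypothetical wrapper exposing the (data, target) -> loss interface the loop expects."""

    def __init__(self, model, loss_fn):
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, x, target):
        # Run the wrapped model on the inputs only, then compare with the target.
        return self.loss_fn(self.model(x), target)


def squared_error(pred, target):
    """Toy loss (assumption): sum of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


data, target = [1.0, 2.0], [2.0, 5.0]
model = Detector()

# Calling the bare model the way the training loop does reproduces the error.
try:
    model(data, target)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the model gives a callable that takes (data, target) and returns a loss.
loss = WithLoss(model, squared_error)(data, target)

print(error_message)
print(loss)
```

With real PyTorch code the wrapper would subclass `torch.nn.Module` and call `super().__init__()` in its constructor; the argument-forwarding behaviour shown here is the same.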