Open daecheolyou opened 3 years ago
Yuan, can you take a look at this?
Yes, will take a look this week.
Just a guess: did you update trace_file_name in gem5.cfg to use the correct trace file?
It doesn't need to be modified. I only changed model_files so that it points to the pbtxt and pb files under imagenet-resnet. The trace file was generated with trace.sh, which takes model_files as input and always writes its output to dynamic_trace_acc0.gz.
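For anyone who wants to double-check which trace file each accelerator section actually references, a small script along these lines may help. It is only a sketch under a few assumptions: that gem5.cfg is an INI-style file readable by Python's configparser, and that relative trace paths are resolved against the directory containing the config. The helper name find_trace_files is made up for illustration and is not part of gem5-aladdin or SMAUG.

```python
import configparser
import os


def find_trace_files(cfg_path):
    """Return {section: (resolved_trace_path, exists)} for each section
    that sets trace_file_name.

    Assumptions (not taken from the gem5-aladdin sources): the config is
    INI-style, and relative trace paths are relative to the config's directory.
    """
    parser = configparser.ConfigParser()
    with open(cfg_path) as f:
        parser.read_file(f)

    base_dir = os.path.dirname(os.path.abspath(cfg_path))
    result = {}
    for section in parser.sections():
        if not parser.has_option(section, "trace_file_name"):
            continue
        trace = parser.get(section, "trace_file_name")
        full_path = os.path.normpath(os.path.join(base_dir, trace))
        result[section] = (full_path, os.path.isfile(full_path))
    return result
```

Calling find_trace_files("gem5.cfg") and printing the result shows, per section, the resolved path and whether a file such as dynamic_trace_acc0.gz is present there.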
I just tried running resnet50. It's still running, but it has already started running the accelerator for the first convolution layer (conv0), which is clearly past the point where your simulation crashed. To reduce the trace size for this relatively large network, the only change I made was to use --sample-level=very_high in trace.sh (and the same in run.sh). Other than updating the protobuf inputs, the rest of the configuration files are the same as the ones in sims/smv/tests/minerva.
Did the simulator leave any stacktraces indicating where the segfault occurred?
Below is the stack trace for the simulation failure. I ran the simulation several times with ResNet, and sometimes it got further than the log I originally posted. For example, it once reached "Scheduling relu2b (ReLU)". However, it eventually hit a segmentation fault with the same kind of stack trace as the one below.
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z15print_backtracev+0x2c)[0x55a3fb5e722c]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(+0x6e92ff)[0x55a3fb5f92ff]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f8073fc9890]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0xcf)[0x7f80725f6d9f]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN6X86ISA7Decoder10decodeInstENS_11ExtMachInstE+0x2e6c1)[0x55a3fc00f141]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN6X86ISA7Decoder6decodeENS_11ExtMachInstEm+0x244)[0x55a3fbfa88f4]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN6X86ISA7Decoder6decodeERNS_7PCStateE+0x22b)[0x55a3fbfa8beb]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN12DefaultFetchI9O3CPUImplE5fetchERb+0x979)[0x55a3fbb0eb69]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN12DefaultFetchI9O3CPUImplE4tickEv+0xd3)[0x55a3fbb0fe23]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN9FullO3CPUI9O3CPUImplE4tickEv+0x12b)[0x55a3fbaedb3b]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xd9)[0x55a3fb5ef709]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x148)[0x55a3fb610e28]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z8simulatem+0xcba)[0x55a3fb611dda]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(+0x7bf6d1)[0x55a3fb6cf6d1]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(+0x5e8754)[0x55a3fb4f8754]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x64d7)[0x7f8074276c47]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f80742705d9]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6ac0)[0x7f8074277230]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f80742705d9]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x76)[0x7f80743206f6]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z6m5MainiPPc+0x83)[0x55a3fb5f8013]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(main+0x38)[0x55a3fb448e08]
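A trace in this form is easier to read once it is split into frames and the C++ names are demangled. The following is a rough sketch of how that could be done. It assumes each frame follows the binary(symbol+offset)[address] layout seen above, and it demangles through the c++filt tool only if that tool happens to be installed. The names parse_backtrace and demangle are illustrative helpers, not gem5 utilities.

```python
import re
import shutil
import subprocess

# One frame looks like: /path/to/binary(symbol+0xOFFSET)[0xADDRESS]
# The symbol part is empty when no symbol name was available.
FRAME_RE = re.compile(
    r"(?P<binary>\S+?)\((?P<symbol>[^()+]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_backtrace(text):
    """Split a pasted backtrace into a list of frame dictionaries."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "binary": m.group("binary").rsplit("/", 1)[-1],
            "symbol": m.group("symbol") or None,  # None for unnamed frames
            "offset": m.group("offset"),
            "address": m.group("address"),
        })
    return frames


def demangle(symbols):
    """Demangle names with c++filt if available; otherwise return them unchanged."""
    tool = shutil.which("c++filt")
    if tool is None or not symbols:
        return list(symbols)
    proc = subprocess.run(
        [tool], input="\n".join(symbols) + "\n",
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.splitlines()
```

Feeding the pasted trace to parse_backtrace and passing the non-empty symbols to demangle gives a shorter, readable call chain. Frames without a symbol name would still need the offset resolved against the binary by other means.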
During simulation with ResNet, a segmentation fault occurs in gem5. I created the ResNet pb and pbtxt files by running smaug/experiments/models/imagenet-resnet/resnet_network.py. All configuration files are the same as in the minerva example, except that model_files was modified to point to the generated pb and pbtxt files. The input trace was generated by running trace.sh.
Below is the stdout log at the end.
Scheduling data (Data). Scheduling data_1 (Data). Scheduling data_10 (Data). Scheduling data_100 (Data). Scheduling data_101 (Data). Scheduling data_102 (Data). Scheduling data_103 (Data). Scheduling data_104 (Data). Scheduling data_105 (Data). Scheduling data_106 (Data). Scheduling data_107 (Data). Scheduling data_108 (Data). Scheduling data_109 (Data).
The stderr log just before the backtrace shows the following message.
gem5 has encountered a segmentation fault!
Please let me know if I configured something wrong. Thanks.