nvdla / sw


Protocol mismatch issue in the regression test. #11

Open ericrxw opened 6 years ago

ericrxw commented 6 years ago

When I run the regression test in the SW repository, I get a "Protocol mismatch" error, as shown in the log below. Can you tell me the possible cause and how to solve it? Thank you very much.

Running testplan[firmware] level[0] [sim] tests
Test (1/1) 76A9A4 kmdrun.py BDMA_L0_0 ... ['--ssh', '--odir', '/ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/out/BDMA_L0_0_76a9a4/', '--guid', '76a9a4', '-o', 'output.log', '-t', 'BDMA_L0_0', '-i', 'one.pgm']

Run cmd: /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/kmdrun.py --ssh --odir /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/out/BDMA_L0_0_76a9a4/ --guid 76a9a4 -o output.log -t BDMA_L0_0 -i one.pgm --project dla --target sim --os linux --debug 0 --testhome /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/flatbufs --imghome /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/images --goldhome /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/golden
################################################################################
Test BDMA_L0_0
################################################################################
Mon Jan 08 16:42:38 2018 on hydra2
Command: /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/kmdrun.py --ssh --odir /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/out/BDMA_L0_0_76a9a4/ --guid 76a9a4 -o output.log -t BDMA_L0_0 -i one.pgm --project dla --target sim --os linux --debug 0 --testhome /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/flatbufs --imghome /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/images --goldhome /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/golden
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt/images/linux-4.13.3 && /mnt/images/linux-4.13.3/nvdla_runtime -s
Waiting for server ready...
Done connecting

Launching test: python /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/dla_client.py -i /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/flatbufs/kmd/BDMA/BDMA_L0_0_fbuf --img /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/images/digits/one.pgm -o /ubc/ece/home/ml/grads/xiaowei/My_work/ML_accelerators/nvdla/sw/regression/scripts/out/BDMA_L0_0_76a9a4/
==> dlaSocket.py :: Connection accepted
==> dla_client.py :: DLA Client open at PORT: 6667.
==> dla_client.py :: Requesting welcome message
==> dlaSocket.py :: sending size 11 bytes
==> dlaSocket.py :: received non-integer size (SSH-2.0-OpenSSH_7.6)
==> dlaSocket.py :: reading -1 bytes from Server.
==> dla_client.py :: Received welcome message: {}
==> dla_client.py :: Attempting to read flatbuf: [BDMA_L0_0_fbuf], size[4156].
==> dla_client.py :: Sending and loading flatbuffer
==> dlaSocket.py :: sending size 12 bytes
==> dlaSocket.py :: sending size 4156 bytes
==> dla_client.py :: Attempting to run image: [one.pgm], size[797].
==> dla_client.py :: Pre process image with shift [0.0], scaling factor [1.0] and power factor [1.0]
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dla_client.py :: Don't perform softmax on test output
==> dlaSocket.py :: sending size 15 bytes
==> dlaSocket.py :: sending size 2 bytes
==> dla_client.py :: Seding and running image
==> dlaSocket.py :: sending size 17 bytes
==> dlaSocket.py :: sending size 797 bytes
==> dlaSocket.py :: received non-integer size (Protocol mismatch.)
==> dlaSocket.py :: reading -1 bytes from Server.
==> dla_client.py :: Received [one.pgm] test results: {}
Test reported an error: 1
[ASCII-art banner: FAIL]

Shutting down test server
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/images/linux-4.13.3/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
Test server shutdown.
FAIL

         Testing Stats              

Pass Fail Ran Written
0.0% 100.0% 100.0% 100.0%

Pass = 0 Fail = 1 Ran = 1 Written = 1 Total = 1

DLA sanity failed
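A note on reading the log above: `SSH-2.0-OpenSSH_7.6` is an SSH identification string, and `Protocol mismatch.` is what an OpenSSH server replies when the peer does not speak SSH. The client also reports `DLA Client open at PORT: 6667`, the same port used by `ssh -p 6667`, which suggests (but does not prove) that the client was talking to the SSH forward rather than to the runtime server. The sketch below is one way to check what answers on a given port. The function names `classify_banner` and `probe` are hypothetical, and the "size-header" case only assumes, from the `received non-integer size` message, that the DLA server replies with an integer size first.

```python
import socket


def classify_banner(first_bytes: bytes) -> str:
    """Guess what kind of server sent these first bytes (heuristic only)."""
    text = first_bytes.decode("ascii", errors="replace").strip()
    if not text:
        # Nothing volunteered: the server may be waiting for a command.
        return "silent"
    if text.startswith("SSH-"):
        # SSH servers send an identification string beginning with "SSH-".
        return "ssh"
    if text.isdigit():
        # Assumption: the DLA protocol sends an integer size before a payload.
        return "size-header"
    return "unknown"


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Connect, read whatever the server sends unprompted, and classify it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            data = sock.recv(64)
        except socket.timeout:
            data = b""
    return classify_banner(data)
```

Calling `probe("127.0.0.1", 6667)` and comparing it with the port the client is expected to use would show whether the client is pointed at the SSH daemon.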

yirs2001 commented 6 years ago

I ran two cases, the BDMA and CDP regression tests (the log below is from CDP).

Issue 1: There is no "-i" option for the input image.
Issue 2: According to DLAServer.log: libnvdla<1> failed to open dla device (it seems the DLA device cannot be opened).

My command:
python2 kmdrun.py --ssh --odir /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25 --guid 66cb25 -o output.log -t CDP_L0_0 --project dla --target sim --os linux --debug 0 --testhome /home/willyi/nvdla/sw/regression/flatbufs --goldhome /home/willyi/nvdla/sw/regression/golden --noclean --show

The following is my log:
willyi@mantis-HP-EliteDesk-800-G1-TWR:~/nvdla/sw/regression/scripts$ python2 kmdrun.py --ssh --odir /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25 --guid 66cb25 -o output.log -t CDP_L0_0 --project dla --target sim --os linux --debug 0 --testhome /home/willyi/nvdla/sw/regression/flatbufs --goldhome /home/willyi/nvdla/sw/regression/golden --noclean --show
################################################################################
Test CDP_L0_0
################################################################################
Thu Feb 08 15:06:15 2018 on mantis-HP-EliteDesk-800-G1-TWR
Command: kmdrun.py --ssh --odir /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25 --guid 66cb25 -o output.log -t CDP_L0_0 --project dla --target sim --os linux --debug 0 --testhome /home/willyi/nvdla/sw/regression/flatbufs --goldhome /home/willyi/nvdla/sw/regression/golden --noclean --show
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Failed to launch test server
Let's try again
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Done connecting

Launching test: python2 /home/willyi/nvdla/sw/regression/scripts/dla_client.py -i /home/willyi/nvdla/sw/regression/flatbufs/kmd/CDP/CDP_L0_0_fbuf --img /home/willyi/nvdla/sw/regression/images/digits/six.pgm -o /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25
==> dlaSocket.py :: Connection accepted
==> dla_client.py :: DLA Client open at PORT: 6666.
==> dla_client.py :: Requesting welcome message
==> dlaSocket.py :: sending size 11 bytes
==> dlaSocket.py :: reading 12 bytes from Server.
==> dla_client.py :: Received welcome message: {Hello World!}
==> dla_client.py :: Attempting to read flatbuf: [CDP_L0_0_fbuf], size[7056].
==> dla_client.py :: Sending and loading flatbuffer
==> dlaSocket.py :: sending size 12 bytes
==> dlaSocket.py :: sending size 7056 bytes
==> dla_client.py :: Attempting to run image: [six.pgm], size[797].
==> dla_client.py :: Pre process image with shift [0.0], scaling factor [1.0] and power factor [1.0]
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dla_client.py :: Don't perform softmax on test output
==> dlaSocket.py :: sending size 15 bytes
==> dlaSocket.py :: sending size 2 bytes
==> dla_client.py :: Seding and running image
==> dlaSocket.py :: sending size 17 bytes
==> dlaSocket.py :: sending size 797 bytes
==> dlaSocket.py :: reading 17 bytes from Server.
==> dla_client.py :: Received [six.pgm] test results: {[OK] Test FAILED!}
Test reported an error: 1
[ASCII-art banner: FAIL]

Shutting down test server
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
Test server shutdown.

The following is DLAServer.log:
...
Waiting for command from Client...
Received command: {GET_WELCOME}
Sending welcome message: {Hello World!}
Sending size 12 bytes.
Waiting for command from Client...
Received command: {READ_FLATBUF}
Waiting for command from Client...
Received command: {PERFORM_SHIFT}
Waiting for command from Client...
Received command: {PERFORM_SCALE}
Waiting for command from Client...
Received command: {PERFORM_POWER}
Waiting for command from Client...
Received command: {PERFORM_SOFTMAX}
Waiting for command from Client...
Received command: {RUN_IMAGE_six.pgm}
starting to run image six.pgm of size 797
Executing the Test...
creating new runtime context...
libnvdla<1> failed to open dla device
libnvdla<1> Out of bounds DLA instance 0 requested.

yirs2001 commented 6 years ago
  1. The error in the previous log occurred because nvdla_runtime did not start successfully.
  2. The log below was captured after "nvdla_runtime -s" was working.
  3. Another issue: the NN, PDP, SDP, CONV, BDMA, CDP and RBK tests all pass, but the o_000000.dimg outputs are all zero.

willyi@mantis-HP-EliteDesk-800-G1-TWR:~/nvdla/sw/regression/scripts$ python2 kmdrun.py --ssh --odir /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25 --guid 66cb25 -o output.log -t CDP_L0_0 --project dla --target sim --os linux --debug 0 --testhome /home/willyi/nvdla/sw/regression/flatbufs --goldhome /home/willyi/nvdla/sw/regression/golden --noclean --show
################################################################################
Test CDP_L0_0
################################################################################
Fri Feb 09 14:26:29 2018 on mantis-HP-EliteDesk-800-G1-TWR
Command: kmdrun.py --ssh --odir /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25 --guid 66cb25 -o output.log -t CDP_L0_0 --project dla --target sim --os linux --debug 0 --testhome /home/willyi/nvdla/sw/regression/flatbufs --goldhome /home/willyi/nvdla/sw/regression/golden --noclean --show
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Failed to launch test server
Let's try again
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Done connecting

Launching test: python2 /home/willyi/nvdla/sw/regression/scripts/dla_client.py -i /home/willyi/nvdla/sw/regression/flatbufs/kmd/CDP/CDP_L0_0_fbuf --img /home/willyi/nvdla/sw/regression/images/digits/six.pgm -o /home/willyi/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25
==> dlaSocket.py :: Connection accepted
==> dla_client.py :: DLA Client open at PORT: 6666.
==> dla_client.py :: Requesting welcome message
==> dlaSocket.py :: sending size 11 bytes
==> dlaSocket.py :: reading 12 bytes from Server.
==> dla_client.py :: Received welcome message: {Hello World!}
==> dla_client.py :: Attempting to read flatbuf: [CDP_L0_0_fbuf], size[7056].
==> dla_client.py :: Sending and loading flatbuffer
==> dlaSocket.py :: sending size 12 bytes
==> dlaSocket.py :: sending size 7056 bytes
==> dla_client.py :: Attempting to run image: [six.pgm], size[797].
==> dla_client.py :: Pre process image with shift [0.0], scaling factor [1.0] and power factor [1.0]
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dlaSocket.py :: sending size 13 bytes
==> dlaSocket.py :: sending size 6 bytes
==> dla_client.py :: Don't perform softmax on test output
==> dlaSocket.py :: sending size 15 bytes
==> dlaSocket.py :: sending size 2 bytes
==> dla_client.py :: Seding and running image
==> dlaSocket.py :: sending size 17 bytes
==> dlaSocket.py :: sending size 797 bytes
==> dlaSocket.py :: reading 17 bytes from Server.
==> dla_client.py :: Received [six.pgm] test results: {[OK] Test PASSED!}
==> dla_client.py :: Requesting number of test outputs
==> dlaSocket.py :: sending size 14 bytes
==> dlaSocket.py :: reading 1 bytes from Server.
==> dla_client.py :: Received number of test outputs: {1}
==> dla_client.py :: Requesting test output[0]
==> dlaSocket.py :: sending size 10 bytes
==> dlaSocket.py :: sending size 1 bytes
==> dlaSocket.py :: reading 0 bytes from Server.
==> dla_client.py :: Received test output[0]
Test completed

LEAD PASS: test.md5 matches lead.md5

[ASCII-art banner: PASS]

Shutting down test server
Test server shutdown.

Dumping server logs

...
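To confirm the "all zero" observation about o_000000.dimg without inspecting the file by eye, a small check like the following could be used. This is only a sketch: the layout of .dimg files is not described in this thread, so the hypothetical helper `describe_dump` handles three possibilities (an empty file, raw zero bytes, or whitespace-separated text values that are all zero) rather than assuming one.

```python
def describe_dump(data: bytes) -> str:
    """Classify a dump's contents; the file format itself is an assumption."""
    if not data:
        return "empty"
    if data.count(0) == len(data):
        # Every byte is 0x00 (binary interpretation).
        return "all-zero bytes"
    try:
        tokens = data.decode("ascii").split()
    except UnicodeDecodeError:
        return "has non-zero data"
    try:
        # Text interpretation: every whitespace-separated token parses to 0.
        if tokens and all(float(t) == 0.0 for t in tokens):
            return "all-zero text values"
    except ValueError:
        pass
    return "has non-zero data"


def describe_file(path: str) -> str:
    """Read a file from disk and classify it with describe_dump."""
    with open(path, "rb") as f:
        return describe_dump(f.read())
```

An "empty" result would be consistent with the `reading 0 bytes from Server` line in the client log, whereas an all-zero result would point at the computation or the comparison data instead.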

arvindhbti commented 5 years ago

Hi @yirs2001 @ericrxw

I am getting the error below. Is there a solution, please?

Test CDP_L0_0
################################################################################
Mon Sep 09 14:16:29 2019 on sandisk-usb-sys
Command: kmdrun.py --ssh --odir /home/sandisk/Desktop/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25 --guid 66cb25 -o output.log -t CDP_L0_0 --project dla --target sim --os linux --debug 0 --testhome /home/sandisk/Desktop/nvdla/sw/regression/flatbufs --goldhome /home/sandisk/Desktop/nvdla/sw/regression/golden --noclean --show
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Failed to launch test server
Let's try again
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Failed to launch test server
Let's try again
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
ssh cmd: ssh root@127.0.0.1 -p 6667 export LD_LIBRARY_PATH=/mnt && /mnt/nvdla_runtime -s
Waiting for server ready...
Failed to launch test server
Let's try again
ssh cmd: ssh root@127.0.0.1 -p 6667 "ps | grep '/mnt/nvdla_runtime -s' | awk '{print $1}' | xargs kill"
Launching test: python /home/sandisk/Desktop/nvdla/sw/regression/scripts/dla_client.py -i /home/sandisk/Desktop/nvdla/sw/regression/flatbufs/kmd/CDP/CDP_L0_0_fbuf -o /home/sandisk/Desktop/nvdla/sw/regression/scripts/out/CDP_L0_0_66cb25
Traceback (most recent call last):
  File "kmdrun.py", line 51, in <module>
    main()
  File "kmdrun.py", line 42, in main
    runScript.runTest()
  File "/home/sandisk/Desktop/nvdla/sw/regression/scripts/KmdRunScript.py", line 171, in runTest
    while self.serverlinebuffer:
AttributeError: 'KmdRunScript' object has no attribute 'serverlinebuffer'

Please let me know the reason. Thanks.
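In the log above, the AttributeError appears only after all three attempts to launch the test server have failed. One plausible reading, which is an assumption because the source of KmdRunScript.py is not shown in this thread, is that `serverlinebuffer` is only assigned when the server launch succeeds, so the traceback is a symptom of the failed launch rather than the root cause. The toy class below (the class `RunnerSketch` and all of its methods are hypothetical, not the real script) illustrates the general pattern of defining such an attribute up front and failing with a clearer error.

```python
class RunnerSketch:
    """Toy stand-in for a test runner; not the real KmdRunScript."""

    def __init__(self):
        # Defined unconditionally, so later reads never raise AttributeError.
        self.serverlinebuffer = []
        self.server_ready = False

    def launch_server(self, attempts, try_launch):
        """Call try_launch() up to `attempts` times; remember whether it worked."""
        for _ in range(attempts):
            if try_launch():
                self.server_ready = True
                return True
        return False

    def run_test(self):
        """Refuse to continue if the server never came up."""
        if not self.server_ready:
            raise RuntimeError("test server never came up; not launching client")
        # Drain any buffered server output before finishing.
        while self.serverlinebuffer:
            self.serverlinebuffer.pop(0)
        return "done"
```

With this shape, three failed launches end in an error that names the failed server start, which is the thing to investigate first.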

arvindhbti commented 5 years ago

Hi @prasshantg, can you please help me resolve this issue? Thanks.

prasshantg commented 5 years ago

@arvindhbti please create a separate issue for it with more details, such as the configuration, platform, full steps, and the tag of github/sw.