parsa-epfl / qflex

Quick & Flexible Rack-Scale Computer Architecture Simulator
http://qflex.epfl.ch/

./test_run_system/test.sh failed #13

Closed Jeongseob closed 4 years ago

Jeongseob commented 6 years ago

Hi All,

I am also testing QFlex (v1.0 branch) with the ./test_run_system/test.sh script, but several of its tests fail. In particular, loading the snapshot fails even though creating it succeeded.

I am using the provided images. Before I ran the test script, the images did not have any snapshots, as shown below.

jeongseob@concerto:~/qflex/scripts ((v1.0))$ grep image user.cfg
set_variable KERNEL_PATH "$HOME/qflex/images/kernel"
set_variable IMG_0 "$HOME/qflex/images/debian-memcached/debian.qcow2"
set_variable IMG_1 "$HOME/qflex/images/debian-blank/debian.qcow2"
jeongseob@concerto:~/qflex/scripts ((v1.0))$ qemu-img snapshot -l ../images/debian-memcached/debian.qcow2
jeongseob@concerto:~/qflex/scripts ((v1.0))$ qemu-img snapshot -l ../images/debian-blank/debian.qcow2
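The same check can be scripted, for example to confirm whether a given snapshot tag exists after a test step. This is only a minimal sketch: has_snapshot is a hypothetical helper name (not part of QFlex), and it assumes the usual `qemu-img snapshot -l` layout of one title line, one column-header line, and then one row per snapshot with the tag in the second column.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of QFlex.
# Reads the output of `qemu-img snapshot -l IMAGE` on stdin and succeeds
# only if a row whose second column equals the given tag is present.
# Assumption: the first two lines are the title and the column header.
has_snapshot() {
    awk -v tag="$1" 'NR > 2 && $2 == tag { found = 1 } END { exit !found }'
}

# Usage (commented out so the sketch runs without qemu-img installed):
# qemu-img snapshot -l ../images/debian-memcached/debian.qcow2 | has_snapshot test_snap \
#     && echo "test_snap present" || echo "test_snap missing"
```

An image without snapshots prints nothing, as in the two commands above, so the helper reports "missing" in that case. Tags containing spaces are not handled by this sketch.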

When I ran the test script, loading and deleting the snapshot failed with the error below. The root cause seems to be that the script cannot SSH into the QEMU guest, but I don't understand why SSH is needed to delete a snapshot of the image.

Please let me know if I am supposed to do something to enable SSH.

jeongseob@concerto:~/qflex/scripts ((v1.0))$ ./test_run_system/test.sh

 *** Running run_system.sh ***
Creating folder /home/jeongseob/qflex/scripts/test_run_system/results/single_save

Running Single Instance Mode : Port 2220
/home/jeongseob/qflex/scripts/run_instance.sh --kill -exp=test_run_system/results/single_save -ow -sn=test_snap
Booting... Please wait

send: spawn id exp4 not open
    while executing
"send "echo Exiting Test\r""
Running Commands
Finished Commands

Taking Snapshot test_snap
Snapshot Saved

Killing all QEMU instances
[  PASSED  ] Booting Default
[  PASSED  ] Saving Snapshot

 *** Running run_system.sh ***
Creating folder /home/jeongseob/qflex/scripts/test_run_system/results/single_load

Running Single Instance Mode : Port 2220
/home/jeongseob/qflex/scripts/run_instance.sh --kill -exp=test_run_system/results/single_load -ow -lo=test_snap -rs -sn=test_snap
Booting... Please wait

/home/jeongseob/qflex/scripts/test_run_system/../run_system.sh:107: QEMU Runtime ERROR cannot SSH[  FAILED  ] Loading Snapshot
[  FAILED  ] Deleting Snapshot

 *** Running run_system.sh ***
Creating folder /home/jeongseob/qflex/scripts/test_run_system/results/multiple_ping

*** Removing Existing Taps and Bridges
Restarting networking (via systemctl): networking.service.
*** Creating New Taps and Bridges
Restarting networking (via systemctl): networking.service.
Linking Taps and Bridges...
Taps and Bridges linked successfully

Running Multiple Instance Mode : Port 2220
/home/jeongseob/qflex/scripts/run_instance.sh --kill -exp=test_run_system/results/multiple_ping -mult -ow --no_ns3 -num=0
Booting... Please wait

/home/jeongseob/qflex/scripts/test_run_system/../run_system.sh:107: QEMU Runtime ERROR cannot SSH[  FAILED  ] Configuration Multiple Instance 0
[  FAILED  ] Configuration Multiple Instance 1
[  FAILED  ] Booting Multiple Instance 0
[  FAILED  ] Booting Multiple Instance 1
[  FAILED  ] PING through NS3

Summary:
[  PASSED  ] Booting Default
[  PASSED  ] Saving Snapshot
[  FAILED  ] Loading Snapshot
[  FAILED  ] Deleting Snapshot
[  FAILED  ] Configuration Multiple Instance 0
[  FAILED  ] Configuration Multiple Instance 1
[  FAILED  ] Booting Multiple Instance 0
[  FAILED  ] Booting Multiple Instance 1
[  FAILED  ] PING through NS3
ustiugov commented 6 years ago

Hi Jeongseob,

The scripts boot a QEMU instance in the background (non-interactive mode). After booting, we therefore need to check that the machine came up successfully by SSH-ing into it and executing an "echo" command inside the guest.

To do so, we run the following bash script: https://github.com/parsa-epfl/qflex/blob/master/scripts/run_system.sh#L98 The check itself is implemented in Expect: https://github.com/parsa-epfl/qflex/blob/master/scripts/helpers/ssh_test.sh
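For illustration only, the general idea of such a post-boot check can be sketched in plain bash. This is not the code in ssh_test.sh: retry_probe and guest_echo are hypothetical names, the retry counts are arbitrary, and the probe assumes key-based login to the guest has been set up (the stock image asks for a password, which is why the real script uses Expect).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a probe command until it succeeds or the
# attempts run out. Usage: retry_probe ATTEMPTS DELAY_SECONDS CMD [ARGS...]
retry_probe() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0        # probe succeeded, guest is reachable
        fi
        sleep "$delay"      # wait before the next attempt
    done
    return 1                # all attempts failed
}

# Hypothetical probe: execute "echo" inside the guest over SSH.
# Assumptions: key-based login works (BatchMode forbids password prompts);
# the host-key options avoid the interactive fingerprint question.
guest_echo() {
    ssh -p "${1:-2220}" \
        -o UserKnownHostsFile=/dev/null \
        -o StrictHostKeyChecking=no \
        -o BatchMode=yes \
        -o ConnectTimeout=5 \
        cloudsuite@localhost 'echo Exiting Test'
}

# Example (commented out, needs a running guest):
# retry_probe 30 2 guest_echo 2220 || echo "QEMU Runtime ERROR cannot SSH"
```

Keeping the retry loop separate from the probe makes it easy to swap in a different check while debugging, such as a plain port test.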

In your case, spawn ssh finishes with an error:

    while executing
"send "echo Exiting Test\r""

To investigate the issue (the root cause must be in your host machine setup), I suggest booting a new QEMU instance with the command from the log, but without --kill so that the instance keeps running in interactive mode: /home/jeongseob/qflex/scripts/run_instance.sh -exp=test_run_system/results/single_save

Then try to SSH into the guest from a different terminal on the same host (use ssh cloudsuite@localhost -p 2220).

Regards, Dmitrii

Jeongseob commented 6 years ago

Hi Dmitrii,

I found the reason why spawn ssh finished with an error: the image had never been logged in to, not even once. So it requires an initial SSH login like the one below.

jeongseob@concerto:~$ ssh cloudsuite@localhost -p 2220
The authenticity of host '[localhost]:2220 ([127.0.0.1]:2220)' can't be established.
ECDSA key fingerprint is SHA256:AhrM8tii6V79nWvS8a9xw5bI2LYB5OsE1iXbTo3Ey2w.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:2220' (ECDSA) to the list of known hosts.
cloudsuite@localhost's password:

After adding the fingerprint, the spawn ssh error no longer appears. However, I still get the following QEMU runtime error.

jeongseob@concerto:~/qflex/scripts ((v1.0))$ ./test_run_system/test.sh
[sudo] password for jeongseob:

 *** Running run_system.sh ***
Creating folder /home/jeongseob/qflex/scripts/test_run_system/results/single_save

Running Single Instance Mode : Port 2220
/home/jeongseob/qflex/scripts/run_instance.sh --kill -exp=test_run_system/results/single_save -ow -sn=test_snap
Booting... Please wait

Running Commands
Finished Commands

Taking Snapshot test_snap
Snapshot Saved

Killing all QEMU instances
[  PASSED  ] Booting Default
[  PASSED  ] Saving Snapshot

 *** Running run_system.sh ***
Creating folder /home/jeongseob/qflex/scripts/test_run_system/results/single_load

Running Single Instance Mode : Port 2220
/home/jeongseob/qflex/scripts/run_instance.sh --kill -exp=test_run_system/results/single_load -ow -lo=test_snap -rs -sn=test_snap
Booting... Please wait

/home/jeongseob/qflex/scripts/test_run_system/../run_system.sh:107: QEMU Runtime ERROR cannot SSH[  FAILED  ] Loading Snapshot
[  FAILED  ] Deleting Snapshot

 *** Running run_system.sh ***
Creating folder /home/jeongseob/qflex/scripts/test_run_system/results/multiple_ping

*** Removing Existing Taps and Bridges
Restarting networking (via systemctl): networking.service.
*** Creating New Taps and Bridges
Restarting networking (via systemctl): networking.service.
Linking Taps and Bridges...
Taps and Bridges linked successfully

Running Multiple Instance Mode : Port 2220
/home/jeongseob/qflex/scripts/run_instance.sh --kill -exp=test_run_system/results/multiple_ping -mult -ow --no_ns3 -num=0
Booting... Please wait

/home/jeongseob/qflex/scripts/test_run_system/../run_system.sh:107: QEMU Runtime ERROR cannot SSH[  FAILED  ] Configuration Multiple Instance 0
[  FAILED  ] Configuration Multiple Instance 1
[  FAILED  ] Booting Multiple Instance 0
[  FAILED  ] Booting Multiple Instance 1
[  FAILED  ] PING through NS3

Summary:
[  PASSED  ] Booting Default
[  PASSED  ] Saving Snapshot
[  FAILED  ] Loading Snapshot
[  FAILED  ] Deleting Snapshot
[  FAILED  ] Configuration Multiple Instance 0
[  FAILED  ] Configuration Multiple Instance 1
[  FAILED  ] Booting Multiple Instance 0
[  FAILED  ] Booting Multiple Instance 1
[  FAILED  ] PING through NS3
ustiugov commented 6 years ago

This is weird, since we pass options that bypass the fingerprint check: spawn ssh $PASSWORD@$HOST -p$PORT -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no

I suggest saving and loading a snapshot by hand to see if that works.

Regards, Dmitrii

Jeongseob commented 6 years ago

As I mentioned in #6, I am able to save and load the snapshot manually, as described in https://parsa-epfl.github.io/qflex//pages/download/. But I don't know why the test fails.

ustiugov commented 6 years ago

Ok, let's leave this issue open for some time, so that we can check it later.

neo-apz commented 4 years ago

We've released a new version of QFlex. You can give it a try and let us know if you see any issues. I'll close this issue for now. @Jeongseob