JET Lesson Stage 4 issue

Mierdin commented 5 years ago

Looking at https://ptr.labs.networkreliability.engineering/labs/?lessonId=25&lessonStage=4, it seems the ping command is meant to ping a local IP address, but it doesn't get any responses yet.

Might be worth looking into.

/cc @jnpr-raylam @valjeanchan

Mierdin commented 5 years ago

The pings in stage 5 are also not working.

jnpr-raylam commented 5 years ago

There is a problem with the vQFX image. We can ping the PFE:

antidote@vqfx> ping 169.254.0.1               
PING 169.254.0.1 (169.254.0.1): 56 data bytes
64 bytes from 169.254.0.1: icmp_seq=0 ttl=64 time=1.877 ms
64 bytes from 169.254.0.1: icmp_seq=1 ttl=64 time=2.022 ms
^C
--- 169.254.0.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.877/1.950/2.022/0.072 ms

But we can't see the PFE:

{master:0}
antidote@vqfx> show chassis fpc pic-status
(no output)

So we don't have any xe-0/0/* interfaces:

{master:0}
antidote@vqfx> show interfaces terse xe*
(no output)

As you can see from the configuration, all the IP addresses are configured on the xe-0/0/x interfaces, which is why we can't ping them.

Here is the output of my vqfx-full image:

{master:0}
root@vqfx> show chassis fpc pic-status 
Slot 0   Online       QFX10002-36Q                                  
  PIC 0  Online       48x 10G-SFP+

{master:0}
root@vqfx> show interfaces terse xe* 
Interface               Admin Link Proto    Local                 Remote
xe-0/0/0                up    up
xe-0/0/0.16386          up    up  
xe-0/0/1                up    up
xe-0/0/1.16386          up    up

Mierdin commented 5 years ago

Hmm. @mwiget may have thoughts. I'm signing off for now and will be back online tomorrow morning.


jnpr-raylam commented 5 years ago

After further testing of the container image: it boots Junos when the container starts, so it takes a very long time for the PFE to be detected. After waiting about 10 minutes, I can see the PFE in the show chassis fpc pic-status output. Despite that, I am still unable to ping the self IP in the other VR. Checking the interface counters, there are output packets on xe-0/0/0 but no input packets on xe-0/0/1, so we may need to debug the tap interfaces on the hypervisor.

FYI, for my vqfx-full image I used the previous approach: wait for the vQFX to boot fully with the PFE online, then take a snapshot. The container image simply boots the snapshot image, so the PFE comes online as soon as the container starts.
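The "wait until the PFE is online, then snapshot" step could be automated with a small poll loop. The Python sketch below is only an illustration: it assumes the show chassis fpc pic-status output format shown earlier in this thread (Slot and PIC lines followed by a state word), and fetch_output is a hypothetical callable you would supply yourself to run that command on the vQFX (for example over SSH) and return its text.

```python
import re
import time
from typing import Callable

# Matches lines such as "Slot 0   Online       QFX10002-36Q" and
# "  PIC 0  Online       48x 10G-SFP+" (format assumed from the output above).
_STATUS_RE = re.compile(r"^\s*(Slot|PIC)\s+(\d+)\s+(\S+)", re.MULTILINE)


def pfe_online(pic_status_output: str) -> bool:
    """Return True if at least one FPC slot and one PIC report Online."""
    states = {"Slot": [], "PIC": []}
    for kind, _num, state in _STATUS_RE.findall(pic_status_output):
        states[kind].append(state)
    # Empty output (as seen on the lab image before the PFE appears) gives False.
    return "Online" in states["Slot"] and "Online" in states["PIC"]


def wait_for_pfe(fetch_output: Callable[[], str],
                 timeout_s: float = 900.0,
                 interval_s: float = 15.0) -> bool:
    """Poll until pfe_online() is True or timeout_s elapses.

    fetch_output is a hypothetical, caller-supplied function that runs
    "show chassis fpc pic-status" on the vQFX and returns the raw text.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if pfe_online(fetch_output()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The 900-second default timeout is arbitrary, chosen only to exceed the roughly 10 minutes reported above.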

mwiget commented 5 years ago

Regarding the >10 minutes, I wonder whether nested KVM support is still unavailable. Cold booting four vQFX 18.4 instances on an i7-8700 takes less than 80 seconds, including LLDP neighbors over the xe interfaces. As cosim runs natively in the container, it just waits for the connection from Junos and won't add any extra delay. I'll need to bring up a sample lab on the cloud platform myself to get a sample topology going; so far I've only done it locally. Unfortunately, I'm busy with other topics until Thursday afternoon.

Mierdin commented 5 years ago

I'm a little confused as to why we're expecting traffic to leave xe-0/0/0 and re-enter xe-0/0/1. Wouldn't we need to bridge those together on the hypervisor to get that to work?

jnpr-raylam commented 5 years ago

Ah, my bad. I forgot that in my vqfx-full image I created a Linux bridge connecting xe-0/0/0 and xe-0/0/1, and another bridge connecting xe-0/0/2 and xe-0/0/3.

Any suggestions for this scenario? Or should we bring up two vQFXs to test the ping?
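For reference, the back-to-back wiring described above (one Linux bridge per pair of xe ports) can be expressed with standard iproute2 commands. The Python sketch below only builds those commands and, by default, prints them rather than running them. The host interface names tap0 to tap3 and the bridge names br0/br1 are placeholders: how xe-0/0/0 to xe-0/0/3 actually surface as Linux interfaces depends on the image and is not stated in this thread.

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple


def bridge_commands(pairs: Sequence[Tuple[str, str]],
                    prefix: str = "br") -> List[List[str]]:
    """Build iproute2 commands putting each interface pair on its own Linux bridge."""
    cmds: List[List[str]] = []
    for i, (a, b) in enumerate(pairs):
        br = f"{prefix}{i}"  # hypothetical bridge name, e.g. br0
        cmds.append(["ip", "link", "add", "name", br, "type", "bridge"])
        for dev in (a, b):
            # Enslave the (placeholder) tap interface to the bridge and bring it up.
            cmds.append(["ip", "link", "set", "dev", dev, "master", br])
            cmds.append(["ip", "link", "set", "dev", dev, "up"])
        cmds.append(["ip", "link", "set", "dev", br, "up"])
    return cmds


def apply(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or run them; running requires root."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, apply(bridge_commands([("tap0", "tap1"), ("tap2", "tap3")])) prints twelve ip link commands; pass dry_run=False (as root) to execute them.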

Mierdin commented 5 years ago

Re: HW virtualization, we're running in GCP with the appropriate flags enabled to ensure this is passed through to the instance. We can see this makes it all the way to the container/pod:

kubectl exec -it -n=25-e9zw2gzoq1ujhup0-ns vqfx /bin/bash                                                                                                                                            

root@vqfx:/# grep --color vmx /proc/cpuinfo
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch tpr_shadow flexpriority ept fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat arch_capabilities
(....repeats per core...)

Then we can see the --enable-kvm flag is present in the qemu command:

root@vqfx:/# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1 79.9  3.2 2762396 2011420 ?     Ssl  17:56  15:50 qemu-system-x86_64 -M pc --enable-kvm -cpu host -smp 1 -m 2048 -no-user-config -no-shutdown -monitor tcp:0.0.0.0:4000,server,nowait -serial telnet
root       118  0.0  0.0  18364  1688 ?        S    17:56   0:00 /bin/bash /root/pecosim/pecosim_autorun.sh
root       125  1.8  0.4 8544732 295240 ?      S    17:56   0:21 /root/pecosim/pe_cosim -e -t inet -p 3000
root       195  0.0  0.0  18496  2024 ?        Ss   18:11   0:00 /bin/bash
root       215  0.0  0.0  34388  1468 ?        R+   18:16   0:00 ps -aux

It's entirely possible that there's some other form of CPU contention going on. In fact, running top even after the vQFX has booted shows 100% utilization for a while, presumably because the cosim is doing something during that time (since the PFE doesn't show up until a few minutes after that).
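The hardware-virtualization check above (grepping /proc/cpuinfo for vmx) can be scripted so it is easy to rerun inside each pod. This Python sketch is illustrative: it treats either vmx (Intel VT-x) or svm (AMD-V) as the relevant flag and also reports whether /dev/kvm exists, since a CPU flag alone does not guarantee that qemu can open the KVM device. The function names are made up for this example.

```python
import os
from typing import Dict, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect every flag listed on the 'flags' lines of /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the plain "flags" key; lines like "vmx flags" are skipped.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def kvm_report(cpuinfo_path: str = "/proc/cpuinfo") -> Dict[str, bool]:
    """Summarise whether hardware virtualization looks usable from this container."""
    try:
        with open(cpuinfo_path) as f:
            flags = cpu_flags(f.read())
    except OSError:
        flags = set()
    return {
        # vmx = Intel VT-x, svm = AMD-V
        "hw_virt_flag": bool(flags & {"vmx", "svm"}),
        # The KVM device node qemu uses for hardware acceleration.
        "dev_kvm_present": os.path.exists("/dev/kvm"),
    }
```

Running print(kvm_report()) inside the vqfx pod would summarise both checks in one line. It says nothing about the CPU contention discussed above, which would still need top or similar.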

Mierdin commented 5 years ago

@jnpr-raylam You can do that, or even just another Linux container will work, provided you set an IP address on the appropriate interface within the lesson guide.

I am more than likely going to keep the JET lesson and the OpenConfig lesson in PTR while we sort these image issues out. We have a lot of other stuff that needs to be published ASAP, so we can move forward with that, and once these image issues are sorted, I will circle back and promote these lessons in a quick release.

bakenekonote commented 5 years ago

What about using an LT (logical tunnel) interface?

Regards Tony Chan


jnpr-raylam commented 5 years ago

vQFX doesn't support tunnel interfaces.

nre-learning / nrelabs-curriculum

JET Lesson Stage 4 issue #212