Open gord1306 opened 4 years ago
@gord1306
Has anyone ever encountered this issue?
There was a similar issue: sometimes the VM can randomly hang due to a kernel crash (something related to spinlocks).
@nazariig may I have your hardware info for reference, e.g. CPU model, memory, disk size, and how many vEOS VMs you launch? Thank you. I am trying to reduce the number of VMs now, and I hope this change makes the issue less frequent.
@gord1306 we are aligned with the official technical requirements: https://github.com/Azure/sonic-mgmt/blob/master/ansible/doc/README.testbed.Overview.md#testbed-server
Maybe try cEOS, which is much more lightweight than vEOS. Caution: it is still experimental, and not all tests have been run on the cEOS platform.
@lguohan, may I ask you about the recovery mechanism? Have you encountered this issue? If yes, how do you recover while running the tests? The only recovery I have found so far is to use virsh reset to reset the VM and then perform add topology again.
yes, that is also my recovery method.
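For reference, the reset step described above could be scripted roughly like this. This is only a sketch: the helper name `reset_hung_vms` and the domain names `VM0100`/`VM0101` are hypothetical placeholders, and it assumes `virsh` is on the PATH of the testbed server. Add topology still has to be re-run afterwards in whatever way you normally run it.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def reset_hung_vms(
    domains: Sequence[str],
    run: Optional[Callable[[List[str]], object]] = None,
) -> List[List[str]]:
    """Issue `virsh reset <domain>` for each hung VM and return the commands issued.

    `run` is injectable so the function can be dry-run without libvirt installed.
    """
    if run is None:
        # Default runner: execute the command and raise if virsh reports an error.
        def run(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    issued = []
    for domain in domains:
        cmd = ["virsh", "reset", domain]
        run(cmd)
        issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # VM0100 and VM0101 are placeholder domain names.
    reset_hung_vms(["VM0100", "VM0101"], run=lambda cmd: print(" ".join(cmd)))
```

Passing a `run` callable keeps the dry-run and the real execution on the same code path, so you can check the list of VMs before actually resetting anything.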
@lguohan, I checked arista.xml.j2 and have a question about the disk cache mode: it is set to writeback and is never changed. But the KVM default value seems to be writethrough. Have you tested in writethrough mode before?
no. I haven't.
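If anyone wants to check which cache mode a running VM actually uses before experimenting, something like the following could work on the output of `virsh dumpxml <domain>`. It is a sketch under assumptions: the function name `disk_cache_modes` and the sample XML (domain name, disk path, target device) are made up for illustration and are not taken from arista.xml.j2. It relies only on the libvirt domain XML layout, where each `<disk>` can carry a `<driver cache='...'/>` attribute.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def disk_cache_modes(domain_xml: str) -> Dict[str, Optional[str]]:
    """Map each disk target device to its configured cache mode.

    A value of None means no cache attribute is set, so the hypervisor
    default applies.
    """
    root = ET.fromstring(domain_xml)
    modes: Dict[str, Optional[str]] = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "unknown"
        modes[dev] = driver.get("cache") if driver is not None else None
    return modes


# Illustrative domain XML only; names and paths are placeholders.
SAMPLE_XML = """
<domain type='kvm'>
  <name>VM0100</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/path/to/disk.img'/>
      <target dev='hda' bus='ide'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(disk_cache_modes(SAMPLE_XML))
```

Comparing this output before and after changing the template would confirm that a writethrough experiment actually took effect on the VM.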
@gord1306 we have found that increasing VM RAM to 2.5 or 3 GB per instance has a positive effect on the restart/topo change sequence.
Description
For the t1 and t1-lag topologies, which use many more Arista EOS VMs, there is a high chance that an Arista EOS VM randomly fails to complete initialization during add topology. Has anyone ever encountered this issue? It looks like the aaa service hangs.
It stays in this state for more than 1 hour. Sometimes it reaches the login prompt, sometimes not.
If I press ESC to skip, it may show that aaa is not responding.
And the message is about the Aaa service.