Open donnlee opened 7 years ago
Reattempted goes-platina-mk1-installer and it failed in the same way.
Did a COLD reboot and this time goes started ok.
donn@invader7:~$ uptime
13:46:06 up 0 min, 1 user, load average: 0.30, 0.07, 0.02
donn@invader7:~$ uname -a
Linux invader7 4.11.0-platina-mk1-amd64 #2 SMP Fri Jun 9 11:21:14 PDT 2017 x86_64 GNU/Linux
donn@invader7:~$ sudo goes status
[sudo] password for donn:
GOES status
======================
PCI - OK
Check daemons - OK
Check Redis - OK
Check vnet - OK
Same symptoms, same thing happened when I upgraded invader2.
Hi Donn, does your upgrade process involve a reboot? If so, do you run it with "reboot" or "reboot -f"? Also, after the reboot, do the TH devices 04:00.0 and 04:00.1 show up in "lspci"?
On alpha units (invader 1-15), please do “reboot -f” to make sure TH reliably shows up in lspci after a reboot.
thanks Jason
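For anyone repeating Jason's lspci check after each reboot, here is a minimal sketch. The function name th_present is made up for illustration, and it assumes the default lspci output format, where each line starts with the bus address and no PCI domain prefix.

```shell
# Hypothetical helper (not part of goes): returns 0 if both TH functions,
# 04:00.0 and 04:00.1, appear in lspci-style output supplied on stdin,
# and non-zero otherwise.
th_present() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q '^04:00\.0 ' &&
    printf '%s\n' "$out" | grep -q '^04:00\.1 '
}
```

Possible usage on the switch: lspci | th_present || echo "TH missing from lspci"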
Try doing a "goes stop; rmmod uio-pci-dma" before installing the vfio mode. Also check whether you have a /etc/modprobe.d/goes-platina-mk1-modprobe.conf that's loading the module.
stig
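One rough way to do the modprobe.d check stig describes is to search the whole directory for the module name instead of relying on a single file name. This is only a sketch: find_uio_refs is a made-up name, and matching both the dash and underscore spellings is an assumption based on the module showing up as uio_pci_dma in the kernel's module list.

```shell
# Hypothetical helper: list files under a modprobe.d-style directory that
# mention the uio-pci-dma module (dash or underscore spelling).
# The directory defaults to /etc/modprobe.d; grep's exit status is non-zero
# when nothing matches.
find_uio_refs() {
    dir=${1:-/etc/modprobe.d}
    grep -rlE 'uio[-_]pci[-_]dma' "$dir" 2>/dev/null
}
```

If this prints a file, that file is worth a look before installing the vfio build.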
I upgraded coreboot (per Jason's email). Then I did 'reboot -f' and saw a scary-looking crash (below). Going to try another cold boot next.
Last login: Thu Jul 27 14:31:50 PDT 2017 from 172.16.2.23 on pts/0
Linux invader2 4.11.0-platina-mk1-amd64 #1 SMP Thu May 11 22:06:03 PDT 2017 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
donn@invader2:~$ sudo reboot -f
[sudo] password for donn:
Rebooting.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
sd 0:0:0:0: [sda] Synchronizing SCSI cache
IP: napi_hash_del+0x14/0x70
PGD 440fc0067
PUD 45edc5067
PMD 0
Oops: 0002 [#1] SMP
Modules linked in: xt_nat ixgbevf ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay iptable_raw nls_utf8 nls_cp437 vfat fat kvm_intel kvm uio_pci_dma i2c_i801 autofs4 dm_mod ixgbe mdio
CPU: 5 PID: 8064 Comm: goes Not tainted 4.11.0-platina-mk1-amd64 #1
Hardware name: Intel Camelback Mountain Platina DC/Camelback Mountain Platina DC, BIOS coreboot-unknown 07/27/2017
task: ffff88046b162340 task.stack: ffffc90006c70000
RIP: 0010:napi_hash_del+0x14/0x70
RSP: 0018:ffffc90006c73b98 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff88045efb84f0 RDI: ffffffff819a674c
RBP: ffffc90006c73ba0 R08: 0000000000000002 R09: 0000000000000000
R10: ffffc90006c73b60 R11: ffff88046b13de00 R12: 0000000000000001
R13: ffff88045efb8800 R14: 0000000000000000 R15: 0000000000000010
FS: 00007fa802ffd700(0000) GS:ffff88047fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 000000045f19a000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
ixgbevf_free_q_vectors+0x45/0x70 [ixgbevf]
ixgbevf_clear_interrupt_scheme+0x9b/0xc0 [ixgbevf]
ixgbevf_remove+0x44/0xb0 [ixgbevf]
pci_device_remove+0x34/0xb0
device_release_driver_internal+0x142/0x1f0
device_release_driver+0xd/0x10
pci_stop_bus_device+0x6b/0x80
pci_stop_and_remove_bus_device+0xd/0x20
pci_iov_remove_virtfn+0x9b/0x130
? pci_get_subsys+0x30/0x40
pci_disable_sriov+0x37/0x110
ixgbe_disable_sriov+0xc5/0x210 [ixgbe]
ixgbe_pci_sriov_configure+0xeb/0x140 [ixgbe]
sriov_numvfs_store+0x13f/0x190
dev_attr_store+0x13/0x20
sysfs_kf_write+0x32/0x40
kernfs_fop_write+0x102/0x180
__vfs_write+0x23/0x120
? __alloc_fd+0x3a/0x160
vfs_write+0xaf/0x180
SyS_write+0x41/0xb0
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x4885e4
RSP: 002b:000000c4210f9630 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004885e4
RDX: 0000000000000002 RSI: 000000c420282280 RDI: 0000000000000007
RBP: 000000c4210f9788 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000126 R14: 00000000000010f0 R15: 0000000000000200
Code: 89 53 20 48 89 08 48 89 4a 08 c6 05 46 2a 58 00 00 5b 5d c3 0f 1f 00 55 48 89 e5 53 48 89 fb 48 c7 c7 4c 67 9a 81 e8 cc 30 0d 00 <f0> 0f ba 73 10 04 72 0c 31 c0 c6 05 17 2a 58 00 00 5b 5d c3 48
RIP: napi_hash_del+0x14/0x70 RSP: ffffc90006c73b98
CR2: 0000000000000020
---[ end trace d0b781331bdbac25 ]---
INFO: rcu_sched self-detected stall on CPU
0-...: (5249 ticks this GP) idle=54d/140000000000001/0 softirq=15739/15739 fqs=2624
(t=5250 jiffies g=3807 c=3806 q=993)
NMI backtrace for cpu 0
CPU: 0 PID: 10974 Comm: reboot Tainted: G D 4.11.0-platina-mk1-amd64 #1
Hardware name: Intel Camelback Mountain Platina DC/Camelback Mountain Platina DC, BIOS coreboot-unknown 07/27/2017
Call Trace:
<IRQ>
dump_stack+0x4d/0x65
nmi_cpu_backtrace+0x9b/0xa0
? irq_force_complete_move+0xf0/0xf0
nmi_trigger_cpumask_backtrace+0x8f/0xc0
arch_trigger_cpumask_backtrace+0x14/0x20
rcu_dump_cpu_stacks+0x8f/0xca
rcu_check_callbacks+0x651/0x7b0
? update_wall_time+0x448/0x770
update_process_times+0x2a/0x50
tick_sched_timer+0x48/0x160
__hrtimer_run_queues+0x9c/0x110
hrtimer_interrupt+0xa3/0x190
local_apic_timer_interrupt+0x33/0x60
smp_apic_timer_interrupt+0x33/0x50
apic_timer_interrupt+0x86/0x90
RIP: 0010:queued_spin_lock_slowpath+0x15d/0x180
RSP: 0018:ffffc9000a37fcc0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000101 RBX: ffff88046b6d9050 RCX: 0000000000000101
RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffffff819a674c
RBP: ffffc9000a37fcc0 R08: 0000000000000001 R09: 00000000000ffff8
R10: ffffc9000a37fc48 R11: 0000000000000004 R12: ffff88046d27e800
R13: ffff88046d1c3000 R14: 0000000000000002 R15: ffffc9000a37fd87
</IRQ>
_raw_spin_lock+0x1b/0x20
napi_hash_del+0x14/0x70
netif_napi_del+0xd/0x70
igb_reset_q_vector+0x4f/0x60
igb_free_q_vectors+0x3d/0x80
__igb_shutdown+0x5f/0x1d0
igb_shutdown+0x17/0x50
pci_device_shutdown+0x31/0x70
device_shutdown+0xc9/0x180
kernel_restart_prepare+0x31/0x40
kernel_restart+0xd/0x60
SyS_reboot+0xf4/0x1d0
? kmem_cache_alloc+0xf9/0x110
? __alloc_fd+0x3a/0x160
? vfs_writev+0x37/0x50
? __fdget_pos+0x12/0x50
? vfs_writev+0x37/0x50
? do_writev+0x49/0xb0
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x7fad98183b46
RSP: 002b:00007ffeb465db78 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
RAX: ffffffffffffffda RBX: 00007ffeb465d640 RCX: 00007fad98183b46
RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
RBP: 00007ffeb465d8b0 R08: 00007ffeb465d250 R09: 00007ffeb465daa0
R10: 0000000000000002 R11: 0000000000000206 R12: 0000563899513742
R13: 00007ffeb465d7b8 R14: 0000000000000001 R15: 0000000000000014
INFO: rcu_sched self-detected stall on CPU
0-...: (20946 ticks this GP) idle=54d/140000000000001/0 softirq=15739/15739 fqs=10463
(t=21003 jiffies g=3807 c=3806 q=3251)
NMI backtrace for cpu 0
CPU: 0 PID: 10974 Comm: reboot Tainted: G D 4.11.0-platina-mk1-amd64 #1
Hardware name: Intel Camelback Mountain Platina DC/Camelback Mountain Platina DC, BIOS coreboot-unknown 07/27/2017
Call Trace:
<repeats>
Just commenting on the original headline: this error indicates that another instance of vnet already has vfio open, so the second instance cannot start and panics.
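To see whether some process still has vfio open before starting goes again, something like the following might help. It is a sketch under assumptions: vfio_holders is a made-up name, it assumes the devices live under the standard Linux /dev/vfio path, and it needs root to read other processes' fd links. The proc root is a parameter only so the function can be tried offline.

```shell
# Hypothetical helper: print the PIDs of processes holding an open
# descriptor at or under a device path prefix (default /dev/vfio), found by
# reading the /proc/<pid>/fd symlinks.
vfio_holders() {
    prefix=${1:-/dev/vfio}
    proc=${2:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that we are not allowed to read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$prefix"|"$prefix"/*)
                pid=${fd#"$proc"/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}
```

Possible usage: sudo sh -c '. ./vfio_holders.sh; vfio_holders' (the file name here is also just an example). An empty result would suggest nothing is holding vfio.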
After installing the latest goes-installer, "goes status" fails and a panic is seen in syslog. The previous goes version was built on 2017/07/13.