rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

Cannot ssh into v0.5.0 instance on AWS #1090

Closed pulberg closed 8 years ago

pulberg commented 8 years ago

RancherOS Version: (ros os version) v0.5.0

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) AWS

I noticed that after upgrading from v0.4.5 to v0.5.0 I could not ssh into the host; all attempts failed with a “connection refused” message.

I had to reboot the host from the AWS console; after it came up I was able to ssh again. This was not a one-time incident: it has now happened to 4 hosts upgraded consecutively.

Upgrading from RancherOS v0.4.5, AMI - ami-812ec0ec

Command used to upgrade host - sudo ros os upgrade

michaellopez commented 8 years ago

This is happening to us too. We upgraded 4 hosts from 0.4.2 and 0.4.5 to 0.5.0, and all behave the same: we cannot SSH into the machines anymore (Permission denied (publickey,password,keyboard-interactive).). My guess is that the behavior described in this important note in the release notes:

Upgrading from v0.4.x will destroy your existing console.

basically kills the ~/.ssh folder inside RancherOS that holds your public SSH key. How do we recover from this? Also, we are using local VMs here, and a reboot does not help us regain SSH access :(

We've tried with interactive password too, but that does not work either.
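
The two reports in this thread fail differently: "connection refused" usually means nothing is listening on port 22, while "Permission denied (publickey,...)" means sshd answered but rejected the credentials. Below is a minimal Python sketch for telling the two cases apart before attempting a login. The function name `probe_ssh_port` and its defaults are illustrative assumptions, not part of RancherOS or `ros`.

```python
import socket


def probe_ssh_port(host, port=22, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'unreachable'."""
    try:
        # A completed TCP handshake means something is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on the port.
        return "refused"
    except OSError:
        # Timeouts, DNS failures, and routing errors all land here.
        return "unreachable"
```

A result of "refused" from `probe_ssh_port("ros-host")` would point at sshd not running, as in the original report; "open" followed by a failed login would point at a key or ownership problem instead.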

michaellopez commented 8 years ago

Having looked into this, I believe our error is due to the fact that RancherOS now somewhere does a chown -R 1100:1100 on /home/rancher/.ssh upon booting up. We've changed the uid on the rancher user via sudo usermod after installation, so chowning that folder to 1100 is not good for us. Is there a supported way to change the uid of rancher?

michaellopez commented 8 years ago

We managed to move away from changing the uid of rancher in ros. I'd still like to access the machine where this error is happening. Is there any way I can revert the uid or enable password login for rancher or root? I'm poking around the disk with a GParted live CD, but since it's all containers, I'm not too sure I can fix it from there :) Any help is appreciated.

michaellopez commented 8 years ago

I managed to use the docker user in ros to gain access to my installation again: ssh docker@ros-host.

Then I ran sudo chown -R rancher:rancher /home/rancher, logged out, and logged back in as rancher. Now it works again.
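
For anyone checking whether they are in the same situation, here is a small Python sketch that lists paths under a directory whose owner differs from an expected uid. It assumes Python is available wherever the disk is inspected. The helper name `find_wrong_owner` is hypothetical, and the uid 1100 mentioned below comes only from the observation earlier in this thread.

```python
import os


def find_wrong_owner(root, expected_uid):
    """Return paths under root (root included) not owned by expected_uid.

    Symlinked directories are neither followed nor checked.
    """
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat inspects the link itself rather than its target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

For example, `find_wrong_owner("/home/rancher/.ssh", pwd.getpwnam("rancher").pw_uid)` (with `import pwd`) would list anything not owned by the current rancher user, such as files left at uid 1100 after the user's uid was changed.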

deniseschannon commented 8 years ago

@pulberg I have been unable to reproduce this. I have tried upgrading from v0.4.5 in the default console and in the ubuntu console. Both times, after a couple of minutes, I was able to ssh in without any issues.

Did you make any changes like @michaellopez in terms of user? Could you provide me the results of sudo ros config export of your instance so I could see what else you have set? Please hide/change any sensitive information.

pulberg commented 8 years ago

@deniseschannon I don't have any customizations or changes to the AMI, here is the config export -

hostname: D-RHST01
rancher:
  cloud_init:
    datasources:
    - ec2
  environment:
    RESIZE_DEV: /dev/xvda
  services_include:
    resize-fs: true

deniseschannon commented 8 years ago

@pulberg When you started the AMI, did you pass in any cloud-config under user data or did you only use the key pair through AWS?

pulberg commented 8 years ago

@deniseschannon I don't pass in any cloud config, just use the key pair through AWS.

deniseschannon commented 8 years ago

@pulberg I was finally able to get a box to reproduce this issue! :) But it took many attempts and the steps were exactly the same as my other boxes that never hit this issue.

We need to look further into it.

michaellopez commented 8 years ago

@deniseschannon I just love how you and your colleagues never give up making your products better. It is very motivating to use your products knowing that they are backed by a fantastic team. Thank you so much and keep up the exemplary work! You are an inspiration. Here, have some cake 🍰

deniseschannon commented 8 years ago

This seems to be some condition where, for some reason, the kernel gets stuck while booting.

I have only been able to hit this issue once out of 20-30 times. The workaround would be to manually reboot the host.

deniseschannon commented 8 years ago

Actually, after reviewing this: "Get Instance Screenshot" will end up showing "Booting the kernel." even if the kernel has booted.

We need to capture the "Get System Log" output for when this occurs.

Note: We have had users report this issue, but we have yet to reproduce it consistently. We will keep investigating.

bdentino commented 8 years ago

This is happening to me relatively frequently running in us-west-2 on m3.medium instance types. I checked the system log for the most recent failure and this is what I got:

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.10-rancher (root@bd493e3edf03) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1) ) #1 SMP Fri Jun 17 17:16:24 UTC 2016 ()
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-v0.5.0-rancheros console=ttyS0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.4 present.
[    0.000000] Hypervisor detected: Xen
[    0.000000] Xen version 4.2.
[    0.000000] Netfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated NICs.
[    0.000000] Blkfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated disks.
[    0.000000] You might have to change the root device
[    0.000000] from /dev/hd[a-d] to /dev/xvd[a-d]
[    0.000000] in your root= kernel command line option
[    0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[    0.000000] RAMDISK: [mem 0x33912000-0x35c80fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000EA020 000024 (v02 Xen   )
[    0.000000] ACPI: XSDT 0x00000000FC00F5A0 000054 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: FACP 0x00000000FC00F260 0000F4 (v04 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: DSDT 0x00000000FC0035E0 00BBF6 (v02 Xen    HVM      00000000 INTL 20090123)
[    0.000000] ACPI: FACS 0x00000000FC0035A0 000040
[    0.000000] ACPI: FACS 0x00000000FC0035A0 000040
[    0.000000] ACPI: APIC 0x00000000FC00F360 0000D8 (v02 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: HPET 0x00000000FC00F4B0 000038 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: WAET 0x00000000FC00F4F0 000028 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: SSDT 0x00000000FC00F520 000031 (v02 Xen    HVM      00000000 INTL 20090123)
[    0.000000] ACPI: SSDT 0x00000000FC00F560 000031 (v02 Xen    HVM      00000000 INTL 20090123)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x00000000efffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0xefff8000-0xefffdfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000efffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009dfff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x00000000efffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000000efffffff]
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-47
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 15 CPUs, 14 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen HVM
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:15 nr_node_ids:1
[    0.000000] PERCPU: Embedded 33 pages/cpu @ffff8800eb600000 s96984 r8192 d29992 u262144
[    0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 967560
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-v0.5.0-rancheros console=ttyS0
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 3816144K/3931764K available (6233K kernel code, 989K rwdata, 3408K rodata, 1288K init, 920K bss, 115620K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=15, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000]  RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=15.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=15
[    0.000000] NR_IRQS:8448 nr_irqs:952 16
[    0.000000] xen:events: Using 2-level ABI
[    0.000000] xen:events: Xen HVM callback vector for event delivery is enabled
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
[    0.000000] console [ttyS0] enabled
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns
[    0.000000] tsc: Detected 2500.032 MHz processor
[    0.008000] Calibrating delay loop (skipped), value calculated using timer frequency.. 5000.06 BogoMIPS (lpj=10000128)
[    0.009816] pid_max: default: 32768 minimum: 301
[    0.012008] ACPI: Core revision 20150930
[    0.022867] ACPI: 3 ACPI AML tables successfully acquired and loaded
[    0.027408] Security Framework initialized
[    0.028024] AppArmor: AppArmor initialized
[    0.032259] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.036867] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.040379] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
[    0.044009] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
[    0.048232] Initializing cgroup subsys io
[    0.052007] Initializing cgroup subsys memory
[    0.056010] Initializing cgroup subsys devices
[    0.059049] Initializing cgroup subsys freezer
[    0.060004] Initializing cgroup subsys net_cls
[    0.064004] Initializing cgroup subsys perf_event
[    0.068010] Initializing cgroup subsys net_prio
[    0.071529] Initializing cgroup subsys pids
[    0.072058] CPU: Physical Processor ID: 0
[    0.076736] mce: CPU supports 2 MCE banks
[    0.080025] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[    0.084003] Last level dTLB entries: 4KB 512, 2MB 0, 4MB 0, 1GB 4
[    0.097922] ftrace: allocating 30707 entries in 120 pages
[    0.156099] smpboot: Max logical packages: 15
[    0.160005] smpboot: APIC(0) Converting physical 0 to logical package 0
[    0.164695] x2apic: IRQ remapping doesn't support X2APIC mode
[    0.168003] Switched APIC routing to physical flat.
[    0.174211] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=0 pin2=0
[    0.249022] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.252016] installing Xen timer for CPU 0
[    0.256060] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz (family: 0x6, model: 0x3e, stepping: 0x4)
[    0.261060] cpu 0 spinlock event irq 53
[    0.263484] Performance Events: unsupported p6 CPU model 62 no PMU driver, software events only.
[    0.268279] x86: Booted up 1 node, 1 CPUs
[    0.270926] smpboot: Total of 1 processors activated (5000.06 BogoMIPS)
[    0.272020] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.276006] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.280364] devtmpfs: initialized
[    0.286729] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.288101] pinctrl core: initialized pinctrl subsystem
[    0.291556] NET: Registered protocol family 16
[    0.292153] cpuidle: using governor ladder
[    0.294757] cpuidle: using governor menu
[    0.296068] ACPI: bus type PCI registered
[    0.298688] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.300057] dca service started, version 1.12.1
[    0.303226] PCI: Using configuration type 1 for base access
[    0.305534] ACPI: Added _OSI(Module Device)
[    0.308005] ACPI: Added _OSI(Processor Device)
[    0.310786] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.312004] ACPI: Added _OSI(Processor Aggregator Device)
[    0.318979] ACPI: Interpreter enabled
[    0.320009] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20150930/hwxface-580)
[    0.325380] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20150930/hwxface-580)
[    0.329392] ACPI: (supports S0 S3 S4 S5)
[    0.331861] ACPI: Using IOAPIC for interrupt routing
[    0.332023] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.386376] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.388012] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.392010] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.396013] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.400870] acpiphp: Slot [0] registered
[    0.404321] acpiphp: Slot [3] registered
[    0.407080] acpiphp: Slot [4] registered
[    0.408299] acpiphp: Slot [5] registered
[    0.411043] acpiphp: Slot [6] registered
[    0.412323] acpiphp: Slot [7] registered
[    0.415093] acpiphp: Slot [8] registered
[    0.416317] acpiphp: Slot [9] registered
[    0.419821] acpiphp: Slot [10] registered
[    0.420327] acpiphp: Slot [11] registered
[    0.423161] acpiphp: Slot [12] registered
[    0.424311] acpiphp: Slot [13] registered
[    0.427065] acpiphp: Slot [14] registered
[    0.428315] acpiphp: Slot [15] registered
[    0.431082] acpiphp: Slot [16] registered
[    0.432312] acpiphp: Slot [17] registered
[    0.435168] acpiphp: Slot [18] registered
[    0.436326] acpiphp: Slot [19] registered
[    0.439104] acpiphp: Slot [20] registered
[    0.440310] acpiphp: Slot [21] registered
[    0.443110] acpiphp: Slot [22] registered
[    0.444316] acpiphp: Slot [23] registered
[    0.447074] acpiphp: Slot [24] registered
[    0.448311] acpiphp: Slot [25] registered
[    0.451100] acpiphp: Slot [26] registered
[    0.452382] acpiphp: Slot [27] registered
[    0.455351] acpiphp: Slot [28] registered
[    0.456317] acpiphp: Slot [29] registered
[    0.459076] acpiphp: Slot [30] registered
[    0.460317] acpiphp: Slot [31] registered
[    0.463319] PCI host bridge to bus 0000:00
[    0.464008] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.468005] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.472004] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.476005] pci_bus 0000:00: root bus resource [mem 0xf0000000-0xfbffffff window]
[    0.480005] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.489585] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.492004] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.495932] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.496004] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.500548] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[    0.500548] * this clock source is slow. Consider trying other clock sources
[    0.505295] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.514891] ACPI: PCI Interrupt Link [LNKA] (IRQs *5 10 11)
[    0.517298] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.520993] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.524517] ACPI: PCI Interrupt Link [LNKD] (IRQs *5 10 11)
[    0.546292] ACPI: Enabled 2 GPEs in block 00 to 0F
[    0.548045] xen:balloon: Initialising balloon driver
[    0.552035] xen_balloon: Initialising balloon driver
[    0.556248] vgaarb: setting as boot device: PCI:0000:00:02.0
[    0.559948] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.560006] vgaarb: loaded
[    0.561788] vgaarb: bridge control possible 0000:00:02.0
[    0.565005] SCSI subsystem initialized
[    0.567266] ACPI: bus type USB registered
[    0.568023] usbcore: registered new interface driver usbfs
[    0.570875] usbcore: registered new interface driver hub
[    0.572021] usbcore: registered new device driver usb
[    0.574047] PCI: Using ACPI for IRQ routing
[    0.576090] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.578855] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.580676] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
[    0.584027] clocksource: Switched to clocksource xen
[    0.591468] AppArmor: AppArmor Filesystem Enabled
[    0.593498] pnp: PnP ACPI init
[    0.594749] system 00:00: [mem 0x00000000-0x0009ffff] could not be reserved
[    0.597992] system 00:01: [io  0x08a0-0x08a3] has been reserved
[    0.600391] system 00:01: [io  0x0cc0-0x0ccf] has been reserved
[    0.602714] system 00:01: [io  0x04d0-0x04d1] has been reserved
[    0.605313] system 00:07: [io  0x10c0-0x1141] has been reserved
[    0.607650] system 00:07: [io  0xb044-0xb047] has been reserved
[    0.625670] pnp: PnP ACPI: found 8 devices
[    0.633194] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.636756] NET: Registered protocol family 2
[    0.638632] TCP established hash table entries: 32768 (order: 6, 262144 bytes)
[    0.641542] TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
[    0.644492] TCP: Hash tables configured (established 32768 bind 32768)
[    0.647084] UDP hash table entries: 2048 (order: 4, 65536 bytes)
[    0.649480] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
[    0.652019] NET: Registered protocol family 1
[    0.653753] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.656094] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.658368] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.660953] Unpacking initramfs...
[   14.018069] Freeing initrd memory: 36284K (ffff880033912000 - ffff880035c81000)
[   14.021250] RAPL PMU detected, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
[   14.024643] hw unit of domain pp0-core 2^-16 Joules
[   14.026692] hw unit of domain package 2^-16 Joules
[   14.028788] hw unit of domain dram 2^-16 Joules
[   14.031059] futex hash table entries: 4096 (order: 6, 262144 bytes)
[   14.033820] audit: initializing netlink subsys (disabled)
[   14.035982] audit: type=2000 audit(1468265391.677:1): initialized
[   14.038697] Initialise system trusted keyring
[   14.040509] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[   14.044355] VFS: Disk quotas dquot_6.6.0
[   14.046083] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[   14.049150] fuse init (API version 7.23)
[   14.051079] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[   14.054128] io scheduler noop registered
[   14.055691] io scheduler deadline registered
[   14.057463] io scheduler cfq registered (default)
[   14.059516] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[   14.061701] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[   14.079762] Console: switching to colour frame buffer device 100x37
[   14.084417] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[   14.088092] ACPI: Power Button [PWRF]
[   14.089646] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[   14.092711] ACPI: Sleep Button [SLPF]
[   14.094455] GHES: HEST is not enabled!
[   14.096149] ioatdma: Intel(R) QuickData Technology Driver 4.00
[   14.100022] xen:grant_table: Grant tables using version 1 layout
[   14.103217] Grant table initialized
[   14.105125] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
[   14.108237] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[   14.153257] 00:06: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[   14.157784] Linux agpgart interface v0.103
[   14.159508] [drm] Initialized drm 1.1.0 20060810
[   14.161612] tun: Universal TUN/TAP device driver, 1.6
[   14.163649] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[   14.166169] VMware vmxnet3 virtual NIC driver - version 1.4.5.0-k-NAPI
[   14.168859] xen_netfront: Initialising Xen virtual ethernet driver
[   14.172310] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   14.175158] ehci-pci: EHCI PCI platform driver
[   14.177248] ehci-platform: EHCI generic platform driver
[   14.179549] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[   14.185554] serio: i8042 KBD port at 0x60,0x64 irq 1
[   14.188186] serio: i8042 AUX port at 0x60,0x64 irq 12
[   14.191433] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[   14.195658] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[   14.199485] rtc_cmos 00:02: alarms up to one day, 114 bytes nvram, hpet irqs
[   14.204367] sdhci: Secure Digital Host Controller Interface driver
[   14.208240] sdhci: Copyright(c) Pierre Ossman
[   14.210289] NET: Registered protocol family 10
[   14.213501] NET: Registered protocol family 17
[   14.215823] 9pnet: Installing 9P2000 support
[   14.218649] microcode: CPU0 sig=0x306e4, pf=0x1, revision=0x416
[   14.222314] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[   14.228093] registered taskstats version 1
[   14.230559] Loading compiled-in X.509 certificates
[   14.232987] zswap: default zpool zbud not available
[   14.235605] zswap: pool creation failed
[   14.237212] AppArmor: AppArmor sha1 policy hashing enabled
[   15.028112] tsc: Refined TSC clocksource calibration: 2500.000 MHz
[   15.030618] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x240939f1bb2, max_idle_ns: 440795263295 ns
[   19.336094] xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
[   44.336509] 
[   44.337399] xenbus_probe_frontend: Timeout connecting to device: device/vfb/0 (local state 3, remote state 1)
[   44.352582] xenbus_probe_frontend: Device with no driver: device/vbd/768
[   44.355472] xenbus_probe_frontend: Device with no driver: device/vbd/51728
[   44.358469] rtc_cmos 00:02: setting system clock to 2016-07-11 19:30:22 UTC (1468265422)
[   44.363491] Freeing unused kernel memory: 1288K (ffffffff81cf9000 - ffffffff81e3b000)
[   44.369938] Write protecting the kernel read-only data: 12288k
[   44.374682] Freeing unused kernel memory: 1948K (ffff880001619000 - ffff880001800000)
[   44.386249] Freeing unused kernel memory: 688K (ffff880001b54000 - ffff880001c00000)
INFO[0000] Launching Bootstrap Docker                   
INFO[0000] Waiting for Docker at unix:///var/run/system-docker.sock 
[   44.561588] random: system-docker urandom read with 76 bits of entropy available
INFO[0000] Connected to Docker at unix:///var/run/system-docker.sock 
INFO[0000] Loading images from /usr/share/ros/images.tar 
INFO[0002] Done loading images from /usr/share/ros/images.tar 
INFO[0002] Running Bootstrap services                   
INFO[0002] Project [bootstrap]: Starting project        
INFO[0002] [0/2] [udev-bootstrap]: Starting             
[   47.178040] cgroup: docker-runc (160) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[   47.184212] cgroup: "memory" requires setting use_hierarchy to 1 on the root
[   47.242314] udevd[6]: starting version 3.1.5
[   47.260093] random: nonblocking pool is initialized
[   47.301760] FUJITSU Extended Socket Network Device Driver - version 1.0 - Copyright (c) 2015 FUJITSU LIMITED
[   47.368851] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
[   47.373159] AVX version of gcm_enc/dec engaged.
[   47.375038] AES CTR mode by8 optimization enabled
[   47.397519] scsi host0: ata_piix
[   47.405384] scsi host1: ata_piix
[   47.419348] mousedev: PS/2 mouse device common for all mice
[   47.425499] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc100 irq 14
[   47.429723] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc108 irq 15
[   47.434930] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
[   47.458993] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[   47.476840]  xvda: xvda1
[   47.499219] blkfront: xvdb: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[   47.706159] udevd[23]: starting version 3.1.5
INFO[0003] [1/2] [udev-bootstrap]: Started              
INFO[0003] [1/2] [state-script]: Starting               
INFO[0004] [2/2] [state-script]: Started                
INFO[0004] Project [bootstrap]: Project started         
INFO[0004] Mounting state device /dev/xvda1 to /state   
[   48.597692] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
INFO[0005] Launching System Docker                      
INFO[0000] Loading images from /usr/share/ros/images.tar 
ERRO[0001] Failed [1/4] 25%                             
FATA[0001] Error response from daemon: unable to decode ApplyLayer JSON response: EOF

ibuildthecloud commented 8 years ago

Awesome.... looks like a Docker bug. We'll look into it. Thanks.
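
System logs like the one above are long, and the decisive lines (the ERRO and FATA entries at the very end) are easy to miss. The following Python sketch pulls out those entries and also flags large jumps in the kernel timestamps, such as the one from 0.66 s to 14.02 s after "Unpacking initramfs...". The function name, the regular expressions, and the 5-second threshold are assumptions chosen to fit the line formats visible in this log; this is not an official tool.

```python
import re

# Launcher lines as they appear in the log, e.g. "FATA[0001] Error response ..."
LOGRUS_RE = re.compile(r"^(INFO|WARN|ERRO|FATA)\[(\d+)\]\s*(.*)$")
# Kernel lines, e.g. "[   44.336509] message"
KERNEL_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def summarize_boot_log(text, gap_seconds=5.0):
    """Return (errors, gaps) for a captured system log.

    errors: list of (level, message) for ERRO and FATA lines.
    gaps:   list of (previous_ts, current_ts, message) where the kernel
            timestamp jumped by at least gap_seconds.
    """
    errors, gaps = [], []
    last_ts = None
    for line in text.splitlines():
        m = LOGRUS_RE.match(line)
        if m:
            if m.group(1) in ("ERRO", "FATA"):
                errors.append((m.group(1), m.group(3).strip()))
            continue
        m = KERNEL_RE.match(line)
        if m:
            ts = float(m.group(1))
            if last_ts is not None and ts - last_ts >= gap_seconds:
                gaps.append((last_ts, ts, m.group(2)))
            last_ts = ts
    return errors, gaps
```

Run against a saved copy of the "Get System Log" output, the `errors` list should surface failures like the ApplyLayer error directly, which makes it easier to compare logs from good and bad boots.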

deniseschannon commented 8 years ago

Using v0.6.0-rc4, I launched 25 AMIs and was able to ssh into all instances.
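
As a rough sanity check on how much 25 clean launches can tell us, the Python sketch below computes the chance of seeing no failures at all if the failure rate were still the "once out of 20-30 times" mentioned earlier in this thread. It assumes each boot fails independently at a fixed rate, which is a simplification, and the function name is illustrative.

```python
def prob_all_succeed(failure_rate, launches):
    """Probability that `launches` independent boots all succeed."""
    return (1.0 - failure_rate) ** launches


# Failure rates taken from the "once out of 20-30 times" estimate above.
for rate in (1 / 20, 1 / 30):
    print(f"rate={rate:.3f}: P(25 clean boots) = {prob_all_succeed(rate, 25):.2f}")
```

Under those assumptions, a clean run of 25 is not rare even if the bug were unchanged, so a larger sample would give more confidence.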

till commented 6 years ago

I think I somehow ran into this as well. Is there a way to recover the console?

First I changed rancher-server to stable, following the instructions for a single node with a bind volume. That part worked (including removing the old container).

Then I did a ros engine enable ... to switch to a previous docker version and rebooted the server. Since then I can no longer ssh in. The machine did come up, though: all my services (rancher-server, etc.) are running and I can log into the UI.

Setting up this machine again is not a big deal, but how do you do this when the computer is not next to you but in a datacenter? That's why I wanted to ask about recovering the console.

My initial cloud-init:

ssh_authorized_keys:
  - ssh-rsa ...== till@foo
hostname: box
rancher:
  console: alpine
  docker:
    engine: 17.12.1-ce
  network:
    dns:
      nameservers:
        - 8.8.4.4
        - 8.8.8.8