hyperhq / runv

Hypervisor-based Runtime for OCI
Apache License 2.0
828 stars 129 forks source link

hyperstart APIVersion failed #466

Closed gnawux closed 7 years ago

gnawux commented 7 years ago

Here:

06:58:38 I0401 06:58:39.264425    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.264438    7157 json.go:229] get command NEXT: send 116, receive 8
06:58:38 I0401 06:58:39.264448    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.264460    7157 json.go:229] get command NEXT: send 116, receive 116
06:58:38 I0401 06:58:39.264470    7157 json.go:208] got cmd:9
06:58:38 I0401 06:58:39.264609    7157 vm_console.go:46] SB[vm-QGyxKjgyXa] [CNL] hyper_modify_event modify event fd 3, 0x61d648, event 8193
06:58:38 I0401 06:58:39.264942    7157 exec.go:93] Wait with request container:"d9f81367f4ad974fb2ba3f60f38917602d28914a15764f01c117be082aad9e39" 
06:58:38 I0401 06:58:39.264963    7157 exec.go:19] Get Exec Code for container d9f81367f4ad974fb2ba3f60f38917602d28914a15764f01c117be082aad9e39
06:58:38 I0401 06:58:39.264998    7157 vm_console.go:46] SB[vm-QGyxKjgyXa] [CNL] pid 332 exit normally, status 0
06:58:38 I0401 06:58:39.267671    7157 vm_console.go:46] SB[vm-QGyxKjgyXa] [CNL] hyper_install_process_stdio
06:58:42 E0401 06:58:43.008096    7157 json.go:522] get hyperstart API version error: send ctl channel error, the hyperstart might have closed
06:58:47 I0401 06:58:48.656924    7157 json.go:208] got cmd:0
06:58:47 I0401 06:58:48.657010    7157 json.go:279] send command 0 to init, payload: 'null'.
06:58:47 I0401 06:58:48.657044    7157 json.go:291] write 12 to hyperstart.

cmd 0 is VERSION, however, it report API version error before it send command?

detailed logs: http://ci.hypercontainer.io:8080/job/hyperd-auto/288/consoleFull

gnawux commented 7 years ago

by this log (come from here)

06:58:38 I0401 06:58:38.716869    7157 json.go:165] hyperstart API version:4243, VM hyperstart API version: 4243

It shows h.APIVersion() has returned successfully. However, by this log (come from here)

06:58:42 E0401 06:58:43.008096    7157 json.go:522] get hyperstart API version error: send ctl channel error, the hyperstart might have closed

It shows APIVersion() failed 4 seconds later than it success...

The failed one looks come from here: https://github.com/hyperhq/runv/blob/07f6afdfbdb4b361fb1161fbdea7c1b242845cfa/hypervisor/hypervisor.go#L55

gnawux commented 7 years ago

Let's track the commands

06:58:37 I0401 06:58:38.638828    7157 json.go:208] got cmd:0
06:58:37 I0401 06:58:38.638927    7157 json.go:291] write 12 to hyperstart.
06:58:37 I0401 06:58:38.638944    7157 json.go:208] got cmd:1
06:58:37 I0401 06:58:38.638990    7157 json.go:208] got cmd:0
06:58:37 I0401 06:58:38.656617    7157 json.go:291] write 92 to hyperstart.
06:58:38 I0401 06:58:38.724851    7157 json.go:208] got cmd:20
06:58:38 I0401 06:58:38.724886    7157 json.go:291] write 79 to hyperstart.
[...]

looks the second command 0 (INIT_VERSION) was not sent to hyperstart, right? @laijs @bergwolf

gnawux commented 7 years ago

hmm, no

cmd 0:

06:58:37 I0401 06:58:38.656569    7157 json.go:208] got cmd:14
06:58:37 I0401 06:58:38.656579    7157 json.go:229] get command NEXT: send 12, receive 8
06:58:37 I0401 06:58:38.656586    7157 json.go:208] got cmd:14
06:58:37 I0401 06:58:38.656594    7157 json.go:229] get command NEXT: send 12, receive 12
06:58:37 I0401 06:58:38.656617    7157 json.go:291] write 92 to hyperstart.
06:58:37 I0401 06:58:38.656625    7157 json.go:208] got cmd:9

cmd 1:

06:58:38 I0401 06:58:38.716813    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:38.716819    7157 json.go:229] get command NEXT: send 92, receive 8
06:58:38 I0401 06:58:38.716824    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:38.716829    7157 json.go:229] get command NEXT: send 92, receive 80
06:58:38 I0401 06:58:38.716833    7157 json.go:208] got cmd:9

cmd 0 again:

06:58:38 I0401 06:58:38.716838    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:38.716848    7157 json.go:229] get command NEXT: send 92, receive 88
06:58:38 I0401 06:58:38.716852    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:38.716857    7157 json.go:229] get command NEXT: send 92, receive 92
06:58:38 I0401 06:58:38.716860    7157 json.go:208] got cmd:9
06:58:38 I0401 06:58:38.716869    7157 json.go:165] hyperstart API version:4243, VM hyperstart API version: 4243

cmd 20:

06:58:38 I0401 06:58:39.059520    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.059526    7157 json.go:229] get command NEXT: send 79, receive 8
06:58:38 I0401 06:58:39.059530    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.059550    7157 json.go:229] get command NEXT: send 79, receive 79
06:58:38 I0401 06:58:39.059555    7157 json.go:208] got cmd:9

cmd 21:

06:58:38 I0401 06:58:39.073094    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.073102    7157 json.go:229] get command NEXT: send 83, receive 8
06:58:38 I0401 06:58:39.073109    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.073126    7157 json.go:229] get command NEXT: send 83, receive 83
06:58:38 I0401 06:58:39.073132    7157 json.go:208] got cmd:9

cmd 17:

06:58:38 I0401 06:58:39.255774    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.255783    7157 json.go:229] get command NEXT: send 494, receive 8
06:58:38 I0401 06:58:39.255790    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.255798    7157 json.go:229] get command NEXT: send 494, receive 494
06:58:38 I0401 06:58:39.255804    7157 json.go:208] got cmd:9

cmd 24 (kill):

06:58:38 I0401 06:58:39.264425    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.264438    7157 json.go:229] get command NEXT: send 116, receive 8
06:58:38 I0401 06:58:39.264448    7157 json.go:208] got cmd:14
06:58:38 I0401 06:58:39.264460    7157 json.go:229] get command NEXT: send 116, receive 116
06:58:38 I0401 06:58:39.264470    7157 json.go:208] got cmd:9

but why?

06:58:42 E0401 06:58:43.008096    7157 json.go:522] get hyperstart API version error: send ctl channel error, the hyperstart might have closed
bergwolf commented 7 years ago

watchHyperstart() watches hyperstart state by keeping sending APIVersion command. So whenever vm gets killed, runv prints get hyperstart API version error: send ctl channel error, the hyperstart might have closed.

see https://github.com/hyperhq/runv/blob/master/hypervisor/hypervisor.go#L55

gnawux commented 7 years ago

however, for this case, this should not happened at this time, seems that no version request was issued at that time

bergwolf commented 7 years ago

watchHyperstart() sends APIVersion command every 10 seconds until it fails. So it could be watching previous containers, like I0401 06:58:32.923255 7157 json.go:208] got cmd:, which matches in time.

gnawux commented 7 years ago

every 10 seconds, do you mean this might be error message for other pods? I will made a PR for clarifying this.

gnawux commented 7 years ago

@bergwolf add some log info in #467, hope this could help.

gnawux commented 7 years ago

Although the log should be improved, this is not a bug I think. close this issue