openai / universe

Universe: a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
https://universe.openai.com
MIT License

Crash in Go (presumably go-vncdriver) runtime, allocating more stack #111

Closed: tlbtlbtlb closed this issue 6 years ago

tlbtlbtlb commented 7 years ago

Actual behavior

Started universe-starter-agent with:

$ python train.py --num-workers 8 --env-id flashgames.NeonRace-v0 --log-dir /mnt/kube-efs/universe-perfmon/usa-flashgames.NeonRace-v0-20170113-041927 -m child

After about 2 hours playing NeonRace-v0, one universe-starter-agent worker crashes with:

runtime: newstack sp=0xc820da3380 stack=[0xc820d9c000, 0xc820da3fa0]
        morebuf={pc:0x7fadfdb93f6f sp:0xc820da3388 lr:0x0}
        sched={pc:0x7fadfdb7a617 sp:0xc820da3380 lr:0x0 ctxt:0x0}
runtime: failed to unwind through stackBarrier at SP 0xc820da3388; [ @@@ ==>]
fatal error: inconsistent state in stackBarrier

runtime stack:
runtime.throw(0x7fadfe019ba0, 0x22)
        /usr/lib/go-1.6/src/runtime/panic.go:547 +0x92
runtime.gentraceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go-1.6/src/runtime/traceback.go:215 +0x1743
runtime.traceback1(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0)
        /usr/lib/go-1.6/src/runtime/traceback.go:591 +0xca
runtime.traceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180)
        /usr/lib/go-1.6/src/runtime/traceback.go:568 +0x4a
runtime.newstack()
        /usr/lib/go-1.6/src/runtime/stack.go:833 +0x56d
runtime.morestack()
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:359 +0x74

goroutine 5 [syscall, locked to thread]:
runtime: failed to unwind through stackBarrier at SP 0xc820da3388; [ @@@ ==>]
fatal error: inconsistent state in stackBarrier
panic during panic

runtime stack:
runtime.startpanic_m()
        /usr/lib/go-1.6/src/runtime/panic.go:604 +0x13e
runtime.systemstack(0x7fadfe02e4f8)
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:307 +0xa1
runtime.startpanic()
        /usr/lib/go-1.6/src/runtime/panic.go:525 +0x14
runtime.throw(0x7fadfe019ba0, 0x22)
        /usr/lib/go-1.6/src/runtime/panic.go:546 +0x85
runtime.gentraceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go-1.6/src/runtime/traceback.go:215 +0x1743
runtime.traceback1(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0)
        /usr/lib/go-1.6/src/runtime/traceback.go:591 +0xca
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc820000180)
        /usr/lib/go-1.6/src/runtime/traceback.go:568 +0x4a
runtime.tracebackothers(0xc820001200)
        /usr/lib/go-1.6/src/runtime/traceback.go:698 +0xb0
runtime.dopanic_m(0xc820001200, 0x7fadfdb65de2, 0x7fadf7ffe7c8)
        /usr/lib/go-1.6/src/runtime/panic.go:644 +0x1f5
runtime.dopanic.func1()
        /usr/lib/go-1.6/src/runtime/panic.go:534 +0x34
runtime.systemstack(0x7fadf7ffe7a0)
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:307 +0xa1
runtime.dopanic(0x0)
        /usr/lib/go-1.6/src/runtime/panic.go:535 +0x63
runtime.throw(0x7fadfe019ba0, 0x22)
        /usr/lib/go-1.6/src/runtime/panic.go:547 +0x92
runtime.gentraceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go-1.6/src/runtime/traceback.go:215 +0x1743
runtime.traceback1(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0)
        /usr/lib/go-1.6/src/runtime/traceback.go:591 +0xca
runtime.traceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180)
        /usr/lib/go-1.6/src/runtime/traceback.go:568 +0x4a
runtime.newstack()
        /usr/lib/go-1.6/src/runtime/stack.go:833 +0x56d
runtime.morestack()
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:359 +0x74

Versions

Linux 0c0c02f2bfdb 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Python 3.5.2
Name: universe
Version: 0.21.1
Summary: Universe: a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
Home-page: https://github.com/openai/universe
Author: OpenAI
Author-email: universe@openai.com
License: UNKNOWN
Location: /experiment/universe
Requires: autobahn, docker-py, docker-pycreds, fastzbarlight, go-vncdriver, gym, Pillow, PyYAML, six, twisted, ujson
---
Name: gym
Version: 0.7.0
Summary: The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
Home-page: https://github.com/openai/gym
Author: OpenAI
Author-email: gym@openai.com
License: UNKNOWN
Location: /experiment/gym
Requires: numpy, requests, six, pyglet
---
Name: tensorflow
Version: 0.12.1
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python3.5/dist-packages
Requires: protobuf, six, wheel, numpy
---
Name: numpy
Version: 1.11.0
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: NumPy Developers
Author-email: numpy-discussion@scipy.org
License: BSD
Location: /usr/lib/python3/dist-packages
Requires:
---
Name: go-vncdriver
Version: 0.4.19
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/local/lib/python3.5/dist-packages
Requires: numpy
---
Name: Pillow
Version: 4.0.0
Summary: Python Imaging Library (Fork)
Home-page: http://python-pillow.org
Author: Alex Clark (Fork Author)
Author-email: aclark@aclark.net
License: Standard PIL License
Location: /usr/local/lib/python3.5/dist-packages
Requires: olefile

tlbtlbtlb commented 7 years ago

Another occurrence, after 4 hours of worker time. Details submitted as https://github.com/golang/go/issues/18718.

They suggest upgrading Go. I'm trying some long runs with Go 1.7.4.

tlbtlbtlb commented 7 years ago

After upgrading to Go 1.7.4, I haven't seen this in about 200 agent-hours of operation. Leaving this open until we make 1.7.4 the default (which isn't trivial: Ubuntu doesn't seem to have a package for it).
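
For anyone wanting to check which toolchain a worker image has before a long run, here is a minimal sketch that parses the output of `go version` and compares it against 1.7.4. It assumes go-vncdriver gets built with whichever `go` is first on PATH (not confirmed in this thread); `parse_go_version`, `is_new_enough` and `MIN_GO` are hypothetical names made up for this example, not part of universe or go-vncdriver.

```python
import re
import subprocess

# Assumption: 1.7.4 is the first version reported crash-free in this thread.
MIN_GO = (1, 7, 4)


def parse_go_version(text):
    """Extract (major, minor, patch) from `go version` output, or None."""
    m = re.search(r"\bgo(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    major, minor, patch = m.groups()
    # Releases such as "go1.6" carry no patch component; treat it as 0.
    return (int(major), int(minor), int(patch or 0))


def is_new_enough(text, minimum=MIN_GO):
    """True if the version found in `text` is at least `minimum`."""
    version = parse_go_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    try:
        out = subprocess.check_output(["go", "version"]).decode()
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run `go version`:", exc)
    else:
        status = "ok" if is_new_enough(out) else "older than 1.7.4"
        print(out.strip(), "->", status)
```

This only reports the toolchain currently on PATH; it cannot tell which Go version an already-installed go-vncdriver binary was compiled with.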