superphy / prairiedog

next-gen pangenome graphs for predictive genomics
Other
0 stars 0 forks source link

bufio crashed #113

Closed kevinkle closed 4 years ago

kevinkle commented 5 years ago
MAP 05h21m28s nquad_count:3.159G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.319G edge_speed:327.6k/sec
MAP 05h21m29s nquad_count:3.160G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.319G edge_speed:327.6k/sec
MAP 05h21m30s nquad_count:3.160G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.319G edge_speed:327.6k/sec
MAP 05h21m31s nquad_count:3.160G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.320G edge_speed:327.6k/sec
MAP 05h21m32s nquad_count:3.160G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.320G edge_speed:327.6k/sec
MAP 05h21m33s nquad_count:3.160G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.321G edge_speed:327.6k/sec
MAP 05h21m34s nquad_count:3.160G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.321G edge_speed:327.6k/sec
Processing file (663 out of 2260): /home/kevin/prairiedog/outputs/samples/SRR2567185.fasta.rdf
MAP 05h21m35s nquad_count:3.161G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.321G edge_speed:327.6k/sec
MAP 05h21m36s nquad_count:3.161G err_count:0.000 nquad_speed:163.8k/sec edge_count:6.322G edge_speed:327.6k/sec
        /usr/local/go/src/syscall/syscall_unix.go:172 +0x49
internal/poll.(*FD).Read(0xc03eadf2c0, 0xc040bd6000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:165 +0x119
os.(*File).read(0xc00031a170, 0xc040bd6000, 0x4000, 0x4000, 0x7f7e2af64e40, 0x1, 0x100000000000000)
        /usr/local/go/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc00031a170, 0xc040bd6000, 0x4000, 0x4000, 0xc379ed8750, 0xc203031e88, 0x8c89a7)
        /usr/local/go/src/os/file.go:108 +0x69
bufio.(*Reader).fill(0xc03eadf320)
        /usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc03eadf320, 0xa, 0x400, 0xc347303400, 0x400, 0x400, 0x146d788)
        /usr/local/go/src/bufio/bufio.go:132 +0x3f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc379ed8750, 0x21, 0xc11a47fe60)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:86 +0x36e
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 832692 [syscall]:
syscall.Syscall(0x0, 0x2fdd, 0xc1f476a000, 0x4000, 0xc0b6e14d40, 0xc0f7b6e4e0, 0x8de2c0)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x2fdd, 0xc1f476a000, 0x4000, 0x4000, 0xc0b6e14d01, 0x0, 0x0)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:732 +0x5a
syscall.Read(0x2fdd, 0xc1f476a000, 0x4000, 0x4000, 0x1, 0xc0f7b6e5a0, 0xc03eadf3e0)
        /usr/local/go/src/syscall/syscall_unix.go:172 +0x49
internal/poll.(*FD).Read(0xc03eadf380, 0xc1f476a000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:165 +0x119
os.(*File).read(0xc00031a178, 0xc1f476a000, 0x4000, 0x4000, 0x7f7cbdd5cc78, 0x1, 0x100000000000000)
        /usr/local/go/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc00031a178, 0xc1f476a000, 0x4000, 0x4000, 0xc379ed87b0, 0xc0f7b6e688, 0x8c89a7)
        /usr/local/go/src/os/file.go:108 +0x69
bufio.(*Reader).fill(0xc03eadf3e0)
        /usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc03eadf3e0, 0xa, 0x400, 0xc347303800, 0x400, 0x400, 0xeec)
        /usr/local/go/src/bufio/bufio.go:132 +0x3f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc379ed87b0, 0x21, 0xc11a47fec0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:86 +0x36e
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 832693 [runnable]:
syscall.Syscall(0x0, 0x201c, 0xc1fb3d4000, 0x4000, 0x4000, 0x4000, 0x0)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x201c, 0xc1fb3d4000, 0x4000, 0x4000, 0xc2e9da9301, 0x0, 0x0)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:732 +0x5a
syscall.Read(0x201c, 0xc1fb3d4000, 0x4000, 0x4000, 0x1, 0xc0f7b6eda0, 0xc21b074000)
        /usr/local/go/src/syscall/syscall_unix.go:172 +0x49
internal/poll.(*FD).Read(0xc11a47ff80, 0xc1fb3d4000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:165 +0x119
os.(*File).read(0xc047d362d8, 0xc1fb3d4000, 0x4000, 0x4000, 0x7f7e8c8f8e58, 0x1, 0x100000000000000)
        /usr/local/go/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc047d362d8, 0xc1fb3d4000, 0x4000, 0x4000, 0xc379ed8810, 0xc0f7b6ee88, 0x8c89a7)
        /usr/local/go/src/os/file.go:108 +0x69
bufio.(*Reader).fill(0xc21b074000)
        /usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc21b074000, 0xa, 0x400, 0xc337fb9800, 0x400, 0x400, 0x0)
        /usr/local/go/src/bufio/bufio.go:132 +0x3f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc379ed8810, 0x21, 0xc11a47ff20)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:86 +0x36e
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171
kevinkle commented 5 years ago

Maybe related to our bump in ulimit to 100000? Going to try with setting the number of go routines to 18 -j 18.

kevinkle commented 4 years ago

same issue with -j 18

MAP 04h33m16s nquad_count:2.747G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.495G edge_speed:335.1k/sec
MAP 04h33m17s nquad_count:2.747G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.495G edge_speed:335.1k/sec
MAP 04h33m18s nquad_count:2.748G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.495G edge_speed:335.1k/sec
MAP 04h33m19s nquad_count:2.748G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.496G edge_speed:335.1k/sec
MAP 04h33m20s nquad_count:2.748G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.496G edge_speed:335.1k/sec
Processing file (574 out of 2260): /home/kevin/prairiedog/outputs/samples/SRR2567096.fasta.rdf
MAP 04h33m21s nquad_count:2.748G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.496G edge_speed:335.1k/sec
MAP 04h33m22s nquad_count:2.748G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.497G edge_speed:335.1k/sec
MAP 04h33m23s nquad_count:2.749G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.497G edge_speed:335.1k/sec
MAP 04h33m24s nquad_count:2.749G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.497G edge_speed:335.1k/sec
MAP 04h33m25s nquad_count:2.749G err_count:0.000 nquad_speed:167.6k/sec edge_count:5.498G edge_speed:335.1k/sec
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799608 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc051884090, 0x21, 0xc00fec55c0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799609 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0518840f0, 0x21, 0xc00fec5620)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799610 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc051884150, 0x21, 0xc00fec5680)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799611 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0518841b0, 0x21, 0xc00fec56e0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799612 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc051884210, 0x21, 0xc00fec5740)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799613 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc051884270, 0x21, 0xc00fec57a0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 799614 [syscall]:
syscall.Syscall(0x0, 0x2010, 0xc24dd52000, 0x4000, 0xc06aa36240, 0x0, 0x8de2c0)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x2010, 0xc24dd52000, 0x4000, 0x4000, 0xc06aa36201, 0x0, 0x0)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:732 +0x5a
syscall.Read(0x2010, 0xc24dd52000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
        /usr/local/go/src/syscall/syscall_unix.go:172 +0x49
internal/poll.(*FD).Read(0xc00fec5aa0, 0xc24dd52000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:165 +0x119
os.(*File).read(0xc00000e180, 0xc24dd52000, 0x4000, 0x4000, 0x7f23970c1260, 0x1, 0x10000c07554b658)
        /usr/local/go/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc00000e180, 0xc24dd52000, 0x4000, 0x4000, 0xc07554b688, 0xc07554b688, 0x8c89a7)
        /usr/local/go/src/os/file.go:108 +0x69
bufio.(*Reader).fill(0xc00fec5b00)
        /usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc00fec5b00, 0xa, 0x400, 0xc28c31f800, 0x400, 0x400, 0x0)
        /usr/local/go/src/bufio/bufio.go:132 +0x3f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0518842d0, 0x21, 0xc00fec5800)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:86 +0x36e
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171
kevinkle commented 4 years ago

possible and out of memory issue https://github.com/golang/go/issues/1650

kevinkle commented 4 years ago

opened a post https://discuss.dgraph.io/t/dgraph-bulk-crashes-on-bufio-reader/

kevinkle commented 4 years ago
goroutine 786885 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e62d0, 0x21, 0xc051e0cfc0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 786886 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e6330, 0x21, 0xc051e0d020)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 786887 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e6390, 0x21, 0xc051e0d080)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 786888 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e63f0, 0x21, 0xc051e0d0e0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 786889 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e6450, 0x21, 0xc051e0d140)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 786890 [runnable]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e64b0, 0x21, 0xc051e0d1a0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:78
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171

goroutine 786891 [syscall]:
syscall.Syscall(0x0, 0x251e, 0xc156928000, 0x4000, 0xc0352d8a40, 0xc0275f1ce0, 0x8de2c0)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x251e, 0xc156928000, 0x4000, 0x4000, 0xc0352d8a01, 0x0, 0x0)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:732 +0x5a
syscall.Read(0x251e, 0xc156928000, 0x4000, 0x4000, 0x1, 0xc0275f1da0, 0xc051e0d2c0)
        /usr/local/go/src/syscall/syscall_unix.go:172 +0x49
internal/poll.(*FD).Read(0xc051e0d260, 0xc156928000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:165 +0x119
os.(*File).read(0xc10914c1c8, 0xc156928000, 0x4000, 0x4000, 0x7f8d1c1b3ac0, 0x1, 0x100000000000000)
        /usr/local/go/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc10914c1c8, 0xc156928000, 0x4000, 0x4000, 0xc0e77e6510, 0xc0275f1e88, 0x8c89a7)
        /usr/local/go/src/os/file.go:108 +0x69
bufio.(*Reader).fill(0xc051e0d2c0)
        /usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc051e0d2c0, 0xa, 0x400, 0xc02879fc00, 0x400, 0x400, 0x3)
        /usr/local/go/src/bufio/bufio.go:132 +0x3f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xc0e77e6510, 0x21, 0xc051e0d200)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:86 +0x36e
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:53 +0x171
kevinkle commented 4 years ago

badger 2019/07/16 11:00:48 INFO: Storing value log head: {Fid:3 Len:47 Offset:35967093}
badger 2019/07/17 04:40:50 INFO: Storing value log head: {Fid:4 Len:47 Offset:39920719}
badger 2019/07/17 04:40:53 INFO: Force compaction on level 0 done
badger 2019/07/17 04:40:54 INFO: All 0 tables opened in 0s
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

runtime stack:
runtime.throw(0x147bfcd, 0x11)
kevin@panther:~$ head -n 100 screenlog.0

badger 2019/07/16 11:00:48 INFO: Storing value log head: {Fid:3 Len:47 Offset:35967093}
badger 2019/07/17 04:40:50 INFO: Storing value log head: {Fid:4 Len:47 Offset:39920719}
badger 2019/07/17 04:40:53 INFO: Force compaction on level 0 done
badger 2019/07/17 04:40:54 INFO: All 0 tables opened in 0s
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

runtime stack:
runtime.throw(0x147bfcd, 0x11)
        /usr/local/go/src/runtime/panic.go:608 +0x72
runtime.checkmcount()
        /usr/local/go/src/runtime/proc.go:598 +0xac
runtime.mcommoninit(0xc08d026e00)
        /usr/local/go/src/runtime/proc.go:616 +0x76
runtime.allocm(0xc000506f00, 0x0, 0xc0481f6e00)
        /usr/local/go/src/runtime/proc.go:1573 +0x14d
runtime.newm(0x0, 0xc000506f00)
        /usr/local/go/src/runtime/proc.go:1891 +0x39
runtime.startm(0xc000506f00, 0x1fa3500)
        /usr/local/go/src/runtime/proc.go:2066 +0x139
runtime.handoffp(0xc000506f00)
        /usr/local/go/src/runtime/proc.go:2093 +0x55
runtime.retake(0x10761283b13291, 0xd00000018)
        /usr/local/go/src/runtime/proc.go:4488 +0x139
runtime.sysmon()
        /usr/local/go/src/runtime/proc.go:4406 +0x22e
runtime.mstart1()
        /usr/local/go/src/runtime/proc.go:1282 +0xe6
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1248 +0x6e

goroutine 1 [chan receive]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run(0xc03cea00c0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:37 +0x1aa
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).reduceStage(0xc00068f830)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:282 +0x117
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:192 +0xbab
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc0001ac480, 0xc0004468f0, 0x0, 0xb)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:44 +0x52
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute(0xc0001ac480, 0xc000446840, 0xb, 0xb, 0xc0001ac480, 0xc000446840)
        /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:702 +0x2d3
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1efa740, 0xc, 0x0, 0x0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:783 +0x2dc
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).Execute(0x1efa740, 0x8b60ad, 0x1f02928)
        /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:736 +0x2b
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:60 +0x3b
main.main()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:33 +0x88

goroutine 5 [chan receive]:
github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x1fa3400)
        /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.init.0
        /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:410 +0x203

goroutine 19 [syscall, 1071 minutes]:
os/signal.signal_recv(0x14c5980)
        /usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 20 [chan receive]:
github.com/dgraph-io/dgraph/x.init.1.func1()
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/metrics.go:81 +0x7e
created by github.com/dgraph-io/dgraph/x.init.1
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/metrics.go:76 +0x3ce

goroutine 9 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc0000aacc0)
        /go/src/go.opencensus.io/stats/view/worker.go:147 +0xdd
created by go.opencensus.io/stats/view.init.0
        /go/src/go.opencensus.io/stats/view/worker.go:29 +0x57

goroutine 10 [chan receive]:
github.com/dgraph-io/dgraph/worker.(*rateLimiter).bleed(0x1fc0f6c)
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/proposal.go:61 +0x95
created by github.com/dgraph-io/dgraph/worker.init.0
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/proposal.go:46 +0x41

goroutine 107 [IO wait, 1071 minutes]:
internal/poll.runtime_pollWait(0x7f8e3908fe30, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000442418, 0x72, 0xc0000aa300, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000442418, 0xffffffffffffff00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000442400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000442400, 0xc000187500, 0x30, 0x30)
        /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00043a0b0, 0x50, 0x7f8e3d300440, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc00043a0b0, 0x891008, 0x30, 0x13de580)
        /usr/local/go/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc00043a0b0, 0x13de580, 0xc0001662a0, 0x132bd40, 0x1ee30d0)
kevinkle commented 4 years ago

Added debug.SetMaxThreads(30000) and built dgraph from src on panther. Trying it now

kevinkle commented 4 years ago

The newly built dgraph binary is using bit more ram than before, ~40GB when was ~30GB before. Also slower at 202.1k/s versus ~300k/s edge speed

kevinkle commented 4 years ago

Looking good so far:

[09:05:27-0600] REDUCE 21h06m11s 18.46% edge_count:2.033G edge_speed:123.7k/sec plist_count:325.9k plist_speed:19.82/sec
[09:05:28-0600] REDUCE 21h06m12s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:325.9k plist_speed:19.82/sec
[09:05:29-0600] REDUCE 21h06m13s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:325.9k plist_speed:19.82/sec
[09:05:30-0600] REDUCE 21h06m14s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:325.9k plist_speed:19.82/sec
[09:05:31-0600] REDUCE 21h06m15s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:325.9k plist_speed:19.82/sec
[09:05:32-0600] REDUCE 21h06m16s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:325.9k plist_speed:19.82/sec
[09:05:33-0600] REDUCE 21h06m17s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:326.0k plist_speed:19.82/sec
[09:05:34-0600] REDUCE 21h06m18s 18.46% edge_count:2.034G edge_speed:123.7k/sec plist_count:326.0k plist_speed:19.82/sec
kevinkle commented 4 years ago

Setting the threads fixed our issue:

[04:54:50-0600] REDUCE 40h55m33s 99.98% edge_count:11.02G edge_speed:125.5k/sec plist_count:3.950M plist_speed:44.99/sec
[04:54:51-0600] REDUCE 40h55m34s 99.98% edge_count:11.02G edge_speed:125.5k/sec plist_count:3.964M plist_speed:45.15/sec
[04:54:52-0600] REDUCE 40h55m35s 99.98% edge_count:11.02G edge_speed:125.5k/sec plist_count:3.978M plist_speed:45.30/sec
[04:54:53-0600] REDUCE 40h55m36s 99.98% edge_count:11.02G edge_speed:125.5k/sec plist_count:3.994M plist_speed:45.48/sec
[04:54:54-0600] REDUCE 40h55m37s 99.99% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.010M plist_speed:45.67/sec
[04:54:55-0600] REDUCE 40h55m38s 99.99% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.027M plist_speed:45.86/sec
[04:54:56-0600] REDUCE 40h55m39s 99.99% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.037M plist_speed:45.97/sec
[04:54:57-0600] REDUCE 40h55m40s 99.99% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.052M plist_speed:46.14/sec
[04:54:58-0600] REDUCE 40h55m41s 99.99% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.067M plist_speed:46.31/sec
[04:54:59-0600] REDUCE 40h55m42s 100.00% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.083M plist_speed:46.49/sec
[04:55:00-0600] REDUCE 40h55m43s 100.00% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.100M plist_speed:46.69/sec
[04:55:01-0600] REDUCE 40h55m44s 100.00% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.116M plist_speed:46.87/sec
[04:55:02-0600] REDUCE 40h55m45s 100.00% edge_count:11.02G edge_speed:125.5k/sec plist_count:4.116M plist_speed:46.87/sec
Total: 40h55m45s