uber / aresdb

A GPU-powered real-time analytics storage and query engine.
https://eng.uber.com/aresdb/
Apache License 2.0

Set old pointers to null so that invalid pointers are not freed on the GPU #352

Closed: jshencode closed this 4 years ago

jshencode commented 4 years ago

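On panic recovery, we free all device pointers. In the trace below, DeviceAllocate fails while reallocating result buffers, and the recovery path then tries to free a pointer that is no longer valid, which raises a second panic (DeviceFree: invalid device pointer). Setting the old pointers to null prevents that invalid free.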

panic: Panic happens when processing query
ERROR when calling CUDA functions: DeviceAllocate: invalid argument

goroutine 19523070 [running]:
github.com/uber/aresdb/utils.StackError(0x0, 0x0, 0xc070d7f4f0, 0x44, 0x0, 0x0, 0x0, 0x0)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/utils/error.go:61 +0x407
github.com/uber/aresdb/cgoutils.DoCGoCall(0xc03c727008, 0x7f7307000000)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/cgoutils/utils.go:31 +0xa6
github.com/uber/aresdb/cgoutils.doCGoCall(0xc03c727038, 0x48173b)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/cgoutils/memory.go:188 +0x49
github.com/uber/aresdb/cgoutils.DeviceAllocate(0x39d83e52, 0x0, 0x15cf438)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/cgoutils/memory.go:104 +0x5c
github.com/uber/aresdb/query.(*memoryTrackingDeviceAllocatorImpl).deviceAllocate(0xc013b72fa0, 0x39d83e52, 0x0, 0x39d83e52, 0x0, 0x7f7307000000, 0xc03c727001)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/device_allocator.go:199 +0x35
github.com/uber/aresdb/query.deviceAllocate(0x39d83e52, 0x0, 0x39d83e52, 0x0, 0x7f7307000000, 0x1)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/device_allocator.go:122 +0x49
github.com/uber/aresdb/query.(*oopkBatchContext).reallocateResultBuffers(0xc015c90cd0, 0xc015c90de0, 0x12, 0x7f7b6411c860, 0xc03c7271b0)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/aql_processor.go:787 +0x13f
github.com/uber/aresdb/query.(*oopkBatchContext).prepareForDimAndMeasureEval(0xc015c90cd0, 0x12, 0x4, 0x10501010000, 0x7f7b6411c860)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/aql_processor.go:745 +0x101
github.com/uber/aresdb/query.(*BatchExecutorImpl).project(0xc07289bcc0)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/aql_batchexecutor.go:203 +0xbd
github.com/uber/aresdb/query.(*AQLQueryContext).runBatchExecutor(0xc015c90c00, 0x1705740, 0xc07289bcc0, 0x1)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/aql_processor.go:1327 +0xac
github.com/uber/aresdb/query.(*AQLQueryContext).ProcessQuery(0xc015c90c00, 0x171abc0, 0xc000d94240)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/query/aql_processor.go:125 +0x45e
github.com/uber/aresdb/api.handleQuery(0x171abc0, 0xc000d94240, 0x16eb880, 0xc0008d5760, 0xc0003be910, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, ...)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/api/query_handler.go:292 +0x469
github.com/uber/aresdb/api.(*QueryHandler).handleAQLInternal(0xc00105c3c0, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/api/query_handler.go:227 +0x372
github.com/uber/aresdb/api.(*QueryHandler).HandleAQL.func1()
        /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191219175059-d95bb19ef6e9/api/query_handler.go:97 +0xd5
github.com/m3db/m3/src/x/sync.(*workerPool).GoIfAvailable.func1(0xc0498887b0, 0xc0010d4748)
        /home/jians/gocode/pkg/mod/github.com/m3db/m3@v0.10.2/src/x/sync/worker_pool.go:55 +0x27
created by github.com/m3db/m3/src/x/sync.(*workerPool).GoIfAvailable
        /home/jians/gocode/pkg/mod/github.com/m3db/m3@v0.10.2/src/x/sync/worker_pool.go:54 +0x6e [recovered]
        panic: ERROR when calling CUDA functions: DeviceFree: invalid device pointer
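
The failure mode in the trace above can be sketched in plain Go without a GPU. This is only an illustration under stated assumptions, not the code from this change: `fakeDevice`, `batchContext`, `reallocate`, `cleanup`, and `run` are hypothetical names, the device is simulated with a map of live pointers, and freeing a null pointer is assumed to be a no-op.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeDevice is a hypothetical stand-in for GPU memory. It tracks live
// allocations so that freeing an unknown pointer fails.
type fakeDevice struct {
	next     uintptr
	live     map[uintptr]bool
	maxBytes int // any single allocation larger than this fails
}

func newFakeDevice(maxBytes int) *fakeDevice {
	return &fakeDevice{next: 1, live: map[uintptr]bool{}, maxBytes: maxBytes}
}

func (d *fakeDevice) alloc(bytes int) (uintptr, error) {
	if bytes > d.maxBytes {
		return 0, errors.New("DeviceAllocate: invalid argument")
	}
	p := d.next
	d.next++
	d.live[p] = true
	return p, nil
}

func (d *fakeDevice) free(p uintptr) error {
	if p == 0 {
		return nil // assumption: freeing a null pointer is a no-op
	}
	if !d.live[p] {
		return errors.New("DeviceFree: invalid device pointer")
	}
	delete(d.live, p)
	return nil
}

// batchContext holds one device buffer (hypothetical, simplified).
type batchContext struct {
	dev    *fakeDevice
	buffer uintptr
}

// reallocate frees the old buffer, then allocates a new one. When clearOld
// is true, the stale pointer is set to null right after the free.
func (c *batchContext) reallocate(bytes int, clearOld bool) error {
	if err := c.dev.free(c.buffer); err != nil {
		return err
	}
	if clearOld {
		c.buffer = 0
	}
	p, err := c.dev.alloc(bytes)
	if err != nil {
		return err
	}
	c.buffer = p
	return nil
}

// cleanup models the recovery path that frees all device pointers.
func (c *batchContext) cleanup() error {
	err := c.dev.free(c.buffer)
	c.buffer = 0
	return err
}

// run does one successful reallocation, one failing reallocation, and then
// the cleanup. It returns the error reported by the cleanup, if any.
func run(clearOld bool) error {
	ctx := &batchContext{dev: newFakeDevice(1024)}
	if err := ctx.reallocate(512, clearOld); err != nil {
		return err
	}
	_ = ctx.reallocate(4096, clearOld) // fails: larger than maxBytes
	return ctx.cleanup()
}

func main() {
	fmt.Println("without clearing:", run(false))
	fmt.Println("with clearing:   ", run(true))
}
```

Without clearing, the cleanup frees a pointer that `reallocate` already released, so the simulated device rejects it. With clearing, the cleanup sees a null pointer and skips the free.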

codecov[bot] commented 4 years ago

Codecov Report

Merging #352 into master will decrease coverage by <.01%. The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   71.18%   71.18%   -0.01%     
==========================================
  Files         177      177              
  Lines       23550    23551       +1     
==========================================
  Hits        16765    16765              
+ Misses       5445     5444       -1     
- Partials     1340     1342       +2

Impacted Files                  Coverage Δ
query/aql_processor.go          80.83% <100%> (+0.02%)
broker/query_plan_non_agg.go    75.58% <0%> (-4.07%)
datanode/datanode.go            18.48% <0%> (+1.51%)
