uber / aresdb

A GPU-powered real-time analytics storage and query engine.
https://eng.uber.com/aresdb/
Apache License 2.0
2.99k stars, 232 forks

Refactor get primary key values to reduce ingestion latency #350

Closed: jshencode closed this 4 years ago

jshencode commented 4 years ago

Profiling shows that inserting primary keys spends most of its time in two allocations:

  1. primaryKeyValues := make([]DataValue, len(primaryKeyCols)) LINE 323
  2. key := make([]byte, 0, keyLength) LINE 98

fixes

ROUTINE ======================== github.com/uber/aresdb/memstore/common.(*UpsertBatch).GetPrimaryKeyBytes in /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191212190927-a425d38c8c4d/memstore/common/upsert_batch.go
     1.12s     11.74s (flat, cum) 17.59% of Total
         .          .    315:   return primaryKeyCols, nil
         .          .    316:}
         .          .    317:
         .          .    318:// GetPrimaryKeyBytes returns primary key bytes for a given row. Note primaryKeyCol is not list of primary key
         .          .    319:// columnIDs.
      80ms       80ms    320:func (u *UpsertBatch) GetPrimaryKeyBytes(row int, primaryKeyCols []int, keyLength int) ([]byte, error) {
         .          .    321:   var key []byte
         .          .    322:   var err error
     160ms      5.21s    323:   primaryKeyValues := make([]DataValue, len(primaryKeyCols))
      90ms       90ms    324:   for i, col := range primaryKeyCols {
     520ms      2.51s    325:       primaryKeyValues[i], err = u.GetDataValue(row, col)
         .          .    326:       if err != nil {
         .          .    327:           return key, utils.StackError(err, "Failed to read primary key at row %d, col %d",
         .          .    328:               row, col)
         .          .    329:       }
         .          .    330:   }
         .          .    331:
     270ms      3.85s    332:   return GetPrimaryKeyBytes(primaryKeyValues, keyLength)
         .          .    333:}
         .          .    334:
         .          .    335:// ExtractBackfillBatch extracts given rows and stores in a new UpsertBatch
         .          .    336:// The returned new UpsertBatch is not fully serialized and can only be used for
         .          .    337:// structured reads.
ROUTINE ======================== github.com/uber/aresdb/memstore/common.GetPrimaryKeyBytes in /home/jians/gocode/pkg/mod/github.com/uber/aresdb@v0.0.3-0.20191212190927-a425d38c8c4d/memstore/common/primary_key.go
     1.93s      3.59s (flat, cum)  5.38% of Total
         .          .     92:       "eventTimeCutoff": pk.GetEventTimeCutoff(),
         .          .     93:   })
         .          .     94:}
         .          .     95:
         .          .     96:// GetPrimaryKeyBytes returns primary key bytes for a given row.
      40ms       40ms     97:func GetPrimaryKeyBytes(primaryKeyValues []DataValue, keyLength int) ([]byte, error) {
      50ms      1.48s     98:   key := make([]byte, 0, keyLength)
     250ms      250ms     99:   for _, value := range primaryKeyValues {
     140ms      140ms    100:       if !value.Valid {
         .          .    101:           return key, utils.StackError(nil, "Primary key cannot be null")
         .          .    102:       }
         .          .    103:
      50ms       50ms    104:       if value.IsBool {
         .          .    105:           if value.BoolVal {
         .          .    106:               key = append(key, byte(1))
         .          .    107:           } else {
         .          .    108:               key = append(key, byte(0))
         .          .    109:           }
         .          .    110:       } else {
     600ms      720ms    111:           for i := 0; i < DataTypeBits(value.DataType)/8; i++ {
     740ms      850ms    112:               key = append(key, *(*byte)(utils.MemAccess(value.OtherVal, i)))
         .          .    113:           }
         .          .    114:       }
         .          .    115:   }
      60ms       60ms    116:   return key, nil
         .          .    117:}
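
For reference, the two allocation sites in the listings above (5.21s cumulative on line 323, 1.48s on line 98) can both be avoided by appending each value's bytes straight into a buffer the caller reuses across rows. The following is a minimal sketch of that idea, not the actual diff in this PR. `DataValue` is a simplified stand-in (the real type exposes `OtherVal` as an unsafe pointer, here replaced by a hypothetical `Raw` byte slice), and `rowReader`, `fakeBatch`, and `writePrimaryKey` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// DataValue is a simplified, hypothetical stand-in for aresdb's DataValue.
// Raw holds the bytes of a non-bool value (assumption: the real type uses
// an unsafe pointer plus a data type instead).
type DataValue struct {
	Valid   bool
	IsBool  bool
	BoolVal bool
	Raw     []byte
}

// rowReader is a hypothetical interface covering the one UpsertBatch method
// this sketch needs.
type rowReader interface {
	GetDataValue(row, col int) (DataValue, error)
}

// fakeBatch is an in-memory rowReader used only for this example.
type fakeBatch struct {
	rows [][]DataValue
}

func (b *fakeBatch) GetDataValue(row, col int) (DataValue, error) {
	if row < 0 || row >= len(b.rows) || col < 0 || col >= len(b.rows[row]) {
		return DataValue{}, fmt.Errorf("row %d, col %d out of range", row, col)
	}
	return b.rows[row][col], nil
}

var errNullPrimaryKey = errors.New("primary key cannot be null")

// writePrimaryKey resets key to length zero and appends the primary key bytes
// of one row to it. No intermediate []DataValue is built, and no new key
// slice is allocated as long as cap(key) is large enough.
func writePrimaryKey(key []byte, r rowReader, row int, primaryKeyCols []int) ([]byte, error) {
	key = key[:0]
	for _, col := range primaryKeyCols {
		v, err := r.GetDataValue(row, col)
		if err != nil {
			return key, fmt.Errorf("failed to read primary key at row %d, col %d: %w", row, col, err)
		}
		if !v.Valid {
			return key, errNullPrimaryKey
		}
		if v.IsBool {
			if v.BoolVal {
				key = append(key, 1)
			} else {
				key = append(key, 0)
			}
			continue
		}
		key = append(key, v.Raw...)
	}
	return key, nil
}

func main() {
	batch := &fakeBatch{rows: [][]DataValue{
		{{Valid: true, Raw: []byte{1, 0, 0, 0}}, {Valid: true, IsBool: true, BoolVal: true}},
		{{Valid: true, Raw: []byte{2, 0, 0, 0}}, {Valid: true, IsBool: true}},
	}}
	primaryKeyCols := []int{0, 1}

	// Allocate the key buffer once, sized to the key length, and reuse it per row.
	key := make([]byte, 0, 5)
	for row := range batch.rows {
		var err error
		key, err = writePrimaryKey(key, batch, row, primaryKeyCols)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("row %d key: %x\n", row, key)
	}
}
```

The trade-off is that the buffer is overwritten on the next call, so any caller that stores the key beyond the current row has to copy it first.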
codecov[bot] commented 4 years ago

Codecov Report

Merging #350 into master will decrease coverage by <.01%. The diff coverage is 59.37%.

@@            Coverage Diff             @@
##           master     #350      +/-   ##
==========================================
- Coverage   71.18%   71.18%   -0.01%     
==========================================
  Files         177      177              
  Lines       23533    23553      +20     
==========================================
+ Hits        16752    16766      +14     
- Misses       5438     5445       +7     
+ Partials     1343     1342       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| subscriber/common/rules/job_config.go | 68.42% <ø> (ø) | :arrow_up: |
| utils/http.go | 19.07% <ø> (ø) | :arrow_up: |
| memstore/ingestion.go | 81.22% <100%> (+0.07%) | :arrow_up: |
| memstore/recovery.go | 74.82% <100%> (-0.18%) | :arrow_down: |
| memstore/common/data_value.go | 79.09% <44.82%> (-1.69%) | :arrow_down: |
| memstore/backfill.go | 74.52% <50%> (+0.27%) | :arrow_up: |
| memstore/archive_store.go | 79.77% <66.66%> (+0.11%) | :arrow_up: |
| subscriber/common/sink/sink.go | 60.46% <66.66%> (+0.94%) | :arrow_up: |
| memstore/common/upsert_batch.go | 58.17% <70%> (+3.26%) | :arrow_up: |
| memstore/common/primary_key.go | 44% <80%> (+0.52%) | :arrow_up: |

... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update a425d38...703d10a.