kakao / s2graph

This code base is retained for historical interest only; please visit the Apache Incubator repo for the latest version:
https://github.com/apache/incubator-s2graph

Refactor filterEdges #130

SteamShon closed 9 years ago

SteamShon commented 9 years ago

After profiling with VisualVM, I found that filterEdges in Graph contains unnecessary checks that consume many CPU cycles.

The points to improve in Graph.filterEdges are as follows (develop branch).

  1. By the design of the rowKey and qualifier, a degree edge can only exist at the very beginning of the cells in HBase. Checking whether an edge is a degree edge should therefore be done on one edge, not on all fetched edges. (https://github.com/kakao/s2graph/blob/develop/s2core/src/main/scala/com/kakao/s2graph/core/Graph.scala#L506)
  2. The duplicate policy check is unnecessary for labels with a strong consistencyLevel. (https://github.com/kakao/s2graph/blob/develop/s2core/src/main/scala/com/kakao/s2graph/core/Graph.scala#L524)
  3. hashCode for BigDecimal is expensive. Since only vertexId is considered in this scope, the only possible data types are string and long, so switching from BigDecimal.hashCode to BigDecimal.longValue.hashCode would improve performance. (https://github.com/kakao/s2graph/blob/develop/s2core/src/main/scala/com/kakao/s2graph/core/Graph.scala#L501)
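
To make points 1 and 2 concrete, here is a minimal sketch. `Edge`, `FilterSketch`, and all method names are hypothetical stand-ins, not the actual s2graph classes, and the dedupe step is only a placeholder for the real duplicate policy. It assumes, as stated in point 1, that a degree edge can only appear as the first fetched cell.

```scala
// Hypothetical, simplified stand-in for a fetched edge (not the s2graph Edge class).
final case class Edge(tgtVertexId: Long, isDegree: Boolean)

object FilterSketch {
  // Before: every fetched edge pays for the degree check.
  def dropDegreeNaive(edges: Seq[Edge]): Seq[Edge] =
    edges.filterNot(_.isDegree)

  // After: only the head is inspected. This is correct only under the
  // assumption that a degree edge can exist solely at the first position.
  def dropDegreeHeadOnly(edges: Seq[Edge]): Seq[Edge] =
    edges.headOption match {
      case Some(first) if first.isDegree => edges.tail
      case _                             => edges
    }

  // Point 2: skip duplicate handling entirely for strongly consistent labels.
  // The "keep first occurrence per target id" rule is a placeholder policy.
  def dedupe(edges: Seq[Edge], stronglyConsistent: Boolean): Seq[Edge] =
    if (stronglyConsistent) edges
    else {
      val seen = scala.collection.mutable.HashSet.empty[Long]
      edges.filter(e => seen.add(e.tgtVertexId)) // add returns false for repeats
    }
}
```

Both variants of the degree check return the same result whenever the ordering assumption holds; the head-only version just avoids touching the rest of the sequence.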

Personally, I am not a fan of micro-optimization, but filterEdges goes through every fetched edge, so a little optimization of this method may be worthwhile.

In the benchmark, I see that a lot of CPU cycles are wasted on Graph.toHashKey and on simply checking for exclude/include.
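
The hashing change suggested above can be sketched roughly as follows. `HashKeySketch`, `slowHash`, and `fastHash` are hypothetical names, and this is not the actual Graph.toHashKey. The sketch assumes the vertex id is a whole number that fits in a Long, so hashing its long value loses no information.

```scala
object HashKeySketch {
  // Hashes the BigDecimal itself; this is the path the profile flagged as costly.
  def slowHash(id: BigDecimal): Int = id.hashCode

  // Hashes the underlying long value instead.
  // Assumption: the id is a whole number within the Long range.
  def fastHash(id: BigDecimal): Int = id.longValue.hashCode
}
```

Ids that compare equal as BigDecimal also share the same long value, so `fastHash` still gives equal ids equal hashes under that assumption.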

SteamShon commented 9 years ago

17% increase.

After applying the patch: [screenshot, 2015-10-15 at 2:44:18 PM]

Compared to the old version: [screenshot, 2015-10-15 at 3:03:19 PM]