kakao / s2graph

This code base is retained for historical interest only; please visit the Apache Incubator repo for the latest version:
https://github.com/apache/incubator-s2graph

Add "divide" operation to "scorePropagateOp" #258

Open wishoping opened 8 years ago

wishoping commented 8 years ago

Ratio values are a common use case in service analysis. The usual way to calculate a ratio is to divide one count or aggregated value by another. S2Graph queries already support counting and aggregating values stored in S2Graph, so you can get a ratio by running those queries and dividing the results yourself. That works, but there could be a simpler way: let the calculation happen inside the S2Graph web application with just one RPC, i.e. a single graph query call. This issue proposes such a ratio calculation query. Suppose we have two labels, an impression feedback label and a click feedback label. We can get the number of impressions and the number of clicks for a user, and from these two values calculate the CTR (click-through rate) using the two queries below.

Impression query

{
  "srcVertices": [{
    "serviceName": "some_service",
    "columnName": "user_id",
    "id": "user_a"
  }],
  "steps": [{
    "step": [{
      "label": "impression_feedback_label",
      "direction": "out",
      "offset": 0,
      "limit": 100
    }]
  }]
}

Click query

{
  "srcVertices": [{
    "serviceName": "some_service",
    "columnName": "user_id",
    "id": "user_a"
  }],
  "steps": [{
    "step": [{
      "label": "click_feedback_label",
      "direction": "out",
      "offset": 0,
      "limit": 100
    }]
  }]
}

After fetching the results of both queries above, we can divide the number of clicks by the number of impressions to get the CTR.
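A minimal sketch of this client-side, two-query approach, assuming the caller has already run both queries and extracted the returned edges as lists. The function name `click_through_rate` and the edge dictionaries are hypothetical and are not part of S2Graph's API.

```python
from typing import Any, Mapping, Sequence


def click_through_rate(click_edges: Sequence[Mapping[str, Any]],
                       impression_edges: Sequence[Mapping[str, Any]]) -> float:
    """Hypothetical helper: CTR = number of click edges / number of impression edges.

    Assumes each argument holds the edges returned by the click query and the
    impression query respectively. Returns 0.0 when there are no impressions.
    """
    impressions = len(impression_edges)
    if impressions == 0:
        return 0.0
    return len(click_edges) / impressions


# Illustrative data only: one click out of two impressions.
clicks = [{"to": "item_1"}]
impressions = [{"to": "item_1"}, {"to": "item_2"}]
print(click_through_rate(clicks, impressions))  # 0.5
```

Note that both example queries use `"limit": 100`, so counting the returned edges this way would cap each count at 100.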

However, by adding a divide operation to scorePropagateOp, we can do this in a single query:

{
  "limit" : 10,
  "groupBy" : [ "from" ],
  "duplicate" : "sum",
  "srcVertices" : [ {
    "serviceName" : "some_service",
    "columnName" : "user_id",
    "id" : "user_a"
  } ],
  "steps" : [ {
    "step" : [ {
      "label" : "impression_feedback_label",
      "direction" : "out",
      "offset" : 0,
      "limit" : 10,
      "groupBy" : [ "from" ],
      "duplicate" : "countSum",
      "transform" : [ [ "_from" ] ]
    } ]
  }, {
    "step" : [ {
      "label": "click_feedback_label",
      "direction" : "out",
      "offset" : 0,
      "limit" : 10,
      "scorePropagateOp" : "divide",
      "scorePropagateShrinkage" : 500
    } ]
  } ]
}
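The proposal does not spell out how "divide" combines scores between steps. The following is a minimal sketch under one assumption: the first step's grouped `countSum` score (impressions) becomes the previous score, the second step's score (clicks) is divided by it, and `scorePropagateShrinkage` is added to the denominator. The `multiply` and `plus` branches reflect our understanding of the existing operations and should be checked against the S2Graph source. `propagate_score` is a hypothetical name.

```python
def propagate_score(prev_score: float, edge_score: float,
                    op: str = "multiply", shrinkage: float = 0.0) -> float:
    """Hypothetical model of combining a previous step's score with the current one.

    - "multiply": prev * current (assumed existing behavior)
    - "plus":     prev + current (assumed existing behavior)
    - "divide":   current / (prev + shrinkage) (proposed; e.g. clicks / impressions)
    """
    if op == "multiply":
        return prev_score * edge_score
    if op == "plus":
        return prev_score + edge_score
    if op == "divide":
        denominator = prev_score + shrinkage
        if denominator == 0:
            return 0.0  # avoid division by zero when there are no impressions
        return edge_score / denominator
    raise ValueError(f"unsupported scorePropagateOp: {op}")


# 10 impressions from step 1, 9 clicks from step 2.
print(propagate_score(10, 9, "divide"))       # 0.9
print(propagate_score(10, 9, "divide", 500))  # 9 / 510
```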

This query also introduces another parameter, scorePropagateShrinkage, which is used to stabilize the ratio. We sort the results by the ratio value alone, but a raw ratio can be misleading when the counts are small: a ratio of 1.0 from 1/1 ranks above a ratio of 0.9 from 9/10, even though the latter is backed by more data. To correct for this, we add a sufficiently large shrinkage value to the denominator. With a shrinkage of 500, the ratios become 1 / (1 + 500) ≈ 0.001996 and 9 / (10 + 500) ≈ 0.017647, so the latter now ranks higher.
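The ranking effect of the shrinkage can be reproduced with a short sketch. The entity names `user_x` and `user_y` are hypothetical; each maps to a `(clicks, impressions)` pair taken from the example above, and the shrinkage is added only to the denominator, as described.

```python
from typing import Dict, List, Tuple


def shrunk_ratio(clicks: int, impressions: int, shrinkage: float) -> float:
    """clicks / (impressions + shrinkage); returns 0.0 if the denominator is 0."""
    denominator = impressions + shrinkage
    if denominator == 0:
        return 0.0
    return clicks / denominator


def rank(stats: Dict[str, Tuple[int, int]], shrinkage: float) -> List[str]:
    """Sort keys by their (shrunk) ratio, highest first."""
    return sorted(stats, key=lambda k: shrunk_ratio(*stats[k], shrinkage), reverse=True)


# Hypothetical (clicks, impressions) pairs: 1/1 and 9/10.
stats = {"user_x": (1, 1), "user_y": (9, 10)}
print(rank(stats, 0))    # ['user_x', 'user_y']  (raw ratios 1.0 vs 0.9)
print(rank(stats, 500))  # ['user_y', 'user_x']  (about 0.001996 vs 0.017647)
```

Because the shrinkage only enters the denominator, every score is scaled down, so the shrunk values are useful for ordering results rather than as CTRs in their own right.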

emesday commented 8 years ago

A nice feature. Using the proposed scorePropagateOp and scorePropagateShrinkage, we could write queries such as "friends who show a high conversion rate".