kakao / s2graph

This code base is retained for historical interest only; please visit the Apache Incubator repository for the latest version:
https://github.com/apache/incubator-s2graph

Option to provide randomness to a query result #94

Closed HyunsungJo closed 8 years ago

HyunsungJo commented 8 years ago

Problem:

I've noticed that S2Graph clients quite often shuffle the query results before serving them to users in order to give some randomness to user experience.

An option to randomly sample a set of queried edges would make client code much simpler.

For example, let's say a client is running an A/B test on S2Graph items with A) a sorted bucket and B) a random bucket.

As things stand, she has to identify the random bucket id B and shuffle its results herself.

With the suggested feature, both buckets can be handled uniformly.
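
To make the current situation concrete, here is a minimal sketch of the kind of client-side branching described above. The function name `serve_edges` and the bucket ids are hypothetical and only illustrate the pattern; they are not part of any S2Graph client.

```python
import random

def serve_edges(edges, bucket_id, random_bucket_id="B"):
    """Return edges in display order for the given A/B bucket (illustrative only)."""
    if bucket_id == random_bucket_id:
        shuffled = list(edges)  # copy so the caller's list is not mutated
        random.shuffle(shuffled)
        return shuffled
    # Sorted bucket: keep the order S2Graph returned.
    return list(edges)
```

With a server-side sampling option, the `if` branch would no longer be needed and both buckets could go through the same code path.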

Idea:

Right now, I'm considering a step-level integer-type parameter "sample" that will tell S2Graph to randomly sample N edges from the result set of the corresponding step.

Any sort of guidance is welcome!
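
As a rough sketch of the semantics I have in mind (plain Python, not S2Graph code; the helper name `sample_step_result` is made up for illustration, and returning everything when fewer than N edges exist is my assumption):

```python
import random

def sample_step_result(edges, sample=None, rng=random):
    """Keep at most `sample` randomly chosen edges; None means no sampling."""
    if sample is None or sample >= len(edges):
        # Assumption: with fewer than N edges, return them all unchanged.
        return list(edges)
    # random.sample draws without replacement, so no edge is repeated.
    return rng.sample(list(edges), sample)
```

For instance, `sample_step_result(edges, sample=2)` would return two distinct edges chosen uniformly at random from `edges`.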

SteamShon commented 8 years ago

@mojo22jojo quick question: is there any reason to make "sample" a step-level parameter (rather than a queryParam-level one)? If a step has multiple queryParams, it's not clear to me which queryParam's results the samples should be taken from.

P.S. I copied and pasted what's in PR #97 for better explanation.

By placing an integer-type parameter "sample" in a step query, you can randomly sample N edges from the corresponding step result.
Sampling from different steps in a multi-step query is also supported.
Please refer to the query example below:
# Randomly draws two edges from step 1 and continues to step 2 with only those two edges.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
    "srcVertices": [{"serviceName": "s2graph", "columnName": "a_id", "id":"1"}],
    "steps": [
      {"step": [{"label": "sampling_test_label", "direction": "out", "offset": 0, "limit": 10}], "sample": 2},
      {"step": [{"label": "sampling_test_label", "direction": "out", "offset": 0, "limit": 10}]}
    ]
}'

# Randomly draws two edges from each result set of step 2 and returns them as the final result.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
    "srcVertices": [{"serviceName": "s2graph", "columnName": "a_id", "id":"1"}],
    "steps": [
      {"step": [{"label": "sampling_test_label", "direction": "out", "offset": 0, "limit": 10}]},
      {"step": [{"label": "sampling_test_label", "direction": "out", "offset": 0, "limit": 10}], "sample": 2}
    ]
}'
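
The difference between the two queries can be illustrated with a toy traversal. This is only a sketch under assumptions: the adjacency dict, the `traverse` helper, and per-source-vertex sampling are all made up for illustration and do not reflect the actual S2Graph implementation.

```python
import random

# Toy graph: vertex "1" has three neighbors, each of which has four.
graph = {
    "1": ["a", "b", "c"],
    "a": ["a1", "a2", "a3", "a4"],
    "b": ["b1", "b2", "b3", "b4"],
    "c": ["c1", "c2", "c3", "c4"],
}

def traverse(graph, start, samples, rng=random):
    """Walk one step per entry in `samples`; each entry is a sample size or None."""
    frontier = [start]
    for n in samples:
        next_frontier = []
        for vertex in frontier:
            neighbors = list(graph.get(vertex, []))
            # Assumption: sampling is applied to each source vertex's result set.
            if n is not None and n < len(neighbors):
                neighbors = rng.sample(neighbors, n)
            next_frontier.extend(neighbors)
        frontier = next_frontier
    return frontier

first_step_sampled = traverse(graph, "1", [2, None])   # 2 vertices x 4 edges
second_step_sampled = traverse(graph, "1", [None, 2])  # 3 vertices x 2 edges
```

Sampling in the first step narrows the fan-out before the second step runs, while sampling in the second step trims each per-vertex result set at the end.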
HyunsungJo commented 8 years ago

@SteamShon Made some changes; "sample" is now a query parameter.

HyunsungJo commented 8 years ago

resolved by #190