tschellenbach / Stream-Framework

Stream Framework is a Python library that lets you build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
https://getstream.io/

Another sorting of feed not based on the time #54

Open xsanch opened 10 years ago

xsanch commented 10 years ago

Hello,

Thanks for the great piece of code. I am using feedly to create a feed of content (links, photos, videos, etc.) and all is good. Now I would like to rank the feed and re-sort it based on the number of shares, upvotes, downvotes, etc. I think creating a new feed just for this top content would be OK.

My idea is to change the following code so that the scores are based on the ranking rather than on the serialization id, as they are right now.

./storage/redis/timeline_storage.py:

def add_to_storage(self, key, activities, batch_interface=None):
    cache = self.get_cache(key)
    # turn it into key value pairs
    scores = map(long, activities.keys())
    score_value_pairs = zip(scores, activities.values())
    result = cache.add_many(score_value_pairs)

tbarbugli commented 10 years ago

Hi, I think just changing the serialization_id won't work, or at the very least it will break lots of things in feedly. The serialization_ids (the Redis sorted set's scores) are unique (feedly does its best to enforce that) and are used to look up activities stored in feeds. I think a much better approach would be to store the activities' scores in an index (e.g. another Redis sorted set) and use it to sort activities by their score without breaking the rest of the behaviour.
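
The separate-index idea can be sketched without touching feedly at all. The snippet below is only an illustration: `RankIndex` and `ranked_slice` are hypothetical names, and a plain dict stands in for the second Redis sorted set, so none of this is feedly's API.

```python
class RankIndex:
    """In-memory stand-in for a second sorted set: serialization_id -> rank."""

    def __init__(self):
        self._rank_by_id = {}

    def set_rank(self, serialization_id, rank):
        # Re-ranking an activity simply overwrites its previous rank.
        self._rank_by_id[serialization_id] = rank

    def top_ids(self, limit):
        # Highest rank first, like reading a sorted set in reverse order.
        ordered = sorted(self._rank_by_id, key=self._rank_by_id.get, reverse=True)
        return ordered[:limit]


def ranked_slice(feed_by_id, index, limit):
    """Return activities ordered by rank, skipping ids no longer in the feed."""
    return [feed_by_id[i] for i in index.top_ids(limit) if i in feed_by_id]


# The feed itself stays keyed by serialization_id, untouched by ranking.
feed_by_id = {101: "photo", 102: "link", 103: "video"}

index = RankIndex()
index.set_rank(101, 5.0)
index.set_rank(102, 9.5)
index.set_rank(103, 7.25)

top_two = ranked_slice(feed_by_id, index, 2)
print(top_two)  # ['link', 'video']
```

Because the feed's own scores are never changed, lookups by serialization_id keep working; only the read path that wants ranked order consults the index.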

xsanch commented 10 years ago

Hi,

Thanks for the reply. Are you sure that feedly uses the score for anything other than sorting the set? I didn't find any other operation besides remove_by_scores in storage/redis/structures/sorted_set.py.

I see these Redis operations: zadd, zcount, zrevrank, zscore, zremrangebyrank, zrange, zrevrange and zremrangebyscore.

All of the above except the last don't expect the score to be used. The only Redis command expecting the actual score is zremrangebyscore, which is used in remove_by_scores. I didn't find anywhere in feedly where this function is used, other than in testing...
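
To make the rank-based versus score-based distinction concrete, here is a toy in-memory model of a sorted set. It is neither Redis nor feedly code: `ToySortedSet` is a hypothetical class that borrows the Redis command names only for readability and handles non-negative indices only.

```python
class ToySortedSet:
    """Toy in-memory model of a sorted set; not Redis or feedly code."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        self._scores[member] = score

    def _descending(self):
        # Highest score first; ties are broken by member, also descending.
        return sorted(
            self._scores, key=lambda m: (self._scores[m], m), reverse=True
        )

    def zrevrange(self, start, stop):
        # Rank-based read: only positions are needed. `stop` is inclusive.
        return self._descending()[start:stop + 1]

    def zremrangebyscore(self, min_score, max_score):
        # Score-based removal: the caller must know what the scores mean.
        doomed = [
            m for m, s in self._scores.items() if min_score <= s <= max_score
        ]
        for member in doomed:
            del self._scores[member]
        return len(doomed)


ranked = ToySortedSet()
ranked.zadd("a", 5910)
ranked.zadd("b", 5900)
ranked.zadd("c", 5888)

top_two = ranked.zrevrange(0, 1)            # positions only
removed = ranked.zremrangebyscore(0, 5899)  # depends on the score values
remaining = ranked.zrevrange(0, 100)
print(top_two, removed, remaining)  # ['a', 'b'] 1 ['a', 'b']
```

If the scores stop being serialization ids and become ranks, the rank-based calls behave the same, while any caller of a score-based call has to pass rank values instead.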

Thanks,

Jorge

tschellenbach commented 10 years ago

Hi Jorge,

I don't think it's really easy to get this up and running. I would personally love to see an explanation and maybe some docs if you succeed.

Best, Thierry

xsanch commented 10 years ago

Hi,

This is what I have so far. I just override add_to_storage and added a **kwargs argument to that method. For sorting, I pass a dictionary where the key is the serialization_id of the activity and the value is the rank. This dictionary is passed to the feed's add method.

class RankedRedisTimelineStorage(RedisTimelineStorage):
    def common_entries(self, *dcts):
        for i in set(dcts[0]).intersection(*dcts[1:]):
            yield tuple(d[i] for d in dcts)

    def add_to_storage(self, key, activities, batch_interface=None, **kwargs):
        cache = self.get_cache(key)
        # turn it into key value pairs
        score_value_pairs = list(self.common_entries(kwargs.get('rank'), activities))
        result = cache.add_many(score_value_pairs)
        for r in result:
            # errors in strings?
            # anyhow raise them here :)
            if hasattr(r, 'isdigit') and not r.isdigit():
                raise ValueError('got error %s in results %s' % (r, result))
        return result

class RankedRedisFeed(RedisFeed):
    timeline_storage_class = RankedRedisTimelineStorage

    key_format = 'feed:rankcontent:%(user_id)s'
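
The pairing done by common_entries above can be checked in isolation. The sketch below copies the helper as a plain function and runs it on made-up data (the ids and activity strings are placeholders, not feedly objects); it shows two behaviours worth knowing about before relying on it.

```python
def common_entries(*dcts):
    # Yield one tuple of values per key that is present in every dict.
    for key in set(dcts[0]).intersection(*dcts[1:]):
        yield tuple(d[key] for d in dcts)


# Placeholder data: rank and activities are both keyed by serialization_id.
rank = {"id-1": 5910, "id-2": 5900}
activities = {"id-1": "activity-1", "id-2": "activity-2", "id-3": "activity-3"}

# Set iteration order is arbitrary, so sort before comparing or printing.
pairs = sorted(common_entries(rank, activities), reverse=True)
print(pairs)  # [(5910, 'activity-1'), (5900, 'activity-2')]

# Calling without a rank dict (kwargs.get('rank') returns None) fails.
try:
    list(common_entries(None, activities))
    missing_rank_fails = False
except TypeError:
    missing_rank_fails = True
```

Note that "id-3" has no rank and is silently left out of the result, and that a missing rank argument raises a TypeError rather than falling back to some default.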

I did some basic tests and looks good so far:

>>> feed = RankedRedisFeed(100000001)
>>> uf = feedly.get_user_feed(100000001)
>>> x = uf[:1][0]
>>> f= {}
>>> f[x.serialization_id] = long(5888.6215989000002)
>>> feed.add(x, rank=f)        
[1]

Added multiple activities, they are sorted correctly:
>>> feed[:19]
[Activity(messaged) 100000001 1000000791, Activity(messaged) 100000001 1000000792, Activity(messaged) 100000001 1000000807]

This is what redis shows:
redis 127.0.0.1:6379> ZREVRANGE feed:rankcontent:100000001 0 100 withscores
1) "13991601850001000000791005"
2) "5910"
3) "13991609950001000000792005"
4) "5900"
5) "14000006600001000000807005"
6) "5888"

tbarbugli commented 10 years ago

That sounds cool. Perhaps you want to have a default score and make the rank parameter more explicit (right now it is part of kwargs). That would make integrating this feed into the test suite much easier; I strongly suggest you do that before using it for something in production ;)

Cheers, Tommaso
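
One possible reading of that suggestion, as a minimal sketch: the pairing step takes `rank` as a named parameter and falls back to a default score. `build_score_value_pairs` and `default_score` are hypothetical names chosen for illustration, and the value 0 is an arbitrary assumption; this is not code from feedly.

```python
def build_score_value_pairs(activities, rank=None, default_score=0):
    """Pair each activity with its rank, or with default_score if unranked.

    activities: dict of serialization_id -> activity
    rank: optional dict of serialization_id -> numeric rank
    """
    rank = rank or {}
    return [
        (rank.get(serialization_id, default_score), activity)
        for serialization_id, activity in activities.items()
    ]


# Placeholder data standing in for real activities.
activities = {"id-1": "activity-1", "id-2": "activity-2"}

partly_ranked = build_score_value_pairs(activities, rank={"id-1": 5910})
unranked = build_score_value_pairs(activities)
print(partly_ranked)  # [(5910, 'activity-1'), (0, 'activity-2')]
print(unranked)       # [(0, 'activity-1'), (0, 'activity-2')]
```

With this shape no activity is dropped when its rank is missing, and a test suite can call the storage with or without ranks.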

tbarbugli commented 10 years ago

Hi, it would be awesome to know if this experiment worked out. Depending on the outcome and the changes required, we could write some docs or include this in the package!

Tommaso