theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/

Activity stream concept to replace feeds on dashboard #1679

Closed brendanheywood closed 9 years ago

brendanheywood commented 9 years ago

Re-imagining the feeds completely, to show all types of activity (ticks, edits, favs etc) all in one place. Succinctly, customizably, and pretty darn neat (we think!)

image


A very high level idea to rework the whole feeds concept in the dashboard. The general principles are:

So some examples are:

The way I would see this working internally is that each type of action would have a date field and a natural grouping key, ie ticks would be grouped by person, date, and TLC. Each time a new tick is added an action summary item is made, and any more new ticks with the same key are added to that item. Potentially as more items are added the summary text is updated, and potentially its sort date is updated too. So like facebook, old items can get moved back up the activity feed, like a discussion that gets more comments, or a photo that has comments on it, or if Chris later that day added another 10 extra routes.
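A minimal sketch of the grouping-key idea, in Python for illustration only. The `SummaryItem` fields, the `add_tick` helper and the choice of the logging time's date as the grouping date are all assumptions, not anything the site actually implements.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass
class SummaryItem:
    """One aggregated feed entry; every field name here is hypothetical."""
    person: str
    day: date
    tlc: str
    route_ids: list = field(default_factory=list)
    sort_time: Optional[datetime] = None


def add_tick(feed, person, tlc, route_id, when):
    """Append a tick to the item sharing its grouping key, or create the item."""
    key = (person, when.date(), tlc)  # grouping key: person + date + TLC
    item = feed.get(key)
    if item is None:
        item = SummaryItem(person, when.date(), tlc)
        feed[key] = item
    item.route_ids.append(route_id)
    # A newer tick moves the whole item back up the feed.
    item.sort_time = when if item.sort_time is None else max(item.sort_time, when)
    return item


def summary_text(item):
    """Rebuild the summary sentence from the current contents of the item."""
    n = len(item.route_ids)
    return f"{item.person} ticked {n} route{'s' if n != 1 else ''} in {item.tlc}"


feed = {}
add_tick(feed, "Chris", "Arapiles", 101, datetime(2014, 10, 4, 9, 0))
add_tick(feed, "Chris", "Arapiles", 102, datetime(2014, 10, 4, 17, 30))
add_tick(feed, "Chris", "Grampians", 201, datetime(2014, 10, 4, 12, 0))
```

Here the two Arapiles ticks collapse into one item whose sort time is the later tick, while the Grampians tick gets its own item because the TLC differs.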

See also #1676 for including 'my' ticks

brendanheywood commented 9 years ago

a few thoughts on the technical side. I think the important thing for speed is that all items are in one table and filtered down from there rather than a bunch of extra joins. I'd see something similar to what we did for the update atom feed where we have 1 or 2 objects / actions / verbs setup but just extending it

Then in their settings page they would have a tick-box grid with each type of action down the left, and across the top 'Me | Friends' + 'Fav crags | My country | World' (I'll do a mockup if / when we need to) to specify what's in or out.

I've just done a trawl through the stats to see what feeds people are actually watching:

624 users have set their feed:

Ticks:

Updates: (114)

The question is whether the extra summary data is generated at display time or ahead of time. Mostly a size / performance trade off so would need testing

brendanheywood commented 9 years ago

Just done a bit of apache log trawling and found something which utterly convinces me this is worth doing - already just TODAY there have been 91 api/climber/updates so far, the vast majority of them from the dashboard. The vast majority of those updates are in quick succession - in other words people are opening up their dashboard, looking at their feed, then toggling to another feed, then toggling back. There are other types of update to the same api endpoint but mostly from other pages so it's hard to split them out.

scd commented 9 years ago

I agree this is worthwhile doing.

I don't think we need a new table, the ActivityLog has Account, Node, Item, Star Schema Nx and a serialised Data field.

At the moment Node is mandatory, so we cannot use it for the likes of 'Simon started following Campbell'. This constraint can be relaxed.

The table contains every index related activity, which includes favoriting and ascents. This is good.

I think we need to work on two aspects of the business logic.

  1. Default rules for choosing what goes into the feed.
  2. What configuration can the user control for the default rules above.

At the moment the SQL query is a list ordered by time. However the new feed concept will aggregate activity for a particular person, which may mean pulling things out of order. In other words once we have decided to aggregate, say, Brendan's ascents, then we need to make sure we go far enough back to get the relevant ascents.

It is likely that I would be interested in Brendan's activity from a week ago, because I am following him, over somebody else's activity from 5 minutes ago who I am not following.

If I am not following anybody, who do we show activity for?

If people who I am following have not been active for 3 months who do we show? (eg friends of friends, same country?)

If we occasionally throw in friends of friends then this may encourage more account following and better community networking.

Thinking out loud, once we know who to show in the feed then the rest is pretty much already there and straightforward. IMO choosing who to show is the big innovative step here.

This feed needs to also be sympathetic to notable ascents - #246, which I am still keen on.

brendanheywood commented 9 years ago

If we don't use a new table we'll be doing a lot of group processing either at query time or display time and throwing away a lot of intermediate data. ie I could do 100+ edits in a crag to 50 unique routes, but I'd want that to show as just one record: 'Brendan updated 50 routes in Arapiles'. It would be possible without the new table but I don't think it would be anywhere near as simple or as fast as a new higher level table, with most of the processing done at activity time instead of query/display time.

scd commented 9 years ago

see also #1232

scd commented 9 years ago

I am now looking at implementing a new table. I want to get agreement on some key dimensions.

If somebody logs three ascents on saturday and four on sunday in arapiles then will this be two stream items? I am assuming yes.

If somebody logs 2 ascents at arapiles in the morning then 4 ascents in grampians in the afternoon then will this be two stream items? I am assuming yes.

If somebody does the ascent on saturday but logs it on monday then will this create a stream from the monday date? I am assuming yes, otherwise there are a lot of complexities.

If somebody creates a route and logs an ascent of that route on the same day then will this create two streams?

I think the 'what' needs a lot more discussion of exactly what goes into it (for example maybe rather than 'upload photo' it is 'contributed resource' which includes photos, topos and embeds). Whatever the outcome of the specifics of this discussion I think we need a list of possible 'what' items and an association of activity items to 'what'.

Proposed fields for the 'Stream' table

And for the 'Stream Collection'

This means we will probably have to add some more Activity Items (eg add a discussion, following, etc).

brendanheywood commented 9 years ago

Yes to all of above, except:

If somebody does the ascent on saturday but logs it on monday then will this create a stream from the monday date? I am assuming yes, otherwise there are a lot of complexities.

I see this as creating the event retrospectively on saturday, so the publish date will be saturday, but the update date will be monday and so it appears in the feed now rather than down the list. I don't see why this would be an issue. If I later on tuesday go and add yet another tick for saturday it would just append it to the same event and not create another event, but would update the updated date so it floats back to the top of the feed. I think to make this clearer rename the fields: 'publish date' -> 'event date' and 'update date' -> 'publish time' or 'stream time'. The former is just a date while the latter is a full time stamp.
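To make the two-date idea concrete, here is a rough Python sketch. It assumes an in-memory `events` dict and a hypothetical `log_tick` helper; the field names `event_date` and `stream_time` follow the renaming suggested above but are otherwise illustrative.

```python
from datetime import date, datetime

events = {}  # (climber, tlc, event_date) -> event dict; hypothetical store


def log_tick(climber, tlc, route_id, event_date, logged_at):
    """Attach a tick to the event for the day it was climbed, not the day it was logged."""
    key = (climber, tlc, event_date)
    event = events.setdefault(
        key, {"event_date": event_date, "routes": [], "stream_time": logged_at}
    )
    event["routes"].append(route_id)
    # The stream time only ever moves forward, so late logging floats the event up.
    event["stream_time"] = max(event["stream_time"], logged_at)
    return event


saturday = date(2014, 10, 11)
log_tick("brendan", "arapiles", 123, saturday, datetime(2014, 10, 13, 9, 0))   # logged on monday
log_tick("brendan", "arapiles", 234, saturday, datetime(2014, 10, 14, 20, 0))  # another on tuesday
```

Both ticks land in a single saturday event, and sorting the feed by `stream_time` puts that event at tuesday evening.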

What

Maybe rename to 'verbPhrase'?

What is the stream collection for?

The one complexity I've been grappling with is how this will work when looking at a route or sub area. Let's say I tick 20 routes on saturday, 10 in area A, 10 in area B. These have the same TLC so get appended into the same event. Now if another person looks at area A or B then it should show this item, but if they look at area C then it should not. So how do we make this work? Doing this correctly would be a join back to the tick or update table, which I wanted to avoid. Need to hash this out some more.

The verb phrases and the aggregation to form a summary also need a lot more hashing out before we do anything. I'm gonna be a bit tight for time now, my family is visiting for the next week or so, so I'd prefer to take our time and hash it out and get it right before we do too much code.

scd commented 9 years ago

There are a couple of problems I see with using activity date versus publish date (and it should be noted that it is only an issue for logging ascents).

  1. Do we accept partial dates, what does that mean in an activity stream - actually non issue if we use mysql dates
  2. What happens if somebody logs ascents for 2010 (ie way back in the stream, but I think I would still like to know about it - actually a non-issue if we sort by update date).
  3. How do we handle somebody updating their ascent date. May require streams being created/split.

I don't see a problem creating a stream on a publish date, but reporting on activity date. So if somebody comes back from a weekend in Arapiles and logs ascents on monday, then all the ascents will be in the same stream, but we could identify the separate dates within the stream. This sort of makes sense anyway.

There will still be edge cases, but ultimately it does not matter as long as most of the time the aggregation looks good.

The collection is essentially for dashboard. It is all about the social networking of people. I would just continue to use our ascent logs and activity logs as is at the crag level. I sort of came to this conclusion and suddenly everything is a lot easier.

If we wanted to aggregate and display streams at a crag level then we could work out which stream an activity is associated with using the 'Stream Collection' table above.

I am in the process of creating this table now.

Notes to self:

scd commented 9 years ago

I have been thinking about the 'with' concept you have discussed above. I think this is best implemented in the 'Stream Collection' table rather than the stream itself. Some people may climb in the morning with one person, and in the afternoon with somebody else. I think I would rather summarise these together (eg Simon climbed 5 routes yesterday at Arapiles with Brendan and Campbell)

Notes to self:

scd commented 9 years ago

I have been playing with the verb phrases and have come up with this list from our activity list :

| Name | Verb | Object Without Noun | Article | Noun | Plural |
| --- | --- | --- | --- | --- | --- |
| ticked route | ticked | 1 | | route | routes |
| documented route | documented | | the | route | routes |
| documented crags | documented | | the | crag | crags |
| documented area | documented | | the | area | areas |
| annotated area | annotated | | the | area | areas |
| updated route | updated | | the | route | routes |
| updated area | updated | | the | area | areas |
| followed climber | followed | 1 | | climber | climbers |
| set route | set | | the | route | routes |
| started discussion | started | | a | discussion | discussions |
| replied to discussion | replied to | | a | discussion | discussions |
| uploaded photo | uploaded | | a | photo | photos |
| removed photo | removed | | a | photo | photos |
| embedded resource | embedded | | a | resource | resources |
| created topo | created | | a | topo | topos |
| updated topo | updated | | a | topo | topos |
| favorited area | favorited | 1 | | area | areas |
| moved area | moved | | the | area | areas |
| moved route | moved | | the | route | routes |
| merged area | merged | | the | area | areas |
| merged route | merged | | the | route | routes |
| removed area | removed | | the | area | areas |
| removed route | removed | | the | route | routes |
| renamed area | renamed | 1 | | area | areas |
| renamed routes | renamed | 1 | | route | routes |
| regraded routes | regraded | 1 | | route | routes |
| archived route | archived | 1 | | route | routes |
| archived route | unarchived | 1 | | route | routes |

Note that discussions and following need to be added to the activity item list, because they are not currently being tracked as activities.

Also what about the scenario where somebody creates a route then 10 minutes later edits it. In the model above it would be two streams, however I think we should be smart enough to add the update route to the 'create route' stream if it is on the same day.

scd commented 9 years ago

The point of this stream idea is to aggregate as much as makes sense.

I am thinking that things like tagging, updating descriptions, history and location we can just call 'updated', and things like reparenting, resequencing, deleting and renaming we can call 'reorganised'.

This will result in the following stream phrase grouping for area contributions:

Similar for routes

I think the verb 'documented' is the right word for creating an area/route on our site.

The verb 'setting' is different from 'documented' because the route setter created the physical route in the gym and documented it.

scd commented 9 years ago

Sorry about this discussion brain dump, but issues tease themselves out as I start to put specifics together.

If, in one sitting, you create an area then routes under the area, then you start adding history and locations then this should all be under one stream. So I think we need the concept of a parent stream for logging purposes. In other words if I create a route the system checks parent streams before it creates a new stream.

Putting this all together I am now getting a configuration that looks like this

| Name | Verb | Parent Stream | Requires Node | Article | Noun | Plural |
| --- | --- | --- | --- | --- | --- | --- |
| documented crag | documented | | 1 | the | crag | crags |
| documented area | documented | documented crag | 1 | the | area | areas |
| updated area | updated | documented area | 1 | the | area | areas |
| reorganised area | reorganised | updated area | 1 | the | area | areas |
| favorited area | favorited | | 1 | | area | areas |
| set route | set | updated area | 1 | the | route | routes |
| documented route | documented | set route | 1 | the | route | routes |
| updated route | updated | documented route | 1 | the | route | routes |
| reorganised route | reorganised | updated route | 1 | the | route | routes |
| ticked route | ticked | | 1 | | route | routes |
| started discussion | started | | 1 | a | discussion | discussions |
| contributed to discussion | contributed | started discussion | 1 | a | discussion | discussions |
| uploaded photo | uploaded | | 1 | a | photo | photos |
| embedded resource | embedded | | 1 | a | resource | resources |
| made topo | made | updated area | 1 | a | topo | topos |
| followed climber | followed | | | | climber | climbers |

brendanheywood commented 9 years ago

ok my first thing would be to not try and automatically generate a sentence, the collection of data and the display of that should be quite separate. I see it working with a very small number of verb phrases, probably just [tick, update, follow, favorite, upload, discuss, climber (see very end)] and those are simply a key for grouping, they won't even be directly displayed. This avoids really clunky language like 'documented' as a one-size-fits-none approach. The aggregation would work something like this, in rough pseudo code with dates omitted:

I sit down and create 5 new routes in a batch on Monday in Araps, so this creates one new stream event item:

{
  verb: 'update',
  node: araps,
  data: {
    newRoutes: [123, 234, 345, 456, 567]
  }
}

Then I tick 3 of those routes which results in another event

{
  verb: 'tick',
  node: araps,
  data: {
    tick: [123, 234, 345]
  }
}

An hour later I then edit 1 of the same routes I just added, and 6 other routes, and then also add a topo. Because it's the same day and same verb it appends data to the same 'update' event.

{
  verb: 'update',
  node: araps,
  data: {
    newRoutes: [123, 234, 345, 456, 567],
    updatedRoutes: [123, 101, 102, 103, 104, 105, 106],
    newTopo: [12345]
  }
}

This is probably all that needs to be done at collection / aggregation time. Then later when it is displayed that chunk just gets passed to a template with each id inflated into its atom data. The template can then make some nuanced decisions about what language is displayed, ie if there is only newRoutes then say 'Brendan added 5 new routes including X and Y', or if there are both updated routes and new routes then 'Brendan updated 10 routes including 2 new ones: X and Y'. This will give a far better sentence structure than trying to just craft one from pieces of data. We can get away with this a bit better in the old feeds because each item was very simple. Also note that if I did 10 edits to the same route this will be exactly equivalent to a single edit, as the data is only showing what was touched and not a list of transactions. So over time an 'update' event can change dramatically depending on what else happens to it.
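The split between collection and display could be sketched roughly as below. This is Python for illustration only: `record` and `render_update` are hypothetical helpers, the `names` lookup stands in for inflating ids into atom data, and the sentence wording is just one possibility.

```python
def record(event, key, ids):
    """Collection time: remember which ids were touched, ignoring repeats."""
    seen = event["data"].setdefault(key, [])
    for i in ids:
        if i not in seen:
            seen.append(i)


def render_update(actor, event, names):
    """Display time: pick the wording based on which keys are present."""
    data = event["data"]
    new = data.get("newRoutes", [])
    updated = data.get("updatedRoutes", [])
    touched = list(dict.fromkeys(new + updated))  # unique routes, order kept
    place = event["node"]
    if new and not updated:
        examples = " and ".join(names[i] for i in new[:2])
        return f"{actor} added {len(new)} new routes in {place} including {examples}"
    if new and updated:
        return f"{actor} updated {len(touched)} routes in {place} including {len(new)} new ones"
    return f"{actor} updated {len(touched)} routes in {place}"


names = {123: "X", 234: "Y"}  # stand-in for the inflated atom data
event = {"verb": "update", "node": "Araps", "data": {}}

record(event, "newRoutes", [123, 234, 345, 456, 567])
first = render_update("Brendan", event, names)

record(event, "updatedRoutes", [123, 101, 102, 103, 104, 105, 106])
record(event, "updatedRoutes", [123])  # a repeat edit changes nothing
second = render_update("Brendan", event, names)
```

Because only touched ids are stored, ten edits to one route and a single edit produce the same event, and the sentence changes as the event accumulates more keys.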

Most of the other verbs are quite simple and the data will be just a list. The only other one with more complex behaviour is the 'discuss' event which could start out as:

{
  verb: 'discuss',
  node: araps,
  with: scd,
  data: {
    title: 'lost shoes near bard'
  }
}

after template => 'Simon posted in Araps: Lost shoes near bard'

but then an hour later evolves into:

{
  verb: 'discuss',
  node: araps,
  with: [scd, ccgome, fred],
  data: {
    title: 'lost shoes near bard',
    replies: [123, 234, 345]
  }
}

after template => 'Simon and 5 others are discussing "Lost shoes near bard" in Araps'

The template I'm hoping is pure runtime. The data behind it could be aggressively cached if needed, but leaving it until runtime to render means we can personalize it like:

after template => 'You and 5 others are discussing "Lost shoes near bard" in Araps'

or 'Brendan started following You'
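A small sketch of runtime personalization, in Python for illustration. The `discuss_sentence` function and the `display_names` lookup are hypothetical, and the count of "others" is simply derived from the `with` list here, which may not match how participants would really be counted.

```python
def discuss_sentence(event, viewer, display_names):
    """Render a 'discuss' event, swapping in 'You' when the viewer took part."""
    people = event["with"]
    lead = "You" if viewer in people else display_names[people[0]]
    others = len(people) - 1
    title = event["data"]["title"]
    place = event["node"]
    if others == 0:
        return f"{lead} posted in {place}: {title}"
    plural = "s" if others != 1 else ""
    return f"{lead} and {others} other{plural} are discussing '{title}' in {place}"


event = {
    "verb": "discuss",
    "node": "Araps",
    "with": ["scd", "ccgome", "fred"],
    "data": {"title": "Lost shoes near bard"},
}
display_names = {"scd": "Simon"}

for_stranger = discuss_sentence(event, "brendan", display_names)
for_participant = discuss_sentence(event, "fred", display_names)
```

The stored event is identical for every viewer, so it can be cached, and only this last rendering step depends on who is looking.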

Some of the other simple ones like favorites and follows could potentially go from a single to multiple, but I think are such relatively rare events that they are worthy items in their own right. The only one that should definitely be grouped is 'upload' when someone uploads a batch of 10 pics.

Most of the sentences I think should be about a paragraph at most, the only exception is ticking. I think there is heaps of value in showing all ticks in full unless the number of ticks is really massive, maybe more than 15 ticks. 15 routes in a day, even boulder problems, is a big day, if it is more then it is very likely retrospective route ticking. We would still have the 'me too' and 'comment' stuff pretty well exactly as we do now next to the routes which are shown.

When it comes to gyms, I think these are all just 'updates' but it is up to the template to turn that into the various updates: 'Chris set 10 new routes' vs 'Chris updated the gym opening hours' or even 'Chris updated the gym opening hours and set 10 new routes'

So the table that you listed above I think maps reasonably well to the names of the keys of the data hash in my examples.

Whenever there is a large amount of data that is condensed or summarized, the template should link off to whichever facet page shows all the gory details.

I just thought of another event verb type too: 'climber' which is triggered when they either sign up, change their avatar, update their profile or add a website link etc.

One last thought: edits and updates to a particular crag are generally fairly spread apart and clustered, they typically come through in big batches or very small incremental changes. If two people both make edits to the same crag on the same day, it is very likely that they are communicating and were probably at the crag that day. With that in mind, what do you think about the grouping of edits being purely by TLC and not by TLC+climber? So then an update might look like:

Brendan and Chris updated 40 routes in Araps

? Even if they didn't talk to each other it still makes sense so I don't think it could hurt, and you can still click and drill down into the details.

brendanheywood commented 9 years ago

Few more thoughts:

Going with my model above we'd need something like:

ActivityEventGroup * primary key

And to store the links between this and each tick / update / photo etc:

ActivityEventItems * = foreign key

Now this second table is suspiciously close to the existing ActivityLog table, so I reckon we can just use it, but we need to close the gap; in particular I don't think it includes ticks at the moment. And we'd need to add the type, subtype, and grouping-date.

This in turn makes me think of edge cases like what happens to old events after a reparent which changes the TLC, do we care?

This in turn makes me question whether we now actually need the ActivityEventGroup table, if we add the extra grouping columns into the ActivityLog and group by on them then perhaps we don't need it after all. The only extra field at the moment is the updated date, which would just be a max(lastmod). One less join. I know you mentioned this at the start but I couldn't see how it could be done as-is, but with the extra columns I think this would work.

The other difference between a big group-by and the table above is that multiple links are represented, ie 10 edits to 1 route would be 10 rows not 1, but this too could be grouped-by and condensed back to one row. The initial raw sql results are gonna be quite small, and then afterwards we go off and fetch the atom details as needed.

For the couple of types of updates which don't belong to a node, like following and updating profile, I guess they could just be attached to the world node? or just null?

brendanheywood commented 9 years ago

Oh also, if we do reuse the ActivityLog table this will impact the existing atom feed, in particular polluting the update feeds with tick data. We can easily work around this and filter it back out, but if this is all working well it would be better to just redo the atom feeds so they display the exact same stuff as well. This may mean that not all of the atom feeds make sense anymore, and it would be good to audit what actually gets used. It may be we can just scrap a few of them and consolidate all into just a single public feed about a node, and a single public feed about a person.

also quick look at log to see things that look like real atom clients:

grep feed access.log | grep -v Mozill | grep -v java | cut --delim=' ' -f 12-14| cut -c1-30 | sort | uniq -c | sort -nr

    764 "Feedly/1.0 (+http://www.feedl
    618 "Feedfetcher-Google; (+http://
    301 "Apple-PubSub/65.28"
    182 "Sogou web spider/4.0(+http://
     70 "Apple-PubSub/65.23"
     31 "Apple-PubSub/65.21"
     28 "Opera/9.80 (Windows NT
     25 "Opera/9.80 (iPhone; Opera
     19 "-"
     15 "Opera/9.80 (X11; Linux
     11 "SimplePie/1.3.1 (Feed Parser;
     10 "FeedlyBot/1.0 (http://feedly.
      9 "Opera/9.80 (Android; Opera
      7 "Feedspot http://www.feedspot.
      6 "Windows-RSS-Platform/2.0 (IE 
      6 "UCWEB/2.0 (Linux; U;
      6 "Scrapy/0.24.1 (+http://scrapy
      6 "iTunes/10.7 Downcast/2.8.18.1
      3 "Windows-RSS-Platform/2.0 (MSI
      3 "Opera/9.80 (J2ME/MIDP; Opera
      3 "AppleSyndication/56.1"
      2 "OSSProxy 1.3.337.341 (Build
      2 "Opera/9.80 (SpreadTrum; Opera
      1 "SAMSUNG-GT-E2202 Opera/9.80 (
      1 "Opera/9.80 (Series 60;
      1 "Nokia5130c-2/2.0 (07.95) Prof
      1 "facebookexternalhit/1.1 (+htt
      1 "com.apple.Safari.WebFeedParse
scd commented 9 years ago

Ticks are in the activity log already and that filter you mentioned is already in place.

The Node is mandatory in ActivityLog, so we would have to relax that to allow for non-index updates, my gut feeling is not to use World node for this.

Given that we are still in experimental mode we should just modify the ActivityLog table and see if we can push this with SQL.

Can we call 'type' something like 'streamGroup'. Actually I think we can infer both type and subtype from the existing activity item field - I will have to audit this.

I prefer a smaller list of types (streamGroups) so I am happy with your shortlist. If you rollup all the index update types to just 'update' then some issues I came across just disappear.

Discussions are not in the activity log, so we would have to put it in. I think we should only put public discussions in this log, private discussions should never go in.

I think we need to add grouping-date as a discrete mysql date. Currently activity log uses a continuous time variable, which is no good for grouping performance.

Activity changes information. We should make sure we present the original information in the historical logs. The activity items store a pre and post update state where applicable.

All tables have a lastmod field. The activity log currently does not have its lastmod date updated, by design. (I think it is set to null when created, but the record is never updated so this field is never updated.) However there are some activities that we would want to bring back to the top of the list after certain user actions (eg a comment on an ascent should bring that stream grouping back to the top).

brendanheywood commented 9 years ago

the only difference between doing this without the extra table and join is that if I am in say Area B looking at the feed, I'll get a pseudo event which only shows the updates inside B, but if I go up a level I'll get a different pseudo event which would include A and B's updates. I'm fine with this either way.

The crucial difference between this and what we have now, or what we'd get if we simply grouped at the JS or template level, is that two batches of interleaved ticks would get correctly grouped.

scd commented 9 years ago

Thinking about how this is displayed in the dashboard: are there two different types of groupings that we want to present?

  1. Activity by all your friends, grouped by accounts
  2. Activity associated with all TLCs, grouped by crag
brendanheywood commented 9 years ago

Both these groupings are the same, because both are by account+tlc+day. The only edge case where they would differ is where I tick in multiple crags in a day, and I'm happy for that to be shown twice.

One edge case with the 'updates' is how would it work if we move 10 routes from TLC1 to TLC2, would we want this to create 2 pseudo stream events?

Also I've been trying to figure out how we can do this without querying ActivityLog, grouping, ordering, and then joining back to ActivityLog again to get the details.

brendanheywood commented 9 years ago

I think we can do the initial raw query without re-join back onto ActivityLog by using the GROUP_CONCAT function to return the list of affected node ids. Needs some testing, anyway I got to get to work

scd commented 9 years ago

I have been starting to play with a query to make sure I understand the issues here. GROUP_CONCAT works exactly like we want it to (it also allows DISTINCT).
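As a rough illustration of the group-by approach, the sketch below uses Python's built-in `sqlite3` rather than MySQL, since SQLite also provides `GROUP_CONCAT` with `DISTINCT`. The column names (`stream_group`, `grouping_date`, `logged_at` and so on) are assumptions based on the discussion, not the real ActivityLog schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ActivityLog (
           account TEXT, tlc TEXT, stream_group TEXT,
           grouping_date TEXT, node_id INTEGER, logged_at TEXT)"""
)
rows = [
    ("brendan", "arapiles", "update", "2014-10-11", 123, "2014-10-11 09:00:00"),
    ("brendan", "arapiles", "update", "2014-10-11", 123, "2014-10-11 09:05:00"),
    ("brendan", "arapiles", "update", "2014-10-11", 234, "2014-10-11 10:00:00"),
    ("scd", "arapiles", "tick", "2014-10-11", 123, "2014-10-11 18:00:00"),
]
conn.executemany("INSERT INTO ActivityLog VALUES (?, ?, ?, ?, ?, ?)", rows)

# One row per account + TLC + group + day; the touched node ids come back as a
# comma separated string, so no second join onto ActivityLog is needed.
query = """
SELECT account, tlc, stream_group, grouping_date,
       GROUP_CONCAT(DISTINCT node_id) AS node_ids,
       MAX(logged_at) AS updated
FROM ActivityLog
GROUP BY account, tlc, stream_group, grouping_date
ORDER BY updated DESC
"""
result = conn.execute(query).fetchall()
```

The two edits to route 123 collapse into a single id, and `MAX(logged_at)` plays the role of the 'updated' date used for sorting. The order of ids inside the concatenated string is not guaranteed in SQLite, so callers should not rely on it.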

I think that we talked about this not being a standard facet search query. I concur with that after a little bit of investigation.

We cannot use a limit based on number of activity records, but rather have to get whole days at a time (we may use a limit for the aggregated group by result). Note that a partial day may lead to inconsistent data (in other words the query could report 'Campbell updated 5 routes in Arapiles' when he actually updated 10).

This means we have to use a date as the way of limiting the query. But how do we pick a date?
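One possible way to pick the date, sketched in Python: aim for a minimum number of raw rows, then round back to a whole-day boundary so no day is cut in half. The `whole_day_cutoff` helper and the `min_rows` target are hypothetical; a fixed window of a week or two would be an equally valid choice.

```python
from datetime import date


def whole_day_cutoff(activity_days, min_rows):
    """Return the oldest day to include so that at least min_rows rows are
    covered (if that many exist) without splitting any single day.

    activity_days holds one date per raw activity row, in any order.
    """
    counts = {}
    for d in activity_days:
        counts[d] = counts.get(d, 0) + 1
    total = 0
    cutoff = None
    for d in sorted(counts, reverse=True):  # newest day first
        cutoff = d
        total += counts[d]
        if total >= min_rows:
            break
    return cutoff


days = (
    [date(2014, 10, 12)] * 3
    + [date(2014, 10, 11)] * 10
    + [date(2014, 10, 10)] * 4
)
cutoff = whole_day_cutoff(days, 8)
```

With a target of 8 rows the cutoff lands on 11 October, so all 10 of that day's rows are included rather than only the first 5.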

The next part of the discussion is focusing on the dashboard functionality of this list, not the crag summary, as I think the crag summary is fairly simple.

I think we always want to show a populated activity feed in a user's dashboard. This means that we cannot just base this on a user's friends. I think we also want to automate this as much as possible.

Thinking about 4 use cases:

I think we should be able to automate a feed for all these use cases, without the user having to configure anything. Furthermore we should be able to balance these so that if somebody has only one friend with one update yesterday, we should also show them info from lower level feeds.

I am thinking of defining a UNION of queries. Selecting as many queries as we need to make the query full of information.

I am following a lot of accounts. I am more interested in the fact that Campbell did 5 ascents last week than some random person yesterday. But we don't know how many days to go back to get interesting information from friends.

What about having a controller which selects the query to use for each person's dashboard?

Do we want the user to be able to page back into the history of this feed?

brendanheywood commented 9 years ago

We cannot use a limit based on number of activity records, but rather have to get whole days at a time

Yup, lesson learnt from the atom feeds and facets. I think we just pick sensible defaults based on rough heuristics. Most people climb (and hence log in) around once a week or fortnight, so a person's feed should be say 2 weeks or 1 month. Where we expect large amounts of data, say at a region or country level, and where we have a prebaked stat around ticks last month which is known to be high, we could scale this down to less. Note this is only at the original raw query but we'd also limit this between the raw query and inflating the atom data. If it goes over a limit then we'd pass a 'nextDateFrom' to the template which would render a link where people can load more.
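The 'nextDateFrom' idea could look something like this sketch, in Python for illustration. `page_groups` is a hypothetical helper that works on already aggregated groups, and cutting only at day boundaries is an assumption made so the next request can start from a clean date.

```python
from itertools import groupby


def page_groups(groups, max_groups):
    """Split aggregated groups into what is shown now and where to resume.

    groups is a list of (day, summary) tuples sorted newest first, with days
    as ISO date strings. Returns (shown, next_date_from); next_date_from is
    None when everything fits.
    """
    shown = []
    for day, day_groups in groupby(groups, key=lambda g: g[0]):
        day_groups = list(day_groups)
        # Always show at least one day, even if it alone exceeds the limit.
        if shown and len(shown) + len(day_groups) > max_groups:
            return shown, day
        shown.extend(day_groups)
    return shown, None


groups = [
    ("2014-10-12", "a"),
    ("2014-10-12", "b"),
    ("2014-10-11", "c"),
    ("2014-10-11", "d"),
    ("2014-10-10", "e"),
]
shown, next_date_from = page_groups(groups, 3)
```

The template would then render a 'load more' link carrying `next_date_from`, and the follow-up request runs the same processing over an older date range.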

I think we always want to show a populated activity feed in a users dashboard. This means that we cannot just base this on a user friends. I think we also want to automate this as much as possible.

There are two things to balance: 1) showing people what they want by mind reading, and 2) encouraging them to configure it explicitly, via following people and faving crags, plus extra settings where needed, so that we can do 1).

I think where people don't have friends, or don't have favorites, we focus on leading them through that process and not just showing them something meaningless. ie if they have no favs then prompt them for their top 3 favorite crags on the spot inline. If they have no friends, and they do have favs, then ask them if they know some of the people who also climb at those crags. (But this stuff can be round 3 features.)

Furthermore we should be able to balance these so that if somebody has only one friend with one update yesterday, we should also show them info from lower level feeds.

The risk of automatically adding more stuff is that it will potentially hide what they actually want. Eg I follow Campbell who ticked a week ago, but then we also pull in extra data which is more recent, so the only thing I actually want to see is now lost down the stream. We could go down the route of no longer sorting by date but by relevance, but this is a slippery slope: not only is it hard, but I don't really like it personally when facebook suppresses stuff.

What about having a controller, which selects the query to use for each persons dashboard?

Yes, I thought we'd already talked about this. My second comment on this thread:

Then in their settings page they would have a matrix of tick box grid for each type action down the left, and across the top 'Me | Friends' + 'Fav crags | My country | World' (I'll do a mockup if / when we need to) to specify what's in or out.

So a mockup is needed....

image

I can argue the case for these being checkboxes, but I think it's simpler to represent it as a scale from less on the left to more on the right, maybe something even like a volume control with an angled triangle. Like this but way slimmer and simpler

image

Do we want the user to be able to page back into the history of this feed?

Yes, but it can be round 2. The basic mechanics of going backwards should be exactly the same as displaying the last week, except instead of asking for (now..now - 1 week) you'd ask for a different range. The processing should be identical.

My thoughts generally around all this is to KISS and then build it up only as needed. ie start purely as a union of exactly what they follow and fav, and not worry about default feeds yet. One thing that really shits me about facebook is that the stream has become progressively less and less deterministic over time, there is more stuff my friends post that I never see, and more shit I don't want to see ending up in the feed. I don't want to go down that route.

Another edge case to thrash out: I think forums will be a special case because out of all of the events these are the only ones that really get updated and so get moved up the feed across a day boundary. Other events like ticks and updates would shuffle up if they get more ticks or updates, but only up until a day boundary, whereas a discussion could keep on bubbling up every time it gets a new post, so I think the query for this will be somewhat different from the others. We are never grouping on replies (like we would with ticks / updates), only ever querying the discussion row itself, so I think this is easy.

Also when we do the query and union, one thing I feel is quite important is telling the user why it is in their feed, and giving them the option to remove it (even if initially that is simply a link to their feed settings). By this I only mean stuff like 'This is in your feed because you follow X' -> 'Unfollow X', or 'This is in your feed because you commented on this discussion' -> 'Leave this discussion' and not the more vague facebook style of 'Hide this post' which creates a complex signal to FB to show less of that kind of thing generally. An event could be in your feed for multiple reasons so when we union them we'd tack on an extra static string so we know which query it came from to display.
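A minimal sketch of the "tack on an extra static string" idea, written in Python rather than the site's own SQL. The reason labels, the event ids and the `merge_with_reasons` helper are all hypothetical; the only point is that each sub-query labels its rows and the union keeps every label, so the UI can say why an event is shown.

```python
from collections import OrderedDict

def merge_with_reasons(*tagged_results):
    """Union several query results, remembering which query each event came from.

    Each argument is a (reason, event_ids) pair, where `reason` is the static
    string attached to that sub-query (for example 'follow' or 'watch').
    """
    merged = OrderedDict()  # event id -> list of reasons, in first-seen order
    for reason, event_ids in tagged_results:
        for event_id in event_ids:
            reasons = merged.setdefault(event_id, [])
            if reason not in reasons:
                reasons.append(reason)
    return merged

# Hypothetical results: event 7 is in the feed for two reasons.
feed = merge_with_reasons(
    ("follow", [7, 9]),
    ("watch", [7, 12]),
)
```

With this shape, an event that matches several queries appears once and carries all of its reasons, which is enough to render 'Unfollow X' or a link to the feed settings.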

brendanheywood commented 9 years ago

Also I think not every event verb type should be represented in the settings, less settings the better:

So this is just 'ticks | updates | forums'

scd commented 9 years ago

In respect for your KISS, let's do the checkboxes for version 1 and move on to your slider idea later. Firstly it is the simplest to implement, and secondly I think the checkboxes explain what is going on better.

I am on board with your suggestions. I think the system should select good defaults (ie worldwide for somebody just signed up with a low activity country).

After somebody follows someone for the first time the system ups the default to the follow checkbox.

I think we are pretty close to end-to-end agreement with something that should be implementable without too much effort (maybe a day or two or three or four).

brendanheywood commented 9 years ago

Just did the wife test: yes let's go with checkboxes :)

scd commented 9 years ago

Respect the wife test :)

scd commented 9 years ago

I don't like 'climber updates' because of lack of symmetry. I am thinking of breaking this into two categories:

I think joins is separate, and we should be able to do a lot with this in its own right, eg "Simon joins thecrag community in Australia".

When somebody joins our community they do not yet have anybody following them, nor are they following anybody, so I think it is a distinct case.

I think we are then using the term 'climber updates' for all the rest of the account settings changes. I think this could just be 'customise'.

How do you feel about the 'supports' verb group when somebody becomes a premium member? I think we should just leave this out until we get further with that side of the site.

scd commented 9 years ago

I have now been thinking a little deeper about your 'subType' concept (eg newRoute, updateRoute, newTopo).

I think that this fits in nicely with the CRUD concept (create read update delete) in relation to facet items.

I think this glues together a lot of things, joining a number of independently developed concepts fairly well (facets, low level activity types, and this new activity stream aggregation). They map well despite being developed without consideration of the different concepts, and where it does not map then I think it is a logic problem on our part. CRUD methodology is a rigorous development methodology, so this gives me encouragement that this is the right direction.

Assuming that this is the way forward it highlights a number of deficiencies:

brendanheywood commented 9 years ago

I think we are then using the term 'climber updates' for all the rest of the account settings changes. I think this could just be 'customise'.

The term we use internally is kinda irrelevant so pick whatever you want. The event text would have something like 'Brendan updated their profile'. I'm happy to split these from 'join', although I'm not really sure where the 'join' would even appear? The only place I can see showing a 'join' event is at the very start of someone's personal feed if you scrolled all the way down to the bottom, but I don't think that even needs to be a real event. It would never appear in say a crag's feed, maybe the country one at a stretch. I'm not opposed to it being a real event, just wondering if I've missed other places where it should go?

I think that this fits in nicely with the CRUD concept

gold

The only things I can't see explicitly in that list are re-sequence and move. A resequence should just be an update to the containing area. A batch move could be a single 'update' event to the highest common node between the source and destination, or possibly two events if it crosses a TLC boundary, but I'm not super fussed about modelling the latter case perfectly as I can see it being messy and it's very rare.

is a newTopo an upload or an update?

If you link another route or area then it is updateTopo. If you happen to upload a photo (newPhoto) and then a minute later turn it into a topo (newTopo), then this is two events. However ideally this should never happen, see #346. Or do you mean the settings groupings? In which case a topo is definitely an update. I'd imagine something like:

"Brendan updated Araps including 5 new routes and 2 new topos:..." and having a small version of the topos directly in the feed, or at the very worst a deep link to the topo.

can the newList, updateList sub types be in the customise stream or update stream?

I'd probably bundle List events with newAscent events, that's how they are done now (for better or for worse). I'm happy to defer the decisions on this til we tackle #1528.

scd commented 9 years ago

I have just added some fields to ActivityLog table and am in the process of fixing the historical records (700k records). I have also done some indicative tests to make sure we can get what we want with reasonable performance.

I have added the following fields:

- StreamDate: MySQL date field (ie no time component). Based on create date, but can be updated later if we want this record to be associated with a more recent update (eg comment on an ascent). I am not sure if we want to do this or not. Note that an activity log record can only be associated with one stream date.
- Stream: one of update, tick, favorite, etc (actually an id pointing to the category record)
- StreamCRUD: create, update, delete
- StreamObjectType: Route, Area, Ascent, etc
- Object: the id of the facet item associated with StreamObjectType.
- StreamCrag: TLC for activity

I have got two test sql scripts activity-stream-account.sql and activity-stream-crag.sql which will probably end up being two components in the final UNION sql for a users dashboard.

Not yet in repo because I need to rebuild dev.

In these test queries I group by date,account,stream concatenating distinct 'crud:objecttype:object' records and group by date,crag,stream concatenating distinct 'account:crud:objecttype:object'. From this I can build the template data as you want. We will have to discuss how much I expand.
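The account-side grouping described above can be sketched in Python as follows. This is an illustration only, not the actual SQL: the dictionary keys are hypothetical stand-ins for the new ActivityLog fields, and the distinct concatenation is done with a plain list.

```python
from collections import defaultdict

def group_account_stream(rows):
    """Group activity rows by (date, account, stream).

    Mimics a GROUP BY with a concatenation of distinct
    'crud:objecttype:object' strings. `rows` are dicts with hypothetical keys.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["date"], row["account"], row["stream"])
        item = "{}:{}:{}".format(row["crud"], row["object_type"], row["object"])
        if item not in groups[key]:  # keep distinct items only
            groups[key].append(item)
    return dict(groups)

rows = [
    {"date": "2014-10-23", "account": "scd", "stream": "update",
     "crud": "create", "object_type": "Route", "object": 101},
    {"date": "2014-10-23", "account": "scd", "stream": "update",
     "crud": "update", "object_type": "Route", "object": 101},
    {"date": "2014-10-23", "account": "scd", "stream": "update",
     "crud": "create", "object_type": "Route", "object": 101},  # duplicate
    {"date": "2014-10-23", "account": "scd", "stream": "tick",
     "crud": "create", "object_type": "Ascent", "object": 555},
]
events = group_account_stream(rows)
```

The crag-side query would use the same shape with (date, crag, stream) as the key and the account prepended to each concatenated item.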

While not blindingly fast the performance is reasonable on my dev (one day's data is about 60ms and a month about 200ms). This is fast enough to trial the service and iron out all the issues. If later we need to create another aggregation table we can.

We should deliver this to the dashboard as an embed (cf. the embeds in the facets), so the dashboard page itself is fast. This means that a person's dashboard feed should have its own URL:

- /climber/scd/activity (combined feed as per account settings)
- /climber/scd/activity/self
- /climber/scd/activity/friends
- /climber/scd/activity/self+friends
- /climber/scd/activity/favorites
- /climber/activity/world

Note that the embed would work as for facets: /climber/scd/activity/self/embed

We have got an activity facet url, but we don't use it. I think we should use this to expand the grouped summary in the dashboard feed, eg:

- /activity/by/cgome/stream/tick/on/2014-10-23
- /climbing/australia/arapiles/activity/stream/update/on/2014-10-23

brendanheywood commented 9 years ago

Ok sounds good. Note that the activity feed was added to the UI in #1179, but yes it was the intention to link into this for more details for the 'update' types, perfect

Also a privacy question, should this:

/climber/scd/activity

really be something like

/climber/me/activity

or maybe something completely different like

/dashboard/activity

instead? The difference being that with the latter climber A can't see the personal feed for climber B, but with the former they could. "/climber/scd/activity" to me is the feed of what that person has done themselves, ie what would show on their profile when someone else looks at it, and not what they see on their dashboard of events from the rest of the world.

brendanheywood commented 9 years ago

We will have to discuss how much I expand.

I think at the very minimum expand out the climber node and the TLC node. It would be nice if besides the TLC node we also had a concept of the common ancestor of all updates, which might be just a boulder, but I can't see how we'd get this using the work in progress architecture. Beyond that, what data I'd want is highly dependent on the shape of the data.

eg let's say we have update:new-route x6 + update:update-route x3, I'd probably want to show something like:

Brendan updated 3 routes in Araps and added 6 new routes to Bushranger Bluff:

So in this case I only care about the new routes. In other cases I might want the updated routes because that's all there is. If there were 200 new routes, I'd only want the details for the first 5-10 and then link off to the activity facet for more details.

I think I can work around the 'common ancestor' problem by just using careful copy. Ie I could have updated 100 routes in many cliffs across araps but the event would read like:

Brendan updated 100 routes in Araps including 8 new routes to Bushranger Bluff:

It's finding Bushranger Bluff purely as the common ancestor of the routes that are actually shown, not the common one of all those added or updated.
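As a rough illustration of the "common ancestor of what is shown" trick, here is a small Python sketch. It assumes, purely for the example, that each route comes with its list of ancestor node names from the top of the index down; the route names and the path layout are made up.

```python
def common_ancestor(paths):
    """Return the deepest node shared by every path, or None if there is none.

    Each path is a list of node names from the top of the index downwards.
    """
    if not paths:
        return None
    shared = None
    for nodes in zip(*paths):  # walk all paths level by level
        if len(set(nodes)) != 1:
            break
        shared = nodes[0]
    return shared

# Routes actually displayed in the event (hypothetical data).
shown = [
    ["Australia", "Arapiles", "Bushranger Bluff", "Route A"],
    ["Australia", "Arapiles", "Bushranger Bluff", "Route B"],
]
# Everything that was touched, including routes that are not displayed.
all_updated = shown + [["Australia", "Arapiles", "Bard Buttress", "Route C"]]
```

Here `common_ancestor(shown)` gives the cliff, while `common_ancestor(all_updated)` only gets as deep as the crag, which is why the copy can mention both.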

All of the hairy logic I think only applies to the 'update' ones. For all the others, like follow and fav and forum, I think we could nut out sensible data which is always the same. But in the update case I think it might be easier if I just pulled data on demand; the question is whether this logic is in the back end or in the templates.

Maybe a completely different approach could work: simply inflate the data for the first 10 records in each grouping, so I'd get data something like:

event: {
  date: 2014-10-24
  tlc: { atom }
  who: {atomClimber }
  groups: {
    topo: {
      inflated: {
        12345: { atomTopo },
        12346: { atomTopo },
        12347: { atomTopo }
      },
      create: [12345,12346,12347,12348,12349 ],
      update: [12345,12346,12347,12348,12349 ]
    },
    route: {
      inflated: {
        12345: { atomRoute },
        12346: { atomRoute },
        12347: { atomRoute }
      },
      create: [12345,12346,12347,12348,12349 ],
      update: [12345,12346,12347,12348,12349 ]
    }
  }
}

A couple of things to note:

More thoughts:

Also re performance. Now that we've got a pretty solid understanding of how all the grouping works, there is probably a bit of optimizing we can do, but if later we hit a hard limit then a potential next step could be to turn this into a pseudo materialised view with pre-grouped data. This effectively brings back the event table we discussed and dismissed as not needed from a data model point of view. Implementing this later would not need much discussion as the content would be identical and we wouldn't need to think much about it, other than keeping its data current.

scd commented 9 years ago

Rather than 'activity' I think we should call it a 'stream' in the url. This means we are not overlapping with our activity / feed terminology.

I had the same thoughts about permissions of streams this morning. I think we should treat it exactly as we do settings:

- If you are logged in: dev.thecrag.com/stream
- If you are admin then you can access account streams: dev.thecrag.com/climber/scd/stream
- If you want to look at a public account stream for their updates only: dev.thecrag.com/climber/scd/stream/self (if you are not admin then omitting '/self' will be equivalent)
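One possible reading of those permission rules, sketched in Python. The `resolve_stream` function, its arguments and the 'combined' / 'self' labels are hypothetical and only illustrate the routing decision, not how the site dispatches URLs.

```python
def resolve_stream(path, viewer, is_admin=False):
    """Decide which stream a request should see.

    Returns (account, kind) where kind is 'combined' (the dashboard stream)
    or 'self' (the public stream of that account's own updates), or None
    when the request is not allowed or the path is not recognised.
    """
    parts = [p for p in path.split("/") if p]
    if parts == ["stream"]:
        # Logged-in users get their own combined stream.
        return (viewer, "combined") if viewer else None
    if len(parts) >= 3 and parts[0] == "climber" and parts[2] == "stream":
        account, rest = parts[1], parts[3:]
        if rest not in ([], ["self"]):
            return None
        if is_admin and rest == []:
            return (account, "combined")
        # Non-admins always fall back to the public 'self' stream.
        return (account, "self")
    return None
```

For example, `resolve_stream("/climber/scd/stream", "bob")` falls back to `("scd", "self")`, while the same path with `is_admin=True` gives the combined stream.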

Your inflate concept is perfect.

Do you want the template or the backend to optimise stuff out? It sounds reasonable to do it on the backend, in that way the template knows that they are all good events.

Agreed about your performance thoughts. For now we play.

brendanheywood commented 9 years ago

I'm cool with stream and urls. Do the optimising in the backend, but before you do, do you reckon it's even worth it? If we're only going to inflate 10 records, at worst case we might load up 40, so it's probably not that big a deal. It's only memory size between the backend and mason.

scd commented 9 years ago

I have some data for you to play with in a template. Pull from repo and go to /stream

It is more or less in the structure you suggested. Because we have unique ids across all data types I have done the inflate at a higher level. This is easier, but also means I only have to inflate once for the whole page, even if a route appears in several events.
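A small Python sketch of inflating once per page. It is a simplification under stated assumptions: events here just map a group name to a list of ids, `load_atom` is a hypothetical loader standing in for whatever builds the atom data, and ids are assumed unique across all data types as described above.

```python
def inflate_page(events, load_atom, per_group=10):
    """Build one page-level lookup of inflated records.

    Only the first `per_group` ids of each group are inflated, and each id is
    loaded at most once even if it appears in several events.
    """
    inflated = {}
    for event in events:
        for ids in event["groups"].values():
            for item_id in ids[:per_group]:
                if item_id not in inflated:
                    inflated[item_id] = load_atom(item_id)
    return inflated

calls = []

def fake_load_atom(item_id):
    """Stand-in loader that records how often it is called."""
    calls.append(item_id)
    return {"id": item_id}

events = [
    {"groups": {"route": [1, 2, 3]}},
    {"groups": {"route": [2, 4], "topo": [9]}},
]
page = inflate_page(events, fake_load_atom, per_group=2)
```

Route 2 appears in both events but is loaded once, and route 3 is left as a bare id because it falls outside the per-group cap.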

The particular example is just 'friends' stream. But I have also got the 'self', 'watch' (this is favorites) and 'world' streams working in sql. I think we just need the friends stream for now, while we discuss and develop the template.

One thing to note: the crag is not on the friends stream. If Campbell does 10 ascents in one day across two crags, is this one event or two events? Our initial discussion implied one event (which is what I have implemented), but I think the symmetry of the whole thing means we should separate it into two events.

brendanheywood commented 9 years ago

cool I'll have a poke around

I thought we'd discussed it as two events? unique tlc + account + day

scd commented 9 years ago

Also the template data is paged (20 events per page). Let me know the maximum number of events you want to display on one page.

It's actually a double limit - the inner query is between dates with unlimited results, and the outer group by query has a limit on the number of records.
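The double limit can be pictured with the following Python sketch. The row keys, the grouping key and the `page_of_events` name are assumptions for illustration; dates are ISO strings so they compare correctly as text.

```python
def page_of_events(rows, start, end, limit=20):
    """Inner step: keep every row with start <= date <= end (no limit).
    Outer step: group the rows into events, then cap the number of events.
    """
    in_range = [r for r in rows if start <= r["date"] <= end]
    events = {}
    for r in in_range:
        events.setdefault((r["date"], r["account"]), []).append(r["id"])
    # Newest events first, then apply the outer limit.
    newest_first = sorted(events.items(), key=lambda kv: kv[0][0], reverse=True)
    return newest_first[:limit]

rows = [
    {"id": 1, "date": "2014-10-21", "account": "a"},
    {"id": 2, "date": "2014-10-22", "account": "a"},
    {"id": 3, "date": "2014-10-22", "account": "a"},
    {"id": 4, "date": "2014-10-23", "account": "b"},
    {"id": 5, "date": "2014-10-10", "account": "b"},
]
result = page_of_events(rows, "2014-10-20", "2014-10-23", limit=2)
```

Because the limit applies to grouped events rather than raw rows, an event is never cut in half by paging.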

brendanheywood commented 9 years ago

How do we page when the page size is hit? Is this a pure date sort based on the max(timestamp) of each update component? I'd like to get, as part of the template data, enough info to build the next query, which I'd assume would be a date filter rather than just a page param.

the actual limits we'll tune once we have something more real working

scd commented 9 years ago

Have a play with the way it is now, then confirm with me what you want. If you are already definite about splitting friends events on TLC then let me know.

brendanheywood commented 9 years ago

Yeah, from a UX point of view I think seeing both has more value, and from a technical one I think it will make things simpler and more consistent, especially if we later go down the materialized view way where these groups are being read from multiple places, so the same group could be used in a TLC feed or a friend feed and be the same grain size.

scd commented 9 years ago

Also while we are discussing it: is two people updating one crag one event or two? I have currently done it as one event.

brendanheywood commented 9 years ago

I think let's just stick with "unique tlc + account + day" until we get a good cohort of real data and then make that call, ie release this as a secret feature and use it internally to watch stuff and tweak it for a few releases before we swap over.

scd commented 9 years ago

What do you want to do with index updates above top level crag (eg favorite Australia, or update location for Queensland)?

brendanheywood commented 9 years ago

Favorite should always be the fav node, not its tlc.

Forums should always be the forum node and not the tlc.

Updates to regions and above I think aggregate into the country node instead of the tlc, eg "Brendan moved stuff around in Australia". Updates above country I think we can ignore, they are so rare.

brendanheywood commented 9 years ago

so my comment above becomes "unique node + account + day" where node is usually a tlc, but could be any node or null in the case of follow etc
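Pulling the node rules from the last few comments together, a Python sketch of the event key might look like this. The stream labels and the dictionary keys are hypothetical; the rules themselves (fav node, forum node, country roll-up, null for follow, ignore above country) are the ones stated above.

```python
def event_key(activity):
    """Return the (node, account, day) key for an activity, or None to ignore it.

    `activity` is a dict with hypothetical keys: 'stream', 'account', 'day',
    'node' (what was acted on), 'tlc' (None above crag level) and
    'country' (None above country level).
    """
    stream = activity["stream"]
    if stream in ("favorite", "forum"):
        node = activity["node"]        # the favourited node / forum node itself
    elif stream == "follow":
        node = None                    # no node involved
    elif activity.get("tlc") is not None:
        node = activity["tlc"]         # the usual case
    elif activity.get("country") is not None:
        node = activity["country"]     # region-level updates roll up to country
    else:
        return None                    # above country: rare enough to ignore
    return (node, activity["account"], activity["day"])

tick = {"stream": "tick", "account": "cgome", "day": "2014-10-23",
        "node": "Some Route", "tlc": "Arapiles", "country": "Australia"}
region_update = {"stream": "update", "account": "scd", "day": "2014-10-23",
                 "node": "Queensland", "tlc": None, "country": "Australia"}
follow = {"stream": "follow", "account": "scd", "day": "2014-10-23",
          "node": None, "tlc": None, "country": None}
```

Keeping this choice in one place means every stream groups at the same grain, whichever feed the event ends up in.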

brendanheywood commented 9 years ago

Is there a defined order for the ids within the group_concat? In particular the ascent order isn't the same as elsewhere, which needs to be fixed. I assume this applies to other types as well.

scd commented 9 years ago

I have just updated the sql so that an event is date+account+node+type. In repo.

group concat is now ordered by create date asc. I have not confirmed the output. What is the main problem with the group concat being in a different order? I think ordering by ids is not right, in particular when it comes to updates.

Love your first ui version. I have increased the default number of events for testing to 50.

brendanheywood commented 9 years ago

The issue for me is that I want to see my ticks in the order I tried / sent them each day, and more importantly the order should be consistent across the site (profile recent ticks, facet order by date, event stream). One challenge is that in recent ticks and the facet they are ordered in descending order, the reverse of what I did, but within a stream event I'd probably expect them in time order going forward, with the events themselves going in backwards time order.

I personally rarely use the bulk tick form. I tick each route 1 by 1 in the order that I do them in, and often might have gone back and retried a route so it gets another tick in the right order. I know this seems fairly pedantic but it matters to me, especially for projects, and I've seen a few other power tickers like Lee do this too. Ultimately I think we need to fix the ticking form so that we can reorder ticks, and duplicate ticks (not just add more tries). I thought I'd logged this but now can't find it. I'll make another issue.

Also more data template stuff:

scd commented 9 years ago

More updates:

A couple of further thoughts on our framework.

  1. GM TIME

Date categories are based on gm time. This may be confusing for the user. Using gmdates as the aggregation date makes the backend sql much simpler but possibly makes the ui more confusing. I think we should discuss our way through this issue now before we go much further with the implementation.

The user has a time zone set up against their account, so I could pass this into the template. Your date stream output could use the latestUpdate field + offset to report the date, rather than the date field. In other words the group by is done using gmdate but reporting is done in end-user local time. This would lead to edge cases where events which look like they should be one are split into two, and vice versa. I think we can live with this and if necessary the backend can do some post sql processing to detect and correct.

If we want to do this then I will also have to pass the user's timezone offset in the account template data.
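To make the edge case concrete, here is a minimal Python sketch of grouping on the GMT date while reporting in local time. The function names are hypothetical, and a fixed hour offset is assumed for simplicity (a real time zone setting would also need daylight saving rules).

```python
from datetime import datetime, timedelta, timezone

def stream_date(latest_update):
    """Aggregation date: the calendar date in GMT/UTC."""
    return latest_update.astimezone(timezone.utc).date()

def display_date(latest_update, offset_hours):
    """Reporting date: the same instant shifted by the user's offset."""
    local = latest_update.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.date()

# Two ticks logged on the same local day by a user at UTC+11 (assumed offset).
morning = datetime(2014, 10, 23, 22, 30, tzinfo=timezone.utc)   # 09:30 local on the 24th
afternoon = datetime(2014, 10, 24, 3, 0, tzinfo=timezone.utc)   # 14:00 local on the 24th
```

Both ticks display as 24 October for this user, yet they fall on different GMT dates and so land in different groups, which is exactly the split described above.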

  1. LOG TIME NOT ACTIVITY TIME

The stream date for ascents is based on their logged date not the date the ascent was done. I think we can live with this, but maybe we should report when the ascent was actually done. We want to encourage same day ticking anyway (I know this will not always be possible).

  1. FAVORITES AND TLC

The SQL logic joins favorites with TLCs associated with the activity. That is fine if you have favorited all TLCs, however if you have favorited Australia or Bard Buttress, these will do nothing. I am pretty keen to avoid additional joins and want to keep the SQL as simple as possible for this. I am proposing one of two radical ways of dealing with this:

Maybe there is a definitional issue, you can favorite anything, but watch crags.

  1. NODES VERSUS TLC

You mentioned that favorites should be on the node that somebody favorited, not the TLC. I can live with that. Because people will mostly be interested only in what their friends are favoriting, I don't think it makes much difference either way, it is just a matter of how the information is aggregated and reported.

However discussions are more of an issue. Imagine a scenario where somebody starts a discussion at Bard Buttress and you have favorited Arapiles. If the discussion activity log is recorded against the TLC (ie Arapiles) then you will see it in your watch feed because you have favorited Arapiles. (Note that I am only talking about the new Crag field for stream aggregation.) However if we record this activity against Bard Buttress for stream aggregation then you will not see it, because you have not favorited Bard Buttress. I know that Bard Buttress is under Arapiles, but we would need an additional join to get this.

I am thinking that we should keep this paradigm as simple as possible and aggregate against top level crag (or gym) for all types at the SQL level.

I am happy for the post SQL backend processing to break up aggregated 'favorites' and 'discussions' into separate events. Don't forget that the actual activity will be expanded.

BTW, I am pretty pleased with how this has progressed. I just want to get it far enough that we are happy with the new SQL fields before I release.

brendanheywood commented 9 years ago

why: one of self, follow, watch or world

Do we also want country as an option?

LOG TIME NOT ACTIVITY TIME

We've nutted this problem out before, and previously worked out that the correct way to implement this is to use the timezone of the TLC and not the account, but we've never actually implemented this anywhere. Doing this with the group-by would be another join to location timezone data, which I'm not sure we even have. So I'm happy to live without this for now, but it is probably much easier to solve if we do turn this into a materialized view. Note that this will actually be a bigger problem for our core users in Australia because the GMT midnight cuts our day in half, and less of a problem closer to Greenwich.

FAVORITES AND TLC

I'm pretty unsatisfied with both of those options: they either introduce voodoo or are a big step back in usability. We've already been through this problem and solved it fairly well with the facets and the N0-N9 indexes. So I think we need to do this properly.

I can't find the original issue, but the where clause would end up something like

select blah from ActivityLog
where N2 in (ausid)
   or N4 in (tas)
   or N5 in (beulah, gara, ebor)

And I think you already wrote something which removes redundant where clauses, ie it would strip out all the clauses for nodes inside Australia because I've favorited Australia.
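A rough Python sketch of building that where clause and dropping redundant favorites. This is not the existing facet code: the `favorites_where` helper, the input shape and the numeric ids are assumptions, and real code would use bind parameters rather than formatting ids into the string.

```python
from collections import defaultdict

def favorites_where(favorites):
    """Build an 'N<level> in (...)' clause from favorited nodes.

    `favorites` maps node id -> (level, ancestor ids). A node is dropped when
    one of its ancestors is also a favorite, since that ancestor's clause
    already covers it.
    """
    chosen = set(favorites)
    by_level = defaultdict(list)
    for node_id, (level, ancestors) in favorites.items():
        if chosen.isdisjoint(ancestors):
            by_level[level].append(node_id)
    parts = [
        "N{} in ({})".format(level, ", ".join(str(i) for i in sorted(ids)))
        for level, ids in sorted(by_level.items())
    ]
    return " OR ".join(parts)

# Hypothetical ids: node 25 sits inside node 10, so it is redundant.
favorites = {
    10: (2, []),
    25: (4, [10]),
    31: (5, [7, 8]),
    32: (5, [7, 8]),
}
clause = favorites_where(favorites)
```

The result has one clause per index level, so favoriting a whole country collapses everything beneath it into a single term.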

NODES VERSUS TLC

See above. All this is extra voodoo which I want to avoid as much as possible. Discussions are a special case, and do not need any aggregation at all. The use case of 'Brendan started 3 new discussions in Araps' is just not worth considering. We should just re-use the existing logic / sql and union it into the rest of this. So the discussions you see in a stream for a month would be exactly the same forums you'd see in the forums tab under that particular node for that month.

So we don't need any post SQL processing to de-aggregate them, we just shouldn't aggregate them in the first place.