theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/

Recent contributions RSS filtering #2009

Closed leemcdougall closed 9 years ago

leemcdougall commented 9 years ago

I use the RSS feed to keep me updated on new routes and crags in my favorite climbing areas, however there is a lot of clutter. It would be great, for example, if the RSS could be filtered to only show "New routes" as I don't really care too much about being alerted for reparenting, merging etc. Or maybe it can be consolidated into one entry like the Stream does. Lee

brendanheywood commented 9 years ago

Hi Lee, the streams have been our big focus internally, with the intention of completely replacing the content inside the RSS feeds. What areas are you currently watching via RSS? We'll give you a sneak peek so you can give us feedback.

leemcdougall commented 9 years ago

Currently, recent updates for Sydney Metro (http://www.thecrag.com/feed/activity/node/11741323). It would be great to have: "Recent New Routes, Topos, Photos in Sydney Metro". I used to follow Blue Mountains but that just flooded me with too many edits. Would be happy to provide feedback :)

brendanheywood commented 9 years ago

I follow the world and monitor every update, so trust me I get the 'drinking from the firehose' thing :)

That said, the streams are much better as they group large batches of edits and summarise them if they are too big. At the moment all types of edits are grouped together.

scd commented 9 years ago

We could allow people to filter events further so that only events with matching activity types come through. This will have a big negative performance impact so I don't think I want to offer this until we know we need it.
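A minimal sketch of the filtering idea, assuming each event carries a hypothetical `activity_type` field. The field names and type labels are illustrative only and are not taken from theCrag's code. It filters in memory after the events are loaded, so it says nothing about the database-level performance cost mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Event:
    # Hypothetical event shape; these field names are assumptions.
    event_id: int
    activity_type: str
    title: str


def filter_events(events: Iterable[Event], wanted_types: Set[str]) -> List[Event]:
    """Keep only events whose activity type is in wanted_types."""
    return [e for e in events if e.activity_type in wanted_types]


# Made-up sample data for illustration.
events = [
    Event(1, "new_route", "New route added"),
    Event(2, "reparent", "Area reparented"),
    Event(3, "merge", "Routes merged"),
    Event(4, "new_route", "Another new route"),
]
new_routes = filter_events(events, {"new_route"})
```

With a filter like this, a "New routes" feed would simply pass `{"new_route"}` and drop the reparenting and merging noise.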

leemcdougall commented 9 years ago

can we get an RSS of the Stream maybe?

scd commented 9 years ago

I think we should minimise the code we have to support. The activity feeds should be replaced by a stream rss. Maybe we keep support for the activity rss for a release or two then cutover to the stream rss.

Stream RSS

We are not going to be able to deliver topos over RSS. RSS does not do javascript and our static image topos don't work in diff mode. Also I don't think we want description diffs in the RSS feed.

So I'm thinking that we make the assumption that Stream RSS feed is always in summary mode and links to the stream event page if specific details are needed. This assumption means that you will not see the routes that were added, but rather a summary that X routes were added.

A big part of streams is the ability to make comments on a stream event. RSS feeds restrict this model, so I prefer people to use our website rather than rely on just the RSS feeds. So I think the assumption of just doing summary mode aligns with our business model.

Having RSS feeds being just summary mode also makes implementation significantly easier.
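As a rough sketch of what a summary-mode entry could look like: the function below reduces one grouped stream event to counts per change type plus a link back to the event page. The change labels, the dictionary shape and the example URL are all assumptions made for illustration, not the actual stream data model.

```python
from collections import Counter
from typing import Dict, List


def summarise_event(event_url: str, changes: List[str]) -> Dict[str, str]:
    """Build a summary-only feed entry for one grouped stream event.

    `changes` holds one hypothetical change label per edit. The entry
    reports counts only and links to the event page for the details.
    """
    counts = Counter(changes)
    parts = [f"{n} {label}" for label, n in sorted(counts.items())]
    return {"summary": ", ".join(parts), "link": event_url}


entry = summarise_event(
    "https://example.com/stream/event/123",  # hypothetical URL pattern
    ["routes added", "photos added", "routes added", "routes added", "photos added"],
)
```

The reader sees "3 routes added" rather than the three routes themselves, and follows the link if they want the detail.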

brendanheywood commented 9 years ago

My intention would be that the existing atom streams would just gracefully start showing stream content instead of the existing stuff. Existing subscriptions should just continue working, so an 'update feed for blue mountains' would just show the blue mountains stream filtered to just update events etc.

scd commented 9 years ago

Do you want a feed which does everything?

How do new people subscribe to the feeds?

Personally I don't use the feeds and I think I prefer people to come to the website. So I am not a good champion for RSS feeds.

brendanheywood commented 9 years ago

I use the feeds more than the website, they are my primary way of monitoring the site. That said I'm a special case and the streams once live would be better and I may swap over purely so I can see the topos inline.

The main loss of functionality with using the website is that there is no concept of 'read' status for an event, or of '100 new events' to go catch up on. I check every few days and typically have ~500 to 1000+ records to scan through. Using the grouped stream events should bring this down to a few dozen per day max.

Looking at the logs there is quite a lot of access. Even after you strip out our internal use of the html version and the json feeds on the dashboard there are still hundreds of hits, and importantly hundreds of hits for different feed urls:

# grep  feed access.log | grep -v markupType | grep -v json  | cut -f 8 -d' ' | sort | uniq -c | sort -gr | head 
    306 /feed/activity/node/7546063
    293 /feed/ascents/node/7546063
     97 /feed/activity/shortcuts/11183449
     86 /feed/ascents/friends/331305333
     84 /feed/activity/node/190488567
     79 /feed/ascents/node/689868222
     79 /feed/ascents/node/641227161
     79 /feed/ascents/node/640682745
     79 /feed/ascents/node/627184119
     79 /feed/ascents/node/15220459
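The same tally can be sketched in Python. This assumes, like the `cut -f 8 -d' '` call above, that the request path is the 8th space-separated field of each log line. The sample lines are made up for illustration and are not real log data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_feed_paths(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count feed requests per path, most requested first."""
    counts = Counter()
    for line in lines:
        # Same filters as the grep pipeline: feed hits, minus html/json variants.
        if "feed" not in line or "markupType" in line or "json" in line:
            continue
        fields = line.rstrip("\n").split(" ")
        if len(fields) >= 8:
            counts[fields[7]] += 1  # 8th field, like cut -f 8 -d' '
    return counts.most_common(limit)


# Made-up sample lines; the leading "www" field is an assumed log layout.
sample = [
    'www 1.2.3.4 - - [04/Sep/2015:09:00:00 +1000] "GET /feed/activity/node/7546063 HTTP/1.1" 200 512',
    'www 1.2.3.5 - - [04/Sep/2015:09:00:01 +1000] "GET /feed/activity/node/7546063 HTTP/1.1" 200 512',
    'www 1.2.3.6 - - [04/Sep/2015:09:00:02 +1000] "GET /feed/ascents/node/7546063 HTTP/1.1" 200 512',
    'www 1.2.3.7 - - [04/Sep/2015:09:00:03 +1000] "GET /feed/activity/node/7546063?format=json HTTP/1.1" 200 512',
    'www 1.2.3.8 - - [04/Sep/2015:09:00:04 +1000] "GET /climbing/world HTTP/1.1" 200 512',
]
top = top_feed_paths(sample)
```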

If I pick the top feed which is unsurprisingly the world node, and look at the user agents there are quite a lot of different ones being used:

# grep '/feed/activity/node/7546063 ' access.log | cut -d'"' -f 5,6 | sort | uniq -c | sort -gr
     67  "Mozilla/5.0 (compatible; FlipboardRSS/1.1; +http://flipboard.com/browserproxy)
     66  "Apple-PubSub/65.28
     40  "Mozilla/5.0 (compatible; Kraken/0.1; http://linkfluence.net/; bot@linkfluence.net)
     33  "Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
     33  "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 1 subscribers; feed-id=947411839999811236)
     31  "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36
     13  "Mozilla/4.0 (compatible; MSIE8.0; Windows NT 6.0) .NET CLR 2.0.50727)
      8  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1
      6  "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36
      4  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
      3  "Apple-PubSub/65.23
      2  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      1  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618)
      1  "-

Also note that if 100 people are subscribed to a feed using the same service, the service will only call us once, so this is probably a much lower number than the real number of subscribers.
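Some fetchers do report how many subscribers they are fetching for, as the Feedfetcher-Google line in the listing above does ("1 subscribers"). A small sketch of pulling that number out of a user agent string follows. The regular expression is an assumption based on that one log line, and other services may use a different format or report nothing at all.

```python
import re
from typing import Optional

# Matches e.g. "1 subscribers" or "25 subscriber"; pattern assumed from the log above.
_SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def reported_subscribers(user_agent: str) -> Optional[int]:
    """Return the subscriber count a fetcher reports, or None if it reports none."""
    match = _SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


google = reported_subscribers(
    "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; "
    "1 subscribers; feed-id=947411839999811236)"
)
apple = reported_subscribers("Apple-PubSub/65.28")
```

Here `google` is 1 and `apple` is `None`, so agents that report nothing can only be counted as "at least one" subscriber.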

I wonder how many of these have been setup by people and then forgotten about and the robots keep scraping?

brendanheywood commented 9 years ago

How do new people subscribe to the feeds?

If we do this, then they would subscribe exactly the same way as before, either explicitly using the atom icon which we'd add somewhere, or using the meta link in the page header.

scd commented 9 years ago

Do we need the first, next and previous links in the rss metadata? There is no concept of this in the server code at the moment. The javascript code manages next by trying the paginateAt event id and if that returns empty goes to the next month. Previous is difficult to work out. I am guessing this is not actually required to meet our current needs.

brendanheywood commented 9 years ago

It doesn't matter as much if in the atom feed it just always points at the next page and they are empty, the feed reader will just keep paging through. Only next is important, prev isn't - or the other way round depending on which way your timeline is pointing. By next I mean older. Practically most of the feeds are the heavy ones anyway which will not be empty normally so won't be a big issue.
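A sketch of the reader behaviour described above: follow the `rel="next"` link from page to page, collecting entries, and keep going even when a page comes back empty. The `fetch` callable and the tiny in-memory pages are stand-ins invented for the example; a real reader would fetch over HTTP and would have its own limits on how far back it pages.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def walk_feed(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> List[str]:
    """Follow rel="next" links, collecting entry ids; empty pages do not stop the walk."""
    entry_ids: List[str] = []
    url: Optional[str] = start_url
    pages = 0
    while url is not None and pages < max_pages:
        root = ET.fromstring(fetch(url))
        for entry in root.findall(f"{ATOM_NS}entry"):
            entry_ids.append(entry.findtext(f"{ATOM_NS}id", default=""))
        # Pick up the link to the next (older) page, if the feed offers one.
        url = None
        for link in root.findall(f"{ATOM_NS}link"):
            if link.get("rel") == "next":
                url = link.get("href")
                break
        pages += 1
    return entry_ids


# Made-up pages: page2 is empty but still points at an older page.
PAGES = {
    "page1": '<feed xmlns="http://www.w3.org/2005/Atom"><link rel="next" href="page2"/>'
             "<entry><id>a</id></entry></feed>",
    "page2": '<feed xmlns="http://www.w3.org/2005/Atom"><link rel="next" href="page3"/></feed>',
    "page3": '<feed xmlns="http://www.w3.org/2005/Atom"><entry><id>b</id></entry></feed>',
}
ids = walk_feed("page1", PAGES.__getitem__)
```

The `max_pages` cap is there so that a feed whose next link never runs out cannot keep the walk going forever.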

scd commented 9 years ago

I think we use 'next' in the existing logic for the rss feed. This would add 1 to the page number, which means get older data.

I have implemented next:

brendanheywood commented 9 years ago

I'm confused. Can't we now get all this perfect with the new month indexing?

On 04/09/2015, at 9:28 AM, Simon Dale notifications@github.com wrote:

I think we use 'next' in the existing logic for the rss feed. This would add 1 to the page number, which means get older data.

I have implemented next:

scd commented 9 years ago

??? what do you mean 'all perfect'. I don't know what you are confused about which makes me confused. I think it is all perfect.

In the previous rss implementation there were 'first', 'next' and 'prev' parameters. The 'first' parameter is exactly the same. The 'next' parameter refers to older data, which previously just added 1 to the page number (ie older). This has been changed to use the new paginateAt and/or startDate & endDate parameters as we are doing for the streams on the website.

The paradigm of 'prev' does not exist in our new stream code so I have abandoned it for rss. Your advice implied we don't need it anyway for the rss feed.

Note that the website stream pagination works the same way, except for one minor point. The javascript on the website has some wrapper logic which automatically gets the next window if an empty stream is returned. In RSS land, if a particular month is paginated (using paginateAt) then the last page will be an empty stream, and the 'next' url will be the older month window. My understanding is that this will not adversely affect the RSS feed and it will continue to go back.
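A sketch of how the 'next' url could be derived under that scheme. The parameter names `paginateAt`, `startDate` and `endDate` come from the discussion above. Everything else is an assumption for illustration: that `paginateAt` takes the id of the last event on the page, that dates are formatted as YYYY-MM-DD, and that a window is exactly one calendar month.

```python
import calendar
from datetime import date
from typing import List, Tuple
from urllib.parse import urlencode


def previous_month_window(window_start: date) -> Tuple[date, date]:
    """Return (first_day, last_day) of the month before window_start's month."""
    year, month = window_start.year, window_start.month - 1
    if month == 0:
        year, month = year - 1, 12
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def next_url(base_url: str, event_ids: List[int], window_start: date) -> str:
    """Build the rel="next" url for a page of the current month window."""
    if event_ids:
        # Page has events: continue within the same window after the last one.
        return f"{base_url}?{urlencode({'paginateAt': event_ids[-1]})}"
    # Empty page: the window is exhausted, so step back to the older month.
    start, end = previous_month_window(window_start)
    query = urlencode({"startDate": start.isoformat(), "endDate": end.isoformat()})
    return f"{base_url}?{query}"


base = "http://dev.thecrag.com/feed/ascents/climber/9068185"
within_month = next_url(base, [366009020, 366009016], date(2015, 9, 1))
older_month = next_url(base, [], date(2015, 9, 1))
```

When a url with more than one parameter is written into the feed, the `&` has to be escaped as `&amp;`, as in the old-style `page=2&amp;size=50` link.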

scd commented 9 years ago

Old way

    <link rel="first" href="http://www.thecrag.com/feed/ascents/climber/9068185?page=1" />
    <link rel="next" href="http://www.thecrag.com/feed/ascents/climber/9068185?page=2&amp;size=50" />

New way

    <link rel="first" href="http://192.168.1.8/feed/ascents/climber/9068185" />
    <link rel="next" href="http://192.168.1.8/feed/ascents/climber/9068185?paginateAt=366009016" />

brendanheywood commented 9 years ago

Sorry my bad, all good

:+1:

scd commented 9 years ago

No worries. This is almost finished. You will have to wait until I give you another version of dev (maybe tonight). I will let you know.

The one remaining task is to validate feeds.

Test urls:

http://dev.thecrag.com/feed/ascents/climber/9068185

http://dev.thecrag.com/feed/activity/node/7546063

scd commented 9 years ago

Validating using http://feedvalidator.org/check.cgi

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

    * line 2, column 0: Use of unknown namespace: http://activitystrea.ms/spec/1.0/

          <feed xmlns="http://www.w3.org/2005/Atom"

    * line 49, column 0: content should not contain data-discussionid attribute

          <div class="event-inline-comments" data-discussionid="372498421">

    * line 51, column 0: content should not contain data-commentid attribute

          <div class="comment  arrow" id="m372498421" data-commentid="372498421">

scd commented 9 years ago

I reckon the feeds should be able to handle a data attribute, and the namespace is the same as it has been. @brendanheywood unless you find anything else I am closing.

brendanheywood commented 9 years ago

Yup all good

brendanheywood commented 9 years ago

@leemcdougall what are your thoughts on the new streams now?