ropensci / unconf15

rOpenSci's San Francisco hackathon/unconf 2015
http://unconf.ropensci.org

Better Blog Aggregation #9

Open eddelbuettel opened 9 years ago

eddelbuettel commented 9 years ago

A few of us have exchanged comments or notes about the need for a better aggregation of blog activity for the R community: higher quality, proper formatting, advertisement-free, possibly curated, ... Might be worthwhile to stick our heads together while we are in one place.

karthik commented 9 years ago

I didn't have the bandwidth to respond to that email thread, but I'll upvote this here (it's better suited to an in-person discussion anyway). I think we can easily put together a minimalist, readable, ad-free aggregator.

Instead of manually adding blogs (and sometimes waiting for months for a single lax maintainer to respond) we could automate some of this. We could easily ask users to place a text file on their server with boilerplate text granting permission for us to aggregate (and then let us know via a form). As long as the file is there, we continue to aggregate, and we can stop anytime they pull the plug.
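A minimal sketch of that permission check, in Python. The file name, the boilerplate text, and the `fetch` callable are all assumptions made up for illustration; nothing in this thread fixes them. `fetch` is passed in so the logic can be tried without touching the network.

```python
from typing import Callable, List, Optional

# Hypothetical file name and boilerplate text; neither is specified in this thread.
CONSENT_PATH = "/r-aggregator-consent.txt"
CONSENT_TEXT = "I grant permission to aggregate this blog."


def has_consent(site: str, fetch: Callable[[str], Optional[str]]) -> bool:
    """Return True if the consent file is present and contains the boilerplate.

    `fetch` takes a URL and returns the body as text, or None when the
    file is missing or the request fails.
    """
    body = fetch(site.rstrip("/") + CONSENT_PATH)
    return body is not None and CONSENT_TEXT in body


def sites_to_aggregate(sites: List[str], fetch: Callable[[str], Optional[str]]) -> List[str]:
    """Keep only the sites whose consent file is still in place."""
    return [site for site in sites if has_consent(site, fetch)]


# Stand-in for real HTTP: a dict of URL -> body, looked up with dict.get.
pages = {"https://a.example/r-aggregator-consent.txt": CONSENT_TEXT}
print(sites_to_aggregate(["https://a.example/", "https://b.example"], pages.get))
```

Running the check on every aggregation pass means a blog drops out as soon as its owner deletes the file, with no maintainer in the loop.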

If we do this, it would be great to improve upon the tagging so one does not have to endure every mundane R post (or perhaps only keep tabs on say Shiny posts).
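As a rough sketch of the tag filtering, assuming each post is a dict with a `title` and a list of `tags` (a made-up structure, not something an existing aggregator is known to provide):

```python
def filter_by_tags(posts, wanted):
    """Return the posts that carry at least one of the wanted tags (case-insensitive)."""
    wanted = {tag.lower() for tag in wanted}
    return [
        post for post in posts
        if wanted & {tag.lower() for tag in post.get("tags", [])}
    ]


posts = [
    {"title": "Reactive plots", "tags": ["Shiny", "ggplot2"]},
    {"title": "Hello world in R", "tags": ["intro"]},
    {"title": "Untagged post"},
]
print([post["title"] for post in filter_by_tags(posts, ["shiny"])])
```

Posts without any tags are simply skipped by the filter, so the quality of the tagging decides how useful this is.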

@ironholds has offered to host for us.

eddelbuettel commented 9 years ago

Nice idea re the poke for permissions file and continue aggregation while present. I like that.

Pinging @elijah, who had poked around GitHub as well. Also, as a footnote to myself since I keep forgetting: it was rawdog which I like as a simple (RSS in, static files out) aggregator with plugins.

Ironholds commented 9 years ago

+1 to...well, everything Karthik said. I don't have the time bandwidth, but I'll happily provide the hosting and cover the costs.

elijah commented 9 years ago

Hey - finally getting around to replying to this thread of thoughts.

A few weeks ago, I was tinkering with https://github.com/planetr/planetr.github.io -- I even got as far as making a planetr project to fiddle with. Happy to add folks to that project so they can commit, or take pull requests, or whatever.

My first pass at it was to just use Jekyll to publish posts pulled down by the planet.rb gem. You can see my experiment from back in December at planetr.github.io.

It should be totally possible for someone to hack on the templates, or add features, etc, and then just make pull requests to the repo, and have other people review and check it in. Want your blog added? Make a pull request.

Dirk likes rawdog, and I get that -- it looks pretty awesome -- you could easily implement this same thing with it, I suspect. [I just haven't tried - a parallel implementation checked into the same or a parallel repo would be fun. :) ]

Democratizing the heck out of this is a thing that should happen. I was a blocker on the old implementation of planetr, and I don't want to be that. :)

What should probably happen is that a fairly regular cron job on a host somewhere should do this kind of thing:

git checkout master (or HEAD...); git pull; planet generate; git commit -a; git branch posts_$(date +'%m%d%Y'); git checkout posts_$(date +'%m%d%Y'); git push origin posts_$(date +'%m%d%Y')

Just so we can roll back if things get mangled, or something. [It's happened many times with the existing old planet-venus based planetr site -- usually when a disk gets full, or something.]
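For anyone who would rather drive this from a script, here is a dry-run sketch in Python that only builds the command list with the dated branch name. The `posts_` prefix and the date format come from the cron line above; the `-m` commit message is an addition (an unattended `git commit -a` would otherwise wait for an editor), and the function name is made up.

```python
from datetime import date
from typing import List


def snapshot_commands(today: date, prefix: str = "posts_") -> List[str]:
    """Build the shell commands for one aggregation run on a dated branch."""
    branch = prefix + today.strftime("%m%d%Y")
    return [
        "git checkout master",
        "git pull",
        "planet generate",
        # The -m flag is an assumption so the commit can run unattended.
        "git commit -a -m 'Regenerate posts'",
        "git branch " + branch,
        "git checkout " + branch,
        "git push origin " + branch,
    ]


for command in snapshot_commands(date.today()):
    print(command)
```

Printing the commands first makes it easy to eyeball the branch name before wiring the list into `subprocess` or a cron entry.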

Bonus activity here would be to use the freebie infrastructure at https://travis-ci.org/ to do this work, and push it all back up to github.io....

elijah commented 9 years ago

This is one of the threads that I followed when digging into this stuff in December:

http://blog.nilenso.com/blog/2013/09/16/octopress-planet-dot-rb-and-the-nilenso-blog/

elijah commented 9 years ago

Want to encourage people to eyeball http://offog.org/code/rawdog/ - the rawdog homepage - and look at the list of plugins at the bottom. Those are the sorts of features that we want to have available -- it's not unusual for people to use something like Vellum (http://www.kryogenix.org/code/vellum/) in such an installation to rewrite bits and pieces of posts. As desired. ;-)

eddelbuettel commented 9 years ago

Very nice stuff, @elijah. I also lean towards using GitHub "because it's there" and nobody needs to foot any bills. Spreading the effort wide is something I also think is desirable and can work well -- my prime example is how @jeroenooms made us all edit the useR! 2014 page that way.

I'm also not religious about rawdog but, just like you, am impressed by the wide range of plugins. I speak next to no Ruby but can do some (modest) Python hacking. We'll chat some more...

lmullen commented 9 years ago

One possibility for blog aggregation is PressForward (GitHub), a WordPress plugin that aggregates content and creates a workflow for republishing posts. PressForward would lead to a more curated than automated approach, however.

eddelbuettel commented 9 years ago

Hm. I am not sure we would want to live within a WordPress environment. Otherwise the review and curation idea is pretty close to one possible approach I entertained. But as @karthik outlines above, fully automatic robot mode is nice too ... as we all are bloody busy already.

elijah commented 9 years ago

This is completely apropos and I just can't help myself:

https://twitter.com/sadserver/status/570260809399410688

"Statistically speaking it's more likely for you to be mauled by a bear than for you to properly secure WordPress."

;-)

--e

eddelbuettel commented 9 years ago

That is priceless :) Any chance you'll swing by SF for the unconf? Or do we have to G+ hangout/skype you in if we get going?

jeroen commented 9 years ago

I'd prefer something light like hacker news or reddit to aggregate blogs, but also potentially interesting articles, gists, SO questions, etc....

I think the key ingredient is a good algorithm to make good/fresh stuff float to the top... preferably with personalized weighting to counter the bias towards the always popular newbie/commercial crap.
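One possible shape for such an algorithm, sketched in Python: votes divided by a power of the post's age, multiplied by a per-reader tag weight. The gravity value of 1.8, the `+ 2` offset, and the post fields are assumptions in the spirit of how Hacker News-style ranking is usually described, not a tested design.

```python
def rank_score(votes, age_hours, weight=1.0, gravity=1.8):
    """Higher is better; the score decays polynomially as the post ages."""
    return weight * votes / (age_hours + 2.0) ** gravity


def rank(posts, tag_weights=None):
    """Sort posts best-first. `tag_weights` maps a tag to one reader's multiplier."""
    tag_weights = tag_weights or {}

    def score(post):
        # A post's personal weight is the largest weight among its tags (default 1.0).
        weight = max((tag_weights.get(tag, 1.0) for tag in post["tags"]), default=1.0)
        return rank_score(post["votes"], post["age_hours"], weight)

    return sorted(posts, key=score, reverse=True)


posts = [
    {"title": "Old hit", "votes": 50, "age_hours": 48, "tags": ["intro"]},
    {"title": "Fresh post", "votes": 10, "age_hours": 1, "tags": ["shiny"]},
]
print([post["title"] for post in rank(posts)])
```

With no personal weights the fresh post comes first despite fewer votes; a reader who sets a large weight on a tag can pull older posts with that tag back up.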

tracykteal commented 9 years ago

Just want to +1 this idea. It would be great to have one site to point to for R-related blog posts. Also, whatever is implemented here could be a useful strategy for other communities. I like @karthik's fully automated robot strategy as 'no curation' is the lowest weight implementation. Maybe something like Planet Python (http://planetpython.org). @elijah's 'want your blog added, submit a PR' might be an intermediate strategy. If it seemed like low quality was an issue, a curation or upvote system could be implemented later. For upvotes, etc., I've liked the Advogato trust metric, but I know it has its issues.

eddelbuettel commented 9 years ago

Yup. I think everybody likes the basic idea of Planet $WHATEVER and @elijah already used a Planet implementation, but possibly an older one than the Python one mentioned by @tracykteal.

hafen commented 9 years ago

I'm a fan of this idea and would be interested in joining any discussions about it at the unconf.

eddelbuettel commented 9 years ago

I still want this done and am hoping to work a little on a minimal solution, possibly GitHub based.

@elijah had done some more work poking around and probing some more (with email to). Anybody still got appetite for this?

jeroen commented 9 years ago

Something is brewing in the back of my mind... it's next up after mongo and gpg projects...

gaborcsardi commented 9 years ago

Any desire to integrate this with www.r-pkg.org at some level? :) E.g. news.r-pkg.org? Just asking, no problem if not. :)

I am willing to help, even if we don't integrate. :)

eddelbuettel commented 9 years ago

:+1:

I would so luuuv to have both of you on this as there is obviously so much js goodness around this.

As for r-pkg.org: "too visible" :) This is skunkworks for now, but when we have something we like we can surely make it more visible.

tracykteal commented 9 years ago

:+1:

timelyportfolio commented 9 years ago

Am also very interested in this and happy to help in any way.

gaborcsardi commented 9 years ago

@eddelbuettel r-pkg.org has very little traffic actually. :)

But that's fine, I can certainly put it somewhere else as well. I'll listen to what Jeroen has to say.

If you have a framework in mind, I can put that on Dokku/DigitalOcean. Dokku is great: containerized microservices with a git-based workflow. Do-it-yourself Heroku, essentially.

karthik commented 9 years ago

> Any desire to integrate this with www.r-pkg.org at some level? :)

That would be great and I'm very supportive of this.

elijah commented 9 years ago

The old planetr.stderr.org was a planetplanet implementation - from about 2007, if memory serves. It wasn't very great but it has worked for a LONG time without any real maintenance work or time spent on it by me.

It'd be super cool to have things like customizable weightings and such, but that's a lot more work, and I've never actually seen an open implementation that was simple enough (e.g., 'boneheaded') to actually continue working over time.

Minimal dependencies are REALLY important for this sort of thing, btw. If it needs a RDBMS or something, it will bitrot and people will be sad.

--e

elijah commented 9 years ago

I missed the unconfy thing - not being very well plugged into R these last few years - but am happy to talk to folks / play with things anyone wants to grind on.

--e

elijah commented 9 years ago

I'm getting a chance to hack on the idea I had of using travis-ci to noodle with the articles. I have the freebie implementation at travis-ci.org pulling the repo down and running Jekyll tests on all of the bits that are in the _posts dir. I'm working now on having the build/test process actually do the pulling itself, so it can promote if all the content renders properly (and such). I got spurred into working on this when I ran the manual process to update posts (planet generate ...) and broke the build on github.io.
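A rough Python sketch of the "promote only if everything checks out" gate. It is not the actual Travis setup: it only tests that each post opens with a Jekyll-style front matter block (delimited by `---` lines) that contains a `title:` line. Requiring a title is an assumption, and both function names are made up.

```python
def has_front_matter(text):
    """True if the text starts with a '---' delimited block that has a title line."""
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---":
        return False
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return False
    return any(line.startswith("title:") for line in lines[1:end])


def can_promote(posts):
    """Given {file name: file text}, return (ok, sorted names of broken posts)."""
    bad = sorted(name for name, text in posts.items() if not has_front_matter(text))
    return (not bad, bad)


posts = {
    "2015-03-01-good.md": "---\ntitle: Hi\n---\nBody",
    "2015-03-02-broken.md": "title: Hi\nBody",
}
print(can_promote(posts))
```

A CI job could run a check like this before pushing, so a single mangled post blocks the promotion instead of breaking the published site.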

gaborcsardi commented 9 years ago

@jeroenooms Can't we just do this on reddit? As a subreddit, where we automatically submit aggregated content? I am not a big reddit user, so I have no idea.....

timelyportfolio commented 9 years ago

@abresler knows reddit well. Sounds like I need to know it.

eddelbuettel commented 9 years ago

:-1:

The whole point is to not depend on someone/something else.

Remember: minimal implementation, self-contained, possibly on GitHub (or a cheap hosted site). See discussion above, notably posts by @elijah

gaborcsardi commented 9 years ago

In that case I suggest a custom server, not Travis CI. I am not a big fan of Jekyll, either. :) A static site would not work anyway, if we want dynamic ranking.

eddelbuettel commented 9 years ago

I'd call dynamic ranking a nice to have, not a must have.

I'd be happy with static compilation: given a list of input (sites), render output.
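To make "list of inputs in, rendered output out" concrete, here is a small Python sketch of only the rendering half. It assumes the posts have already been fetched into dicts with `url`, `title`, `blog`, and an ISO-formatted `date`; those field names and the page layout are illustrative, not a proposal for the real site.

```python
from html import escape


def render_index(posts, title="R blogs"):
    """Render a static HTML page listing the posts, newest first."""
    newest_first = sorted(posts, key=lambda post: post["date"], reverse=True)
    rows = [
        '<li><a href="{}">{}</a> ({}, {})</li>'.format(
            escape(post["url"]),
            escape(post["title"]),
            escape(post["blog"]),
            escape(post["date"]),
        )
        for post in newest_first
    ]
    return "<!doctype html>\n<title>{}</title>\n<ul>\n{}\n</ul>\n".format(
        escape(title), "\n".join(rows)
    )


posts = [
    {"url": "https://a.example/1", "title": "A & B", "blog": "Blog A", "date": "2015-03-01"},
    {"url": "https://b.example/2", "title": "Newer", "blog": "Blog B", "date": "2015-03-04"},
]
print(render_index(posts))
```

The output is a plain string, so it can be written to a file and served from GitHub Pages or any cheap static host with no database involved.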

(And I don't want to monopolize discussion/direction here: if folks feel that +1/-1 votes are essential and reddit would do then by all means start something on reddit. We cannot really do worse than the current R blog aggregation over at the main R bloggers site...)

gaborcsardi commented 9 years ago

Well, it was just a question, and I don't think a subreddit is the best, either. :)

Actually, even for a static site, Travis is very limiting, and running it on a custom server in the cloud is so cheap ($5 to start with), and gives you so much more freedom, that I would start with that. Some ideas for the infrastructure. These are just the things I know relatively well, and am comfortable with. I am also willing to implement the infrastructure, and then Jeroen can do the really important things, like the ranking, and the design of the site.

Feedback is more than welcome.

I am happy to implement all these. If we agree on the easy parts, I can put them in place, and then we can work on the real stuff:

Most important question: what should be the domain name? :)

karthik commented 9 years ago

I agree that Reddit is a terrible idea :-1:

> Most important question: what should be the domain name? :)

I had the idea of rblogs.org. How about that?

I really like the design. I was thinking something much simpler than what you have set up: people could send a PR or file an issue with their website request, then include a small file on their server consenting to inclusion. They can delete the file anytime and we can automatically remove the blog.

The design should also be fairly simple and minimalist (no banner ads, nothing hard to read on mobile). Full posts where possible.

eddelbuettel commented 9 years ago

Nice that we now got rid of the reddit non-starter :)

I like a lot of this, but find other things already a little heavy:

I think @elijah hit the nail on the head earlier:

> Minimal dependencies are REALLY important for this sort of thing, btw. If it needs a RDBMS or something, it will bitrot and people will be sad.

The last thing any of us need is another (code/project) baby to sit and watch. Let's write some code, and set up a nice process that is as automatic as we can. Keep all the glossy voting and personalisation for the next iteration. Getting something simple enough out fast would be my mantra.

(That said, talk is cheap for me as I can't code in js ...)

gaborcsardi commented 9 years ago

> I had the idea of rblogs.org. How about that?

:+1: I like that very much. At last a domain name without a dash!

> I really like the design. I was thinking something much simpler than what you have set up: people could send a PR or file an issue with their website request, then include a small file on their server consenting to inclusion. They can delete the file anytime and we can automatically remove the blog.

:+1:

> The design should also be fairly simple and minimalist (no banner ads, nothing hard to read on mobile).

:+1:

> Full posts where possible.

You mean on the front page? I think on the front page we could have just the blog name, post title, date, number of views and +1s, maybe a very short excerpt, but probably not even that. Then have the full articles on a separate page. I expect a lot of articles, so maybe we don't want full articles on the front page. Especially not on mobile.

eddelbuettel commented 9 years ago

But we have r-pkgs.org, r-project.org, r-consortium.org. And r-blogs.org is available too ...

gaborcsardi commented 9 years ago

> why does it have to be on digital ocean (or anywhere else)

It does not have to be there. But it has to be somewhere, right? :) DO is the cheapest I know.

> do we need "processing" of input?

I guess so. We can try without processing, but soon we'll need to refine the HTML, the CSS, etc.

> do we need "storage" (ie mongodb proposal)?

If we just link, we don't need it. If we have a db, and serve full articles locally, then we can make sure they look kinda the same, and they are served on a good quality server. But I don't mind starting with linking only.

> The last thing any of us need is another (code/project) baby to sit and watch. Let's write some code, and set up a nice process that is as automatic as we can. Keep all the glossy voting and personalisation for the next iteration. Getting something simple enough out fast would be my mantra.

Agreed completely. That's what I meant by leaving out the hard parts first.

Mongodb is extremely simple. Ten minutes literally. Redis, for caching, too. We'll most probably need them later, anyway, but we can start without them. But as soon as we have e.g. authentication, we'll need to store the sessions somewhere, and mongo is excellent for that, too.

What I wrote, minus the "hard" things, will take me literally an afternoon to implement. Then we have a nice infrastructure in place, and can do the hard stuff.

If we agree on it today, I can probably do it this evening.

gaborcsardi commented 9 years ago

I don't mind having r-blogs.org, either, but hate typing the dash, especially on the phone..... we can also have both, and rblogs.org can just point to r-blogs.org, or is that confusing?

gaborcsardi commented 9 years ago

But if @jeroenooms already has something in mind, and that is not compatible with the things I wrote, I am happy to wait for him, of course.

eddelbuettel commented 9 years ago

Here is where I am coming from:

I see this as viable, minimal, no cost (besides maybe one domain registration). Suits me.

But all that said, "code trumps talk". If you have something better and implement it you are obviously ahead on several counts :)

gaborcsardi commented 9 years ago

@eddelbuettel That's completely fine with me, too. :)