gravitystorm / blogs.osm.org

The new feed aggregator for OpenStreetMap
https://blogs.openstreetmap.org/

Remove user diaries from blogs.openstreetmap.org #40

Closed: Nakaner closed this issue 5 years ago

Nakaner commented 5 years ago

As described in https://github.com/gravitystorm/blogs.osm.org/issues/17, we currently lack the ability to delete blog entries (usually spam) once they have been fetched from a source. While we have been able to live with that for years, the recent spam flood in the OSM user diaries has reduced the usefulness of the feed provided by blogs.openstreetmap.org to almost nothing. That's why I would like to propose removing the user diaries from blogs.openstreetmap.org as a temporary mitigation of the spam problem. People interested in the user diaries can still fetch them directly from https://www.openstreetmap.org/diary/rss; unlike the aggregator, that feed no longer contains a spam entry if you fetch it after the entry has been deleted.

I have been using blogs.openstreetmap.org as a valuable source for WeeklyOSM/Wochennotiz since I joined WeeklyOSM/Wochennotiz 4½ years ago. It would be a pity if I, and maybe the whole team, had to collect the dozens of feeds ourselves. :-(

tomhughes commented 5 years ago

That would be a totally ridiculous way to solve the problem.

gravitystorm commented 5 years ago

Hi @Nakaner - I'm sorry that you received such a terse response, but I know that @tomhughes is busy every day trying to keep on top of the spam.

I think if it was any other feed that was having these problems, we would have removed it straight away. If my own blog was sending 300+ spam posts to this aggregator, I would remove it myself immediately. Unfortunately even though @tomhughes removes the spam from osm.org multiple times per day, it's already been copied from the osm.org feed, and so ends up in this aggregator. Even if we remove them from this aggregator (e.g. by regenerating the db from scratch each run, see https://github.com/gravitystorm/blogs.osm.org/issues/17#issuecomment-277580473), they will still have made it to e.g. my feed reader.

So as long as we get spam on osm.org, even briefly, it will lower the value of this aggregator, no matter what cleanups we do and how quickly they are done. When it's only 2-3 spam posts per day that's not so bad, but when it's 300+ per day I just "mark all as read" in my reader and miss everything.

gravitystorm commented 5 years ago

So while we work in parallel on trying to stop spam getting onto osm.org in the first place, I'd like to gauge opinions as to whether disabling the osm.org feed here TEMPORARILY is better or worse than the current situation. I mean, personal opinions based on how you use this system. Not general hand-wavy things or what you think other people would think.

Personally, I would support it, because at the moment I just "mark all as read" each morning in my feed reader. If the osm.org feed was removed, I would at least see a trickle of OpenStreetMap-related posts from elsewhere.

tomhughes commented 5 years ago

I'd rather add a delay to the diary feed or change the aggregator to do a full rebuild every time - dropping the feed is just silly.

harry-wood commented 5 years ago

The spam retention issue was fixed: https://github.com/gravitystorm/blogs.osm.org/issues/17

There is no need for this extreme solution now. We can close this.