Closed AndrewHain closed 5 years ago
Clearly this is a ridiculous suggestion.
I think that's really a duplicate of https://github.com/gravitystorm/blogs.osm.org/issues/17
Well yes, but my point is that censoring diary entries based on what language they are written in is clearly not something we could countenance.
This isn't going to end soon, says alexkemp: https://www.openstreetmap.org/user/alexkemp/diary/338244
"The easiest method to defeat {put in automated spam tool} is to simply require the first post of any new forum member or blog poster to be approved before it can appear."
Yes, because I am really looking forward to having 5000 posts to approve when I get up every morning.
I mean obviously that is an option, but one that requires significant engineering and is not a quick fix even if it is practical.
Right. Usually there is only a very small number of non-spam blog posts, and only those would need approval, and only the very first time someone posts. The others can be purged automatically after a few days if no one has approved them and no user has complained that their posts are still not showing up on the page.
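To make the proposal above concrete, here is a minimal sketch of a first-post approval queue with automatic purging. It is not how the OSM website works; the class names, the in-memory lists and the three-day purge window are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: "a few days" is taken to mean three days here.
PURGE_AFTER = timedelta(days=3)


@dataclass
class Post:
    author: str
    body: str
    created: datetime
    approved: bool = False


@dataclass
class DiaryQueue:
    """Hypothetical moderation queue: only an author's first post needs approval."""
    trusted: set = field(default_factory=set)      # authors with an approved post
    pending: list = field(default_factory=list)    # posts waiting for an approver
    published: list = field(default_factory=list)  # posts visible to everyone

    def submit(self, post: Post) -> None:
        # Authors who already had a post approved are published immediately.
        if post.author in self.trusted:
            post.approved = True
            self.published.append(post)
        else:
            self.pending.append(post)

    def approve(self, post: Post) -> None:
        # Approving one post publishes it and marks its author as trusted.
        self.pending.remove(post)
        post.approved = True
        self.trusted.add(post.author)
        self.published.append(post)

    def purge(self, now: datetime) -> None:
        # Drop pending posts nobody approved within the purge window.
        self.pending = [p for p in self.pending if now - p.created < PURGE_AFTER]
```

With this shape, approvers only ever look at the pending list, and a periodic call to `purge` keeps it from growing without bound. Whether the volume of pending posts is manageable in practice is exactly the question raised above.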
Some really low-hanging activities could be:
Is there any evidence they are actually getting indexed in the few hours before they are removed?
Obviously we can ban posting by new users, but that falls into the category of "collateral damage" that I have just discussed on Alex's latest rant.
Yes, they are showing up in Google's index fairly soon. I tried this yesterday with some random Chinese spam snippets.
There is also of course no reason to believe that they wouldn't just add a delay between creating the account and posting.
Frankly, I think a more reasoned response would be to say that it is ridiculous for us to be running a blog system that is entirely unrelated to our primary purpose, and to just ditch the diaries altogether.
Ah yes, that's kind of the "nuclear option". I was also thinking about shutting down the blog system altogether.
"Yes, because I am really looking forward to having 5000 posts to approve when I get up every morning."
On that specific point, why does it have to be just you that does it? Lots of people have been banging on about diary spam and surely some of those would be appropriate to have as "diary approvers" with the specific job to approve valid posts (and only that). Sure, the system to allow that won't write itself, but there's no reason that any extra effort once set up needs to sit explicitly on the admins.
I think alexkemp was right about the nature of those posts: they're currently training their spam bots by posting some random news articles. Let's see how it goes.
Ah, there's an issue with the robots.txt change: you need to remove the trailing slash in
Disallow: /user/*/diary/
i.e. replace that line with
Disallow: /user/*/diary
Otherwise, the user's blog list (e.g. https://www.openstreetmap.org/user/TomH/diary ) still gets indexed.
Test tool I used: https://webmaster.yandex.com/tools/robotstxt/?hostName=https%3A%2F%2Fwww.openstreetmap.org%2Frobots.txt
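The effect of the trailing slash can be checked with a small matcher. This is only a sketch of the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match); individual crawlers may behave differently, and the helper names are made up for this example.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the path;
    without it the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    # re.match anchors at the start of the path, which gives prefix matching.
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    paths = ("/user/TomH/diary", "/user/alexkemp/diary/338244")
    for pattern in ("/user/*/diary/", "/user/*/diary"):
        for path in paths:
            print(f"{pattern!r:20} {path!r:35} blocked={is_disallowed(path, pattern)}")
```

Under these rules `/user/*/diary/` blocks individual entries but not the list page `/user/TomH/diary`, while `/user/*/diary` blocks both.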
For the moment, if we cannot contain the spam, I would consider removing the user diaries from the feed until we find a way to fix this.
However, looking at the topmost spam post, the related user seems to have been deleted already. So solving #17 (ensuring that diary entries disappear from the blog when the user is deleted) might also solve this issue.
https://github.com/gravitystorm/blogs.osm.org/issues/40 is related to this issue.
I agree that simple block rules based on the character sets used are too crude, but just ignoring this problem is not an option either.
We're not ignoring anything - we are making ongoing efforts to fight the spam and to add new features to help control it.
https://blogs.openstreetmap.org is currently useless and WeeklyOSM/Wochennotiz has disappeared within hours.