ros-infrastructure / answers.ros.org

Tickets for answers.ros.org

Profile spam #125

Closed trainman419 closed 8 years ago

trainman419 commented 9 years ago

I was looking through new user profiles today, and it looks like we're seeing a new type of spam: users who create a profile with links, but never post any content.

Example: http://answers.ros.org/users/23662/magazineprinting/

This isn't the sort of thing that directly affects the appearance or functionality of the site, but it may negatively affect how search engines rank results from our site.

Can we hide a user's profile and website if they have not yet been approved by a moderator?

tfoote commented 9 years ago

Yeah, the website link is marked as nofollow until 25 karma is achieved. The profile should definitely at least be sanitized.
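To illustrate the nofollow rule mentioned above, here is a minimal sketch. The function name, constant name, and example URL are hypothetical and this is not Askbot's actual code; only the threshold of 25 karma comes from the comment.

```python
from html import escape

# Threshold quoted in the comment above; the constant name is hypothetical.
NOFOLLOW_KARMA_THRESHOLD = 25


def render_website_link(url: str, label: str, karma: int) -> str:
    """Return an anchor tag, adding rel="nofollow" for users below the threshold."""
    rel = "" if karma >= NOFOLLOW_KARMA_THRESHOLD else ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(label)}</a>'


# Illustrative usage with a placeholder URL.
print(render_website_link("http://example.com/", "My site", karma=1))
print(render_website_link("http://example.com/", "My site", karma=30))
```

A nofollow link still renders on the page, so it only removes the search-ranking incentive; it does not hide the spam from visitors.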

We don't use the profiles. I think we could consider simply turning off the profiles.

tfoote commented 9 years ago

I blocked a bunch of users:

users/23666/saudi-airlines/ users/23656/road-to-euro/ users/23643/best-investment-in-australia/ users/23639/part-time-cleaning-services-in-dubai/ users/23640/silk-charmeuse-fabric/?sort=moderation users/23636/ocr-conversion users/23613/digitalprintingdubai/ users/23603/picgrantsingapore users/23604/trusses-in-dubai/ users/23605/ashton-patrick/ users/23621/foot-reflexology-bangkok/ users/23618/male-enhancement-pill/ users/23633/semi-d-in-damansara-heights/ users/23599/architectural-shingles/ users/23623/bitumen-modification/ users/23612/bestprohormones/ users/23600/carrentalinkl/ users/23592/fire-rated-glass-singapore/ users/23593/businessgift/ users/23591/rumsingapore/ users/23617/marinacourtkk/ users/23562/botanika-singapore/ users/23569/wedding-photography/ users/23559/desertcampluxury/ users/23568/penangroller/ users/23573/sanluisobispobbq/ users/23572/oilprices/ users/23578/spas-around-dubai-marina/ users/23576/citizen-calculators-hong-kong/ users/23575/bostondirectmail/ users/23547/hirewagonringoa/ users/23549/bulb-flat-singapore/ users/23579/buysyringes/ users/23571/womenportraitsdubai/ users/23550/beadsandjewellerysupplies/ users/23543/water-pump-australia/ users/23542/external-insulation-cost/ users/23541/lebanon-catering-companies/ users/23540/rootcanalparramatta/ users/23531/xian-jiaotong-university-china/ users/23514/bestfacialexfoliator/ users/23515/photo-canvas-malaysia/ users/23519/mattresssingapore/ users/23530/deli-display-case-nsw/ users/23532/bestmedicaltreatment/ users/23534/supplieroffineblanked/ users/23501/caterpillarpartsonline/ users/23502/dhol-player-hire-london/ users/23503/outdoor-photography-in-dubai/ users/23496/how-much-is-my-home-worth/ users/23495/bivouac-chigaga/ users/23494/freelance-web-designer/ users/23493/seo-in-dubai/ users/23491/conference-production-qatar/ users/23482/hotel-in-sabah-kota-kinabalu/ users/23479/bim-consultancy/ users/23478/freelance-makeup-artist-singapore/ users/23475/wonder-glow-skincare/ 
users/23466/kota-bharu-airport-car-rental users/23464/singapore-call-girls/ users/23460/restaurant-laundry-services-sydney/ users/23459/pallet-storage-essex/ users/23458/car-accessories-malaysia/ users/23454/service-learning-ecuador/ users/23451/gundam-malaysia/ users/23449/family-dentist/ users/23445/corporate-events-management/ users/23444/wealth-accumulation-strategies/ users/23443/printed-or-embroidered-customised-promotional-clothing-t-shirts-polo-shirts-hoodies/ users/23442/pantone-singapore/ users/23440/perth-removalists/ users/23436/commercial-interior-design/ users/23435/event-decorations-classes/ users/23434/construction-recruitment-agencies/ users/23433/truck-wreckers-brisbane/ users/23672/best-travel-partner/

This got me to August 2nd ~ page 6 scanning new profiles for the spam.

Looking farther down, it seems to go back to about April 23rd: users/22249/punchcrafttoolsdubai/

tfoote commented 9 years ago

I left the linked profile up for reference. The profile content is as below. @evgenyfadeev is it expected to allow arbitrary html?

 Looking for school magazine in Singapore? Pixergram is a self-publishing platform focused exclusively on making bookstore-quality print book. Call us at +65 65383590.

<b>Website:</b> <a href="http://www.pixergram.com/magazine-printing/">Magazine Printing</a>

tfoote commented 9 years ago

I cleaned up several new posts:

users/23707/rent-holiday-homes-johor/ users/23705/thepanorama/ users/23684/shopping-online-malaysia/ users/23685/phuketblinds/ users/23687/kitchen-cabinet-tawau/ users/23698/dubaiferrarirental/

More this morning: users/23709/exhibition-stand-design/. They appear to be trying to hide better by using more common usernames: users/23701/robertc92106/ users/23677/davidneelyus/ users/23699/austinevan/

From my unscientific sampling I've seen a lot of yahoo emails.

tfoote commented 9 years ago

OK, I've just cleaned up everything in between those dates. I estimate I've blocked and deleted ~200 spam profiles. The number of pages of users dropped from 385 to 377.

tfoote commented 9 years ago

They can still be found here by name: http://answers.ros.org/badges/16/autobiographer/

tfoote commented 9 years ago

I've cleaned up several new posts; they seem to have slowed down. We've found a bunch of them on answers.gazebosim.org too.

evgenyfadeev commented 9 years ago

Prepared a fix to hide profile info for users with karma < what is needed for auto-approval (so far it was hidden only for blocked accounts), will deploy tonight.

evgenyfadeev commented 9 years ago

Now profile spam is only visible to logged in site admins and moderators, not visible to search engines.
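The visibility rule described in the last two comments can be sketched roughly as follows. This is an illustration under stated assumptions, not the deployed Askbot fix: the `User` class, its field names, and the function name are hypothetical, and the auto-approval karma threshold is passed in because the thread does not state its value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Hypothetical stand-in for the site's user model.
    karma: int = 0
    is_admin_or_moderator: bool = False
    is_blocked: bool = False


def can_see_profile_info(
    viewer: Optional[User], owner: User, min_karma_for_auto_approval: int
) -> bool:
    """Decide whether `viewer` may see the free-text profile info of `owner`.

    `viewer` is None for anonymous visitors, which includes search engine crawlers.
    """
    # Admins and moderators always see it, so they can audit for spam.
    if viewer is not None and viewer.is_admin_or_moderator:
        return True
    # Blocked accounts were already hidden before the fix.
    if owner.is_blocked:
        return False
    # New rule: hide until the owner has enough karma for auto-approval.
    return owner.karma >= min_karma_for_auto_approval


# Illustrative usage; the threshold of 10 is made up for the example.
new_user = User(karma=1)
moderator = User(karma=500, is_admin_or_moderator=True)
print(can_see_profile_info(None, new_user, 10))
print(can_see_profile_info(moderator, new_user, 10))
```

The sketch does not cover whether owners can see their own profile text, since the thread does not say.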

tfoote commented 9 years ago

Great, thanks. I've confirmed it only renders for admins. Checking how it looks in Google Webmaster Tools, the profiles are also excluded in robots.txt, so they're really not getting any value from the links.
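A robots.txt exclusion like the one mentioned above can be checked locally with Python's standard `urllib.robotparser`. The rules below are an assumption made for illustration; the thread does not quote the site's actual robots.txt, and the `/questions/` path is only an example of a non-excluded page.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration only; not copied from the real site.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /users/
"""

parser = RobotFileParser()
parser.parse(ASSUMED_ROBOTS_TXT.splitlines())


def crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` is allowed to fetch `path` under the assumed rules."""
    return parser.can_fetch(agent, path)


print(crawlable("/users/23662/magazineprinting/"))
print(crawlable("/questions/"))
```

Well-behaved crawlers will not fetch disallowed paths, so links placed on those pages are not read by them.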

gavanderhoorn commented 9 years ago

I've also been blocking many 'spammers' recently. Really elaborate posts, with multiple paragraphs of (seemingly) normal text, links to product-spam pages in there as well.

Profiles are all really elaborate: normal age, normal-ish name, etc. Also links to product page in profile.

Example: jacklynhoo.

tfoote commented 9 years ago

Yeah, I think we should keep cleaning them up. Hopefully they'll realize that they're not getting any value out of it now. They do seem to be escalating in complexity, with that example even asking a question. I'm guessing that it must be people actually doing it, as they're adapting to our moderation.

tfoote commented 8 years ago

I just deleted ~ 180 profiles created since I left for ROSCon around the beginning of the month. I'm reopening this as we will need to find another solution for this.

From a spot check, they are all yahoo addresses. @evgenyfadeev can you confirm if they're using Yahoo for registering? I'm kind of thinking that we might just blacklist that service for registration.
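A spot check like this could be made more systematic by tallying email domains over the blocked accounts. Here is a minimal sketch: the function name is hypothetical and the addresses are made up, since the thread does not include real account data or say how it would be exported.

```python
from collections import Counter
from typing import Iterable


def count_email_domains(emails: Iterable[str]) -> Counter:
    """Count addresses per domain, case-insensitively, skipping malformed entries."""
    domains = (
        email.rsplit("@", 1)[-1].strip().lower()
        for email in emails
        if "@" in email
    )
    return Counter(domains)


# Made-up sample addresses for illustration.
sample = ["a@yahoo.com", "b@Yahoo.com", "c@gmail.com", "d@example.org", "broken"]
for domain, count in count_email_domains(sample).most_common():
    print(domain, count)
```

Comparing the same tally for legitimate accounts would show how many real users a domain blacklist would affect.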

tfoote commented 8 years ago

Checking this morning I found 10 more since I cleaned up 12 hours ago.

There was even one which had a new MO. I've been clearing anyone with obvious spam, but skipping empty profiles since it could be a tentative new user. I ran into one where they waited until after I purged and then added the spam links.

[image: answer_new_mo] https://cloud.githubusercontent.com/assets/447804/10519697/a32421d8-731b-11e5-84c0-b88eb952c970.png

This means that sweeping new users may no longer work as a clearing method.

evgenyfadeev commented 8 years ago

I will update the site soon with support of "nocaptcha" recaptcha - it seems to work better.

tfoote commented 8 years ago

I've tried disabling the yahoo login. If it works, I'd like to explore disabling it for registration but allowing adding it later as a login method.

tfoote commented 8 years ago

We got another spammy profile with Yahoo blocked: users/24536/lesmickle. Maybe the new captcha will be enough.

130s commented 8 years ago

This may be a corner case, but since Gmail social login stopped working on answers.gazebosim.org for a while and Yahoo social login still works there, there may be legitimate users who use Yahoo on answers.ros.org.

tfoote commented 8 years ago

@130s that's why I would like to consider allowing login but not registration; see my comment above. Right now that granularity is not possible, and if they're letting people create spammy accounts, they're not holding up their end of the OpenID bargain. This is a test to see if it stops the flow of spam.

130s commented 8 years ago

@tfoote I see, that makes sense.

tfoote commented 8 years ago

We still got 5-6 spam accounts with yahoo turned off. Some were still yahoo accounts. I've turned on Yahoo again.

There seem to be, potentially, real people behind these things. I'm not sure that a CAPTCHA is going to be enough, though we might try.

There are a number of services designed to fight spam by aggregating reports across multiple sites. Searching for some of the emails, I found them listed by https://cleantalk.org/ and http://www.stopforumspam.com/

tfoote commented 8 years ago

Another example of where spam has been a problem for open registration: http://blog.openhub.net/2015/09/why-do-we-ask-for-your-phone-number/ They have resorted to requiring SMS verification, and since that post have added github verification too.

tfoote commented 8 years ago

I just deleted another 24, and that was about a 50% spam rate.

tfoote commented 8 years ago

I just cleared close to 200 from the last week. There are about 60 legitimate new accounts in the same time period.
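As a quick check of what those numbers imply, the spam share of new registrations can be computed as below. The inputs are the approximate counts from the comment (about 200 spam, about 60 legitimate), so the result is only a rough estimate; the function name is hypothetical.

```python
def spam_fraction(spam: int, legitimate: int) -> float:
    """Return the fraction of new accounts that were spam."""
    total = spam + legitimate
    if total == 0:
        raise ValueError("no accounts to evaluate")
    return spam / total


# Approximate counts taken from the comment above.
print(f"{spam_fraction(200, 60):.0%}")  # prints "77%"
```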

@evgenyfadeev Can we find out which registration mechanism the blocked accounts are using?

gavanderhoorn commented 8 years ago

@tfoote: I assume you have some UI to look at new registrations? Or are you just iterating over user ids?

tfoote commented 8 years ago

You can get the list of users sorted by the most recent: http://answers.ros.org/users/?sort=newest

gavanderhoorn commented 8 years ago

Oh right, I knew that. Still, it would be nice to have a "these accounts registered since X-Y-Z" overview.
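An overview of that kind could be as simple as filtering a user list by join date. The sketch below assumes the user records are already available as dictionaries with a `joined` date; the thread does not say how such data would be exported from the site, so the record layout, function name, and sample entries are all hypothetical.

```python
from datetime import date
from typing import Dict, List


def registered_since(users: List[Dict], cutoff: date) -> List[Dict]:
    """Return users who joined on or after `cutoff`, newest first."""
    recent = [user for user in users if user["joined"] >= cutoff]
    return sorted(recent, key=lambda user: user["joined"], reverse=True)


# Made-up records for illustration.
sample_users = [
    {"id": 1, "name": "alice", "joined": date(2015, 9, 30)},
    {"id": 2, "name": "bob", "joined": date(2015, 10, 20)},
    {"id": 3, "name": "carol", "joined": date(2015, 10, 25)},
]

for user in registered_since(sample_users, date(2015, 10, 15)):
    print(user["id"], user["name"], user["joined"].isoformat())
```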

evgenyfadeev commented 8 years ago

@tfoote those are regular password accounts. They solve captcha and validate email via API. Many spam accounts have @gmail.com and @yahoo.com addresses

The good news is that the new captcha stops these users, I will update your site today and will let you know.

tfoote commented 8 years ago

I can confirm the new captcha seems to be stopping the spam flood!

tfoote commented 8 years ago

A few are trickling in, but I don't know that we'll be able to get them all. ~1 in 9 days is great.

users/25016/solarpanelbangkok

tfoote commented 8 years ago

They're still coming in occasionally. I just deleted one this morning. I don't think there's anything we need to do though.

[image: answers ros org_spam_profile]

tfoote commented 8 years ago

We're still getting occasional spam accounts. I just deleted 7 while auditing the last 450 users, going back to June 27th. I don't think there's much we can do besides an occasional audit.

Example: image