Immediately suspend Awario scoring

dantrevino commented 5 years ago

What is the problem you are seeing? Please describe. Since we can't seem to have nice things without some people dragging them through the mud, I suggest we suspend Awario scores immediately, effective June 1. Not July; June 1. Social awareness scores are leading teams to buy slimy third-party social marketing. See the additional context below and ask yourself if this is the kind of platform and ecosystem you want to build.

How is this problem misaligned with goals of app mining? In my mind, this is self-evident. These social media influencers contribute nothing to an application's quality.

What is the explicit recommendation you’re looking to propose? As the title suggests. Immediate end to Awario scores being included in any app mining results.

Describe your long-term considerations in proposing this change. Please include the ways you can predict this recommendation could go wrong and possible ways to mitigate. Clean up the unfortunate marketing tactics and force teams to market on their product's strengths and features, rather than on the number of bots they can pay for through a third party.

Additional context https://twitter.com/tariqnoorkhan/

dantrevino commented 5 years ago

related #117

cuevasm commented 5 years ago

So again, I think we should just establish a specific list of things the community says no to, and then do our best to catch and prevent those things. I can easily blacklist accounts like this; we have the tools to make this better. I don't think we ever get better at it by just shutting it off.

Edit: I blacklisted the account you linked, that's an obvious one to me - others should let me know if they feel differently.

polluterofminds commented 5 years ago

Counterpoint: you all shut off Democracy Earth.

cuevasm commented 5 years ago

Yeah, that was after a lot of effort to make it work for everyone, though, and finding we couldn't. I don't think this App Reviewer has gotten the same level of feedback or effort on that front, so I'm hesitant to launch right into canceling it without at least seeing if we can improve it.

pstan26 commented 5 years ago

To be fair Democracy Earth was run for multiple months with a lot of time spent trying to improve its efficacy, to no avail. Awario may have the same fate, but ain’t it worth trying to improve, given there are ways to, before writing it off completely?

dantrevino commented 5 years ago

@cuevasm I think that's a great start, but in the meantime, I feel like people are looking to game the system.

When we can come up with a system that is a little less broadly exploitable, I would support turning it back on personally. I think the base idea is valid, but are we going to continue to chase the latest slimy loophole every month?

polluterofminds commented 5 years ago

I agree with Dan here.

I paid for a promoted tweet (through Twitter's official channel) to try to compete with these influencers. I'd rather compete against promoted tweets and regular tweets than against bots. But if Blockstack/the community ends up saying this is OK, I will 100% be spending a lot of money on influencers.

cuevasm commented 5 years ago

Curious how folks are delineating between regular tweets and spam or others without a manual review. I think influencers are philosophically fine, I'm on the same page with you guys about not paying zombie accounts or RT chains to fluff the numbers. I am trying to formulate a realistic approach to solving this when I don't even think Twitter has solved it. I'm going to reach out to the Awario team and see if they can offer some insights here too.

kkomaz commented 5 years ago

When we can come up with a system that is a little less broadly exploitable, I would support turning it back on personally. I think the base idea is valid, but are we going to continue to chase the latest slimy loophole every month?

This is not scalable if we have to do it every month, especially because app developers are not constantly monitoring for other apps' paid, bot-promoted tweets. In fact, it's even worse because we get the results at the END of each App Mining period, so the blacklisting happens AFTER the paid promoted tweet occurs.

cuevasm commented 5 years ago

@kkomaz by next month you'll have access the whole time, so we could offer roughly the 1st through the 14th as a window for everyone to look through and mark bad Mentions for review/removal. It's definitely a reactive model that I don't think will ever be perfect, but maybe with some rules around repeat offenders, it could be workable.

polluterofminds commented 5 years ago

I’m actually super confused by the blacklisting of this twitter account. Is that based on a decision by Blockstack that paid influencers is against the rules? If so this should be applied in a blanket manner.

dantrevino commented 5 years ago

@cuevasm that's the problem. It's kind of like art: I can't describe it, but I know it when I see it. We're not going to come up with a good, consistent algorithm to define what is good and what is bad in these cases. That's why I think leaving the scoring open means we will always be chasing the 'bad' scores. That's not scalable.

cuevasm commented 5 years ago

I’m actually super confused by the blacklisting of this twitter account. Is that based on a decision by Blockstack that paid influencers is against the rules? If so this should be applied in a blanket manner.

It's clearly a shell account that runs it through an RT network of bots (I saw it happen programmatically when the first account tweeted). It's not an influencer, where you arrange a brand engagement of some kind and pay them for their post. Do you propose some kind of voting system where everyone decides which side of the line something falls on? Perhaps it's something we could just have Awario decide; I don't know.

No form of paid advertising has been banned. This blacklisting also had nothing to do with it being paid or not, just that it was an obvious bot network.

polluterofminds commented 5 years ago

I guess my point is, how do you know if that account was paid to perform these tweets? Your definition of influencer marketing seems to rely on your own definition.

This is the problem. I think the problem is clear. I think it’s insurmountable unless Awario has a solution. The community will not be able to decide this.

cuevasm commented 5 years ago

@cuevasm that's the problem. It's kind of like art: I can't describe it, but I know it when I see it. We're not going to come up with a good, consistent algorithm to define what is good and what is bad in these cases. That's why I think leaving the scoring open means we will always be chasing the 'bad' scores. That's not scalable.

I don't disagree with you there. Do we have any other ideas for how to encourage good behavior or discourage bad? I personally feel like this has a lot of value if people aren't intentionally gaming it. In a perfect world, folks would just market their product and let this fall where it does, gleaning insights and putting the data to use to improve. Instead, it's in danger of becoming a race just to pump numbers instead of letting good numbers follow good marketing.

cuevasm commented 5 years ago

I guess my point is, how do you know if that account was paid to perform these tweets? Your definition of influencer marketing seems to rely on your own definition.

Yeah again in this specific case, nothing to do with paid or not. There are times when it's very easy to spot a bot network, this was one of them. I guess I could have jumped the gun and there are folks that want to allow bot networks to count, but I didn't think so.

cuevasm commented 5 years ago

I'm gonna drop from this for now, but it's definitely top of mind, and I've reached out to the Awario team to see if they have any ideas. Hopefully it goes without saying that I appreciate you two a ton!

Something I was thinking about last week: it's interesting, the problem that the need for transparency creates. Google doesn't publish its algorithm because it doesn't want people doing this. If the way your score was calculated were more hidden, and maybe even multiple reviewers poured into one score before it went into your ranking score, it could force people to stay more on the rails of doing good marketing and letting things land. Instead, it's fully exposed, so all the dirty little things you would never do for your real business maybe become obvious and attractive in the context of App Mining.

polluterofminds commented 5 years ago

Thanks, Mitchell! Appreciate your attention too. And you’re right. But the fact is, everything we do in the DWeb space is harder and takes more thought than the traditional web.

kkomaz commented 5 years ago

For the sake of this issue not being lost after today, what are the actionable goals and next steps? Can we also set a follow-up date to re-establish better-defined criteria for future scoring, including but not limited to penalizing apps for using Twitter bots, etc.?

cuevasm commented 5 years ago

The next steps in my mind:

1. [Done] I have reached out to the Awario team for input here. We have over 3 weeks until it's time to score Awario again and determine whether we can include the scores in a way that makes everyone happy. No action needs to be taken now to 'suspend' scoring, even if that's what we decide to do later; we will just keep collecting data and decide later whether to use it.
2. If Awario has no solutions for us, we'll need to have some kind of vote on whether to include it in the next run.
3. From there, decide whether it can be improved and, if so, the steps we could take to get there.

One solution is to remove Twitter and/or Facebook, YouTube, Instagram, and Reddit, but then you're only left with news/blogs, the algorithm for which is protected. To me, that also removes from consideration the biggest place (social) where everyone has to be growing their user base.

cuevasm commented 5 years ago

And of course, if anyone is able to offer more specificity on the list of things we don't want to allow, I think that's a useful list to have for Awario and for ourselves, whichever direction we go.

dantrevino commented 5 years ago

I personally feel like this has a lot of value if people aren't intentionally gaming it.

1000% agree. That being said, any solution on Awario, short of suspending them, should take into account how that solution aligns with the goals of the Blockstack platform, not just app-mining ... We should ensure that we're not creating a system that leads teams to value meaningless numbers, instead of real contributions to the growth of the platform.

Deutsch4534 commented 5 years ago

Which is the best and most peaceful app in the world? This is a fake question, just like asking who the best doctors or the best developers are.

Who can identify which app is the best in the game? Answer: no one, or maybe Muneeb Ali.

What methods can incentivize developers to write more and more decentralized (Blockstack) apps?

Love and a peaceful heart
Open source or not? (Code can't lie. If the code is open source, the app can hardly be Satan. "Can't be evil" is a naive dream, unless there is no Satan in the world.)
Cancel the Product Hunt and Twitter elements and add a GitHub element instead, like code frequency, commits, code review, etc.

If medicines and prescriptions were open source, more diseases would be cured. If code is open source, more problems in the world would be fixed. We don't hate developers who choose closed source, but they love money more than peace.

@dantrevino @jehunter5811 @cuevasm @muneeb-ali

friedger commented 5 years ago

Related to https://github.com/blockstack/app-mining/issues/98: define guidelines for what is good behavior, and define how to enforce those guidelines.

Awario is good for giving some incentive to do social marketing. I agree we should at least try to improve Awario.

polluterofminds commented 5 years ago

I don’t use any other social apps besides Twitter, but a simple search for each app in app mining reveals what I believe are clearly the apps paying bot farms to tweet and retweet stuff for reach.

So, if we can define guidelines, it’s not that hard to spot violations of said guidelines.

I’ll start with one guideline I’d like to have enacted:

Tweets of the exact same content from multiple people (not retweets) should be flagged and reviewed for removal from Awario score.
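A guideline this concrete could be automated as a first pass before manual review. Below is a minimal Python sketch, assuming mentions can be exported as records with an author, the tweet text, and a retweet flag; the `Mention` record, `normalize`, and `flag_copied_tweets` are hypothetical names, not part of Awario. It groups non-retweet mentions by lightly normalized text and flags any text posted by two or more distinct accounts, so a human can decide on removal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical shape of one exported mention (not Awario's real format)."""
    author: str
    text: str
    is_retweet: bool


def normalize(text: str) -> str:
    # Case-fold and collapse whitespace so trivial edits don't hide a copy.
    return " ".join(text.casefold().split())


def flag_copied_tweets(mentions: list[Mention], min_authors: int = 2) -> list[Mention]:
    """Return non-retweet mentions whose normalized text was posted by at least
    `min_authors` distinct accounts. Flagged items go to manual review."""
    authors_by_text: dict[str, set[str]] = defaultdict(set)
    for m in mentions:
        if not m.is_retweet:
            authors_by_text[normalize(m.text)].add(m.author)
    return [
        m for m in mentions
        if not m.is_retweet and len(authors_by_text[normalize(m.text)]) >= min_authors
    ]


if __name__ == "__main__":
    sample = [
        Mention("alice", "Check out this great app!", False),
        Mention("bob", "check out  this great app!", False),
        Mention("carol", "Check out this great app!", True),  # a retweet, ignored
        Mention("dave", "I wrote a tutorial on building with Gaia", False),
    ]
    print([m.author for m in flag_copied_tweets(sample)])  # ['alice', 'bob']
```

Retweets are excluded on purpose, since the guideline targets copy-pasted originals; exact-match grouping is the simplest choice, and fuzzier matching would catch more at the cost of false positives.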

polluterofminds commented 5 years ago

Here's an example. This bot/person didn't even remove the tweet count before tweeting the exact same message as the account Mitchell blacklisted. [Screenshot: Screen Shot 2019-06-26 at 7:39:59 AM]

ViniciusBP commented 5 years ago

Vinicius from BitPatron here. Yes, this was a paid tweet, but in the first month we ranked first on Awario, 100% organically. After analyzing last month's results, we saw that this kind of promotion made some apps do very well, and that someone had complained about it in issue #117. Reading the comments, the conclusion of the discussion seemed to be that it was OK. We agree with most of the comments here, and we would suggest something like a credible reach. For example, for Twitter, Awario should use the results from Twitter Audit (https://www.twitteraudit.com/). We're not sure it is possible to cover all cases/channels without manual checks, but it may cover the great majority, leaving less to be reviewed manually.
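One way to read "credible reach" is to weight each account's followers by an estimate of how many of them are real. The sketch below assumes each mention carries a follower count and an audit score between 0 and 1 (the estimated fraction of real followers, the kind of figure a follower-audit service reports). The `TwitterMention` record, the 0.5 cutoff, and the weighting itself are illustrative assumptions, not Awario's or Twitter Audit's actual method or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterMention:
    """Hypothetical record: follower count plus an audit score, i.e. the
    estimated fraction of real followers (0.0 to 1.0)."""
    author: str
    followers: int
    audit_score: float


def raw_reach(mentions: list[TwitterMention]) -> int:
    # Plain reach: every follower counts, real or not.
    return sum(m.followers for m in mentions)


def credible_reach(mentions: list[TwitterMention], min_audit: float = 0.5) -> float:
    """Followers weighted by audit score; accounts below the (assumed)
    `min_audit` cutoff contribute nothing at all."""
    total = 0.0
    for m in mentions:
        if not 0.0 <= m.audit_score <= 1.0:
            raise ValueError(f"audit_score out of range for {m.author}")
        if m.audit_score >= min_audit:
            total += m.followers * m.audit_score
    return total


if __name__ == "__main__":
    organic = [TwitterMention("dev_blog", 1_000, 0.75)]
    bot_farm = [TwitterMention(f"bot{i}", 200_000, 0.1) for i in range(20)]
    print(raw_reach(organic), raw_reach(bot_farm))            # 1000 4000000
    print(credible_reach(organic), credible_reach(bot_farm))  # 750.0 0.0
```

The hard cutoff is what makes bot-heavy accounts drop out entirely; a purely proportional weighting would still let a large enough bot farm outscore a small organic audience.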

kekibejog commented 5 years ago (28 Jun 2019, 18:27)

@cuevasm Investigating tweets seems hard, but detecting the top ones is very easy. Just removing these will almost bring things back to normal.

Some examples: https://twitter.com/tariqnoorkhan https://twitter.com/TerriBauman https://twitter.com/PRINCE2PROJECT https://twitter.com/EntreprePro https://twitter.com/BlancaMoor https://twitter.com/jessicamackk https://twitter.com/christyanthony https://twitter.com/AmyChismuk https://twitter.com/eviejohnson88 https://twitter.com/laraprincesss https://twitter.com/elizabethusa https://twitter.com/sandrapowell1 They seem to be a network of fake bots retweeting at the same time, probably owned by the same man (the first account). All of the others retweet his account (check the tweets of Jun 26 for more info).

And this: https://twitter.com/adnagam

I didn't include this one, as it seems valid enough: https://twitter.com/ShaileshTr

friedger commented 5 years ago

What do you mean by normal?

The guideline for this could be that accounts should not retweet the same tweet at the same time.
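friedger's guideline can also be checked mechanically. A minimal sketch, assuming each retweet is available as an (account, tweet id, timestamp) record: for each tweet, a sliding time window flags every account that falls inside a burst of at least `min_accounts` distinct retweeters within `window`. The `Retweet` record, `burst_accounts`, and the default thresholds (60 seconds, 5 accounts) are hypothetical choices for illustration; what counts as "the same time" would still need community agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Retweet:
    """Hypothetical record: who retweeted which tweet, and when."""
    account: str
    tweet_id: str
    at: datetime


def burst_accounts(
    retweets: list[Retweet],
    window: timedelta = timedelta(seconds=60),
    min_accounts: int = 5,
) -> dict[str, set[str]]:
    """Map each tweet id to the accounts found in any time window of length
    `window` that contains at least `min_accounts` distinct retweeters."""
    by_tweet: dict[str, list[Retweet]] = {}
    for rt in retweets:
        by_tweet.setdefault(rt.tweet_id, []).append(rt)

    flagged: dict[str, set[str]] = {}
    for tweet_id, rts in by_tweet.items():
        rts.sort(key=lambda r: r.at)
        start = 0
        for end in range(len(rts)):
            # Advance the left edge until the window spans at most `window`.
            while rts[end].at - rts[start].at > window:
                start += 1
            accounts = {r.account for r in rts[start:end + 1]}
            if len(accounts) >= min_accounts:
                flagged.setdefault(tweet_id, set()).update(accounts)
    return flagged


if __name__ == "__main__":
    base = datetime(2019, 6, 26, 12, 0, 0)
    sample = [Retweet(f"bot{i}", "t1", base + timedelta(seconds=5 * i)) for i in range(6)]
    sample.append(Retweet("fan", "t1", base + timedelta(hours=3)))  # organic, much later
    print(sorted(burst_accounts(sample)["t1"]))
    # ['bot0', 'bot1', 'bot2', 'bot3', 'bot4', 'bot5']
```

The late, organic retweet is not flagged because it never shares a window with enough other accounts; this keeps normal viral spread (which is spread out over time) separate from coordinated bursts.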

cuevasm commented 5 years ago

I agree, @kekibejog: there isn't THAT much of this happening at the moment. I'm fairly confident I can sniff them out; it's a matter of determining what everyone thinks the rules are.

In any case, here is my proposal:

This month, I go through the Awario data and mark all the suspicious tweets - I present these to the community and the App Miner they relate to for consideration, noting that unless the community is convinced otherwise, they won't be counted. I will be marking obvious bots, RT farms, and other Twitter accounts that 'seem fishy' for everyone to review. I know this isn't a perfect definition, but maybe it can provide a starting place by which to solidify the rules. We've been talking in generalities and I think we can make some progress by getting more specific.

From there, I'll provide the final list of counted content and you all can determine if this is within your threshold for acceptability or not. If not, we won't use Awario in the scoring this month or again until we can figure out something else.

At the end of the day, all App Reviewers have a margin for error. I think Awario provides some of the best data and is more objective than others, so I want to see if we can keep it. The idea here is to get within an acceptable range, knowing that the scoring itself is designed pretty damn well to keep long-term gaming from being all that successful.

dantrevino commented 5 years ago

This seems like a reasonable step, @cuevasm. It's not at all scalable, though.

cuevasm commented 5 years ago

Thinking about an Awario model wherein we don't measure Reach, but we just give out zeroes and ones for each network that's tracked. It's like, there's activity or there's not on Reddit, Twitter, blogs/news. No one would be encouraged to cheat for volume anymore and it would still encourage a baseline of good activity on these vital networks and you could keep getting useful Awario data.

Full list of networks that can be tracked with Awario:

- Twitter
- Facebook
- Instagram
- YouTube
- Reddit
- News/blogs
- Websites (since the Reach calculation was the issue with this one, we could bring it back and track it as one or zero)

We could run the numbers in this model for the upcoming period and see what you guys think - what networks would you think we should watch? I'm inclined to try all of them...
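The zeroes-and-ones model is simple enough to state in a few lines. A minimal sketch, assuming per-network mention counts for one App Mining period are available as a dictionary; the key names, `TRACKED_NETWORKS`, and `presence_score` are hypothetical, and which networks actually count is still the open community decision discussed here. An app earns one point per tracked network with any activity, so volume beyond the first mention earns nothing extra.

```python
# Example set only: mirrors the networks Awario can track, but which ones
# should count is still undecided.
TRACKED_NETWORKS: tuple[str, ...] = (
    "twitter", "facebook", "instagram", "youtube", "reddit", "news_blogs", "websites",
)


def presence_score(
    mention_counts: dict[str, int],
    networks: tuple[str, ...] = TRACKED_NETWORKS,
) -> int:
    """One point per tracked network with at least one mention this period.
    Networks outside `networks` are ignored."""
    return sum(1 for n in networks if mention_counts.get(n, 0) > 0)


if __name__ == "__main__":
    organic_app = {"twitter": 12, "reddit": 3, "news_blogs": 1}
    bot_farm_app = {"twitter": 50_000}
    print(presence_score(organic_app), presence_score(bot_farm_app))  # 3 1
```

Because a bot farm's 50,000 Twitter mentions score the same as a dozen organic ones, there is nothing to gain from pumping volume; differentiation comes only from being active on more networks.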

polluterofminds commented 5 years ago

I am 100% in support of this model. While I absolutely refuse to use Facebook for anything, I'll take the hit happily if this is the model. Plus even if devs choose not to engage on certain platforms, just by the nature of how marketing works, it's likely that effort elsewhere will be shared anyway on the platform the dev has chosen not to engage with.

cuevasm commented 5 years ago

Haha totally fair on Facebook! I think we should decide as a community which of these we want to include, this is just the full set Awario currently offers for your consideration.

dantrevino commented 5 years ago

I like this. This could end up being a flywheel for social networks on Blockstack as well. We should include Debut, Awario, Decentus, PDen, and Fupio, if possible.

yw @kkomaz

Walterion01 commented 5 years ago

@cuevasm I respect your efforts to improve this program, but let me disagree with you as a tryout. You are right that it would no longer encourage anyone to go for volume, but only because it becomes so easy (for us too) to get the score.

What will this measure? Just posting one thing, or reaching a base threshold? Either way, I think it will help create many low- and medium-quality apps and also increase the app count, although I don't know your priority here: quality or quantity. As I said here too, I think that with these changes, everything seems too comfortable for people who want to build the Next Level of a powerful world.

Here is my suggestion: keep Awario as is and just try to make it better. One instance of corruption should not lead to changing all this hard work. As you can see in #127, it will lead to some "very quiet" or "dead" apps being included in the comparison. The growth and log scoring that @hstove did keeps most things in place.

cuevasm commented 5 years ago

@Walterion1 what it measures is a baseline of activity that we know to be helpful in growing awareness for an application. What it helps to separate are teams that are actively trying to acquire users (which everyone should be doing early on) from those that are not, which I do think is an important distinction. If you're doing NO marketing (or generating any results) on ANY of these networks, it would be pretty hard to be successful.

While I agree that the growth and log scores do their job effectively and the current problem is probably somewhat overblown in terms of actual impact, it seems the majority here disagrees and this won't be acceptable, unfortunately. I have obviously been in the camp of trying to keep Awario and have noted on several occasions that any long-term 'cheating' like we saw this month would be pretty impossible to effectively maintain, but the math here doesn't seem to matter - it's more that folks don't want to see that kind of activity around Blockstack (which I totally understand).

The latest model proposal more simply encourages organic activity (because there is no reward for pumping Reach). Those that increase their surface area, i.e., get coverage on multiple networks, would stand to benefit.

And as to #127, I believe this model is actually in line with that suggestion, just a little more nuanced. As an added benefit, this new Awario score would easily separate out zombie apps; from there we can decide whether we really want to make them ineligible or just let the score bump them down the list naturally.

Walterion01 commented 5 years ago

I understand what you want to achieve, but this baseline will be low enough for many apps to pass, and as I said, this will lead to many low- and medium-quality apps, since the score is easy to get.

This new plan will be easy to game too: someone like @jehunter5811 will put a lot of time into writing a useful blog post or making a YouTube video, but someone else will just copy useless info to get the baseline.

The point is, changing the system so that the score is easier to get is not an answer to corruption; it just weakens the score until cheating is no longer worth it.

My suggestion is, as you stated before, to filter the results and release the data for audit. If you miss something, I'm sure people will say something.

cuevasm commented 5 years ago

I understand your point, but I disagree that it weakens it. Here's why:

1. What really weakens it is if, as in your example, someone like Justin creates a nice, thoughtful post that gets a Reach score of 1,000, while someone else uses a Twitter bot farm and gets a Reach of 4,000,000. That is the weaker score, because it misrepresents the quality, effort, and effectiveness involved. Reach is a measure of effectiveness, and if we can't measure that effectiveness accurately because of the noise in the system and the disagreements over what is fair game, then it is a weak score.
2. I don't believe it is weaker at all, just different. Instead of measuring the results, we're measuring the activity, which is ultimately what we want to encourage anyway. Sure, there is less opportunity to differentiate yourself here, but consider NIL, which checks for a baseline of important things. Many apps receive the same scores, and the requirements there are simple and small too, yet we've deemed these things important enough to check for. I believe some semblance of web presence is another of those things, and if we can't fully measure a great Reach, we should look to encourage what we can fairly detect at scale.

There will certainly be some folks that do the bare minimum, but again, this is better than zero activity. And to your first point, I'm actually ok with 'low and medium' apps being able to score well on Awario too. If they are doing some level of marketing, that's good for the whole Blockstack ecosystem. Also, you'd be surprised how many apps do nothing or only place on one network, I think you'll find that the apps you think should be at the top of this, will be. I'll run the numbers sometime in the next week or so so that we can see.

Last, I would generally say that it's a good thing people can't differentiate themselves to a great degree by doing well with any one App Reviewer and this model certainly accomplishes that.

(And I'm not suggesting changing it to make it easier, in fact, this will be quite a lot more work for me because of the way Awario is laid out - I am suggesting this because I think it's a viable way to keep a very useful App Reviewer in the mix and accomplishes the spirit of what we set out to do here with awareness)

Walterion01 commented 5 years ago

Thank you for the thoughtful explanation. I agree with running a test; it will show better how it goes. But please consider what happened with NIL: almost everyone gets 4 for Blockstack, a useless score because Auth is the key feature of the whole contest. And almost everyone gets 1 for Gaia, because it is "hard" for most developers to make a more decentralized app, as I discussed with Larry. So now, after a couple of months, NIL has lost much of its ability to differentiate a decentralized app from an encrypted one, and I call this weakening a good reviewer. In the end, we are here to improve a new way of developing, and I trust your judgment and hard work. I hope my comments help with that.

polluterofminds commented 5 years ago

And almost all get 1 for Gaia, because it is "hard" for most of the developers to make a more decentralized app.

I don't want to derail the conversation here, but I think this comment from @Walterion1 actually proves why Mitchell's suggested method makes sense. NIL is not meaningless. It very specifically rewards the developers who put in the time to build their apps using the tools available to make them as decentralized and user-controlled as possible. So, an app that chooses not to implement Gaia storage should absolutely get a lower score than an app that does. The same applies to Awario. If an app does any sort of marketing, they should get credit. If they don't, they shouldn't.

I'd also just like to be very clear about this whole idea that gaming the system doesn't have a big impact long-term (in reference to Hank's comment). Sure, a single app can't benefit long-term, but this type of behavior will negatively impact every other app in perpetuity as new apps come on board and can boost their "reach." That is why it's so critical that this whole issue is solved, and why it is not overblown.

polluterofminds commented 5 years ago

Also, I do think we need far more differentiation on app reviewers. But it seems the community has stopped proposing new app reviewers. I've started trying to reinvigorate that.

GinaAbrams commented 4 years ago

Discussion ongoing in #135. Closing to keep the discussion in that thread.

stacks-archive / app-mining

Immediately suspend Awario scoring #120