stacks-archive / app-mining

For App Mining landing page development and App Mining operations.
https://app.co/mining
MIT License

Awario Scoring Recap & Proposal (July 31) #135

Closed cuevasm closed 5 years ago

cuevasm commented 5 years ago

I'll forgo the format, given there are numerous tickets on this already. In continuing with Awario, we have three options (of course, we can iterate on any of these going forward).

Related:

1) Awario with no policing whatsoever. Let people do what they like in terms of marketing, spamming, influencer based stuff, etc. Do this knowing that this will be an increasingly expensive and ultimately unsustainable model for them, given the Growth score factor, though they can manage a few months of pumped numbers on their score from the Awario result.

2) Awario with policing. This would require a concrete set of rules (which seems difficult to create in and of itself, given the different perspectives in the community) and some kind of scalable audit process by which volunteers mark and remove Mentions according to the rules. Do this knowing that some are committed to staying just ahead of the rules in order to game the system; it would be something of a game of whack-a-mole, increasingly time-consuming, with things inevitably slipping through.

3) Awario with binary network scores. This would leverage the same tracking system, but instead of keying off the Reach generated, we would simply be looking to see whether the app was doing at least a small amount of public activity online and on social networks, thereby encouraging that activity but not incentivizing Reach pumping. The dry-run data for this is available here.

Once the community has decided on one of these, we'll then need to decide when we want it to start. There are competing schools of thought here: some say we shouldn't make a change mid-period, others want to agree and iterate faster. The data is all being captured regardless, so I can spin back both versions next week (as July will be over).

In any case, let's settle on the one we want to go with for now and go from there.

sdsantos commented 5 years ago

@cuevasm Can you provide a bit more information on how the binary network scores work?

cuevasm commented 5 years ago

The model is here in the ticket linked above - https://github.com/blockstack/app-mining/issues/120#issuecomment-507804626

Pasting for convenience:

Thinking about an Awario model wherein we don't measure Reach, but we just give out zeroes and ones for each network that's tracked. It's like, there's activity or there's not on Reddit, Twitter, blogs/news. No one would be encouraged to cheat for volume anymore and it would still encourage a baseline of good activity on these vital networks and you could keep getting useful Awario data.

Full list of networks that can be tracked with Awario: Twitter, Facebook, Instagram, YouTube, Reddit, News/blogs, and Websites (since the Reach calculation was the issue with that last one, we could bring it back and track it as one or zero).

The end result for now is that you get some number out of 6 (1 possible for each network). As noted, we could add back in 'Web' mentions since the only reason we excluded it was the wonky Reach scores which wouldn't be a factor in this model.
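To make the mechanics concrete, here's a rough sketch of how the binary tally could be computed (the data shape and counts are just illustrative, not Awario's actual export format):

```python
# Sketch only: 1 point per tracked network that registered at least one
# Mention in the month, 0 otherwise. Network names here are illustrative.
TRACKED_NETWORKS = ["twitter", "facebook", "instagram", "youtube", "reddit", "news_blogs"]

def binary_network_score(mentions_by_network):
    """mentions_by_network maps a network name to its Mention count for the month."""
    return sum(1 for network in TRACKED_NETWORKS
               if mentions_by_network.get(network, 0) > 0)

# An app with activity on Twitter, Reddit, and one blog post scores 3 out of 6.
print(binary_network_score({"twitter": 12, "reddit": 3, "news_blogs": 1}))  # -> 3
```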

hstove commented 5 years ago

The end result for now is that you get some number out of 6 (1 possible for each network). As noted, we could add back in 'Web' mentions since the only reason we excluded it was the wonky Reach scores which wouldn't be a factor in this model.

Yep, this sounds good, and we would convert this to a z-score as your final "Awario" score. The likely result of this is that most apps end up getting a "perfect score" here, which would actually not improve or hurt your score, even if you did 6/6. In my opinion, this is a great result, as it means you need to be active on different platforms as a baseline for app mining.
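For clarity, a rough sketch of what I mean by converting to a z-score (standardizing each app against the field; the exact pipeline details are still open):

```python
import statistics

def z_scores(raw_scores):
    """Standardize each app's raw score against the field: (x - mean) / stdev."""
    mean = statistics.mean(raw_scores)
    stdev = statistics.pstdev(raw_scores)
    if stdev == 0:
        # Everyone has the same score (e.g. all 6/6): no app gains or loses anything.
        return [0.0 for _ in raw_scores]
    return [(x - mean) / stdev for x in raw_scores]

# If most apps hit the same number, their z-scores cluster near zero and the
# reviewer stops differentiating them, which is the outcome described above.
print(z_scores([6, 6, 6, 5, 3]))
```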

Walterion01 commented 5 years ago

The likely result of this is that most apps end up getting a "perfect score" here, which would actually not improve or hurt your score

I call this weakening the reviewer. This way, only "bad" ones will be found, and we miss the "better" ones. It's as if we said that for PH, just launching gives you a 1, with no need to make a good app that seems useful, has a good UX, or a nice design. What about being better, and what is the difference between apps that are better or work harder? If PBC wants to promote keeping a base standard rather than being better or best, I understand. My questions are to find out what the target is so we can hit it better.

friedger commented 5 years ago

If we can come up with a good definition of "better in reach", then we should adapt the scoring. PBC would be happy! So far, there is no feasible proposal.

We also have binary scores for NIL because this gives a good starting point for evaluating rights protection. For PH, we trust that PH indeed gives a better score to better apps. For Awario, this is not the case at the moment.

There are more proposals, like #98, that try to define more criteria for better apps, and we as a community should continue to improve this.

Walterion01 commented 5 years ago

Maybe get impressions from Twitter itself? A tool could be built if PBC gets an enterprise account from Twitter.

vishnuraghunathan commented 5 years ago

The problem arises with one component. My suggestion is to reduce the weight given to Reach and also add the new model as described by Mitchell. For example: 0.75 × Reach (current methodology) + 0.25 × new model. Completely removing Reach means it becomes another thing you only look at once a month without doing much, which I feel will not be good in the long term. A binary system weakens the weight of Awario, and ranking boils down to just two reviewers, one of them being Product Hunt. As far as issues with Reach are concerned, I guess reducing the weight would help.
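A rough sketch of what I mean (the 0.75/0.25 split is just my suggestion, and both inputs are assumed to already be normalized scores):

```python
def blended_awario_score(reach_score, binary_score, reach_weight=0.75, binary_weight=0.25):
    """Blend the current Reach-based score with the proposed binary-model score.
    Both inputs are assumed to already be normalized (e.g. z-scores)."""
    return reach_weight * reach_score + binary_weight * binary_score

# An app with strong Reach but missing a couple of networks still scores well.
print(blended_awario_score(reach_score=1.4, binary_score=-0.5))  # -> 0.925
```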

cuevasm commented 5 years ago

I really don't see it as weakening as it makes it much more fair and the results clear. The point of the reviewer was to encourage applications to market themselves earlier on and to begin considering user acquisition and awareness. Taking Reach out and measuring this way still accomplishes that. The only real difference is that one app can't massively differentiate itself through this score, but I'm starting to look at that as a positive in the long run. I want to keep things accessible and exciting for new apps that, as time goes on, would never be able to compete on Reach with the well-established apps. The fact of the matter is, if all apps did a small amount of marketing on each network, it would be a big net positive for the community and each individual app. I also think too many are assuming everyone is going to get 6/6, and that will quickly be shown not to be true.

And RE: an enterprise Twitter account: any tool, even one pulled directly off Twitter, will be gameable in the same way Awario is currently being gamed (in fact, it would be worse, because the API is just an unfiltered firehose, whereas Awario has some built-in filter logic already). Worse, it dictates that you focus your marketing efforts on Twitter, which may not even be the right choice for your business. You might be better suited to focus on Reddit, for example. This model wouldn't punish you for choosing to focus where you are getting the most lift, assuming you make the time to do a few posts on the other networks, which should be easy. I like that the binary score is agnostic to network, because each app's audience is unique and each of you should be emphasizing some networks more than others based on results.

And Friedger, I love the spirit of what you're saying, but personally, I think 'better reach' is near-impossible to measure, and even if we did, we've shown we wouldn't agree on what better is.

Last, this binary model would make Awario, in my opinion, the most objective App Reviewer. I think with so many subjective pieces we've accepted in other App Reviewers, having one that is purely based on measurable, reliable outputs is very positive.

Walterion01 commented 5 years ago

You are on point about keeping things exciting for new apps. I totally disagree about the Twitter enterprise account, though. As you can check with your own tweets, Twitter only counts real views; otherwise, your Twitter timeline would be filled with chatter from bots. After all, they have been in the social business for years. The suggested way will be easily gameable by scheduling posts in Hootsuite with repetitive content every month on all platforms.

Walterion01 commented 5 years ago

Let me meet you halfway with a new proposal: we can use Awario Reach as a ranking value, use this rank as the score for each platform separately, and then combine them all. That way, each app can compete on the platform that benefits it most (e.g., a fashion app on Instagram and a dev tool on Twitter). Extra: you can even add a factor for each network separately (e.g., 1.3x for Twitter and 0.9x for web).

qqnoname commented 5 years ago

Why not just factor in the followers/following ratio when calculating the Reach of tweets? It is less than 1 for all bots and fake profiles.
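Something like this simple check (a sketch of the heuristic only, not an existing Awario feature):

```python
def looks_like_bot(followers, following):
    """Heuristic: bots and fake profiles tend to follow far more accounts
    than follow them back, so their followers/following ratio is below 1."""
    return following > 0 and followers / following < 1

print(looks_like_bot(followers=40, following=2000))   # -> True
print(looks_like_bot(followers=5000, following=300))  # -> False
```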

cuevasm commented 5 years ago

"Real views" on twitter is not a thing @Walterion1 - Awario is pulling those scores off the API and it's shown to be very fallible. It's too easy to trick with the kinds of things that have already been done to game the system.

cuevasm commented 5 years ago

Why not just factor in the followers/following ratio when calculating the Reach of tweets? It is less than 1 for all bots and fake profiles.

This isn't a very reliable way of determining whether an account is legitimate. All it takes is paying a more expensive service whose bots or 'users' have good ratios.

cuevasm commented 5 years ago

Let me meet you halfway with a new proposal: we can use Awario Reach as a ranking value, use this rank as the score for each platform separately, and then combine them all. That way, each app can compete on the platform that benefits it most (e.g., a fashion app on Instagram and a dev tool on Twitter). Extra: you can even add a factor for each network separately (e.g., 1.3x for Twitter and 0.9x for web).

Could you describe this further? I'm not quite following.

Walterion01 commented 5 years ago

"Real views" on twitter is not a thing

Can I ask if you know what the "Impressions" count is when you view a tweet's "Tweet Analytics"? Twitter gives this value per tweet, and it is not the person's follower count like Awario uses.

Walterion01 commented 5 years ago

Sure, let's go with an example:

| App | Platform 1 Reach | Platform 1 Rank | Platform 2 Reach | Platform 2 Rank | Result |
|-----|------------------|-----------------|------------------|------------------|--------|
| A1  | 13,000 | 2 | 4,900  | 3 | 1/(2+3) = 0.2  |
| A2  | 7,000  | 3 | 10,000 | 1 | 1/(3+1) = 0.25 |
| A3  | 20,000 | 1 | 5,000  | 2 | 1/(1+2) = 0.33 |

So A3 is the best app overall. It can be more complicated, but I think it gives you the idea of how we can use the rank instead of the Reach or a binary score.
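And here's a rough sketch of how that rank-then-combine scoring could be computed (all numbers are illustrative; the optional per-network multipliers are left out for simplicity):

```python
def rank_based_scores(reach_by_app):
    """reach_by_app maps app name -> list of Reach values, one per platform.
    Each platform is ranked separately (1 = highest Reach); the final score
    is 1 / (sum of an app's ranks), so a lower combined rank wins."""
    apps = list(reach_by_app)
    num_platforms = len(next(iter(reach_by_app.values())))
    rank_sums = {app: 0 for app in apps}
    for p in range(num_platforms):
        ordered = sorted(apps, key=lambda a: reach_by_app[a][p], reverse=True)
        for rank, app in enumerate(ordered, start=1):
            rank_sums[app] += rank
    return {app: 1 / total for app, total in rank_sums.items()}

# Reproduces the example table above: A3 comes out on top.
print(rank_based_scores({"A1": [13000, 4900], "A2": [7000, 10000], "A3": [20000, 5000]}))
# -> {'A1': 0.2, 'A2': 0.25, 'A3': 0.333...}
```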

ghost commented 5 years ago

I don't see why any of our Blockstack apps need to have an account on Facebook/YouTube/Instagram. Just because we can track something with Awario doesn't mean we should.

sdeepak23 commented 5 years ago

I call this weakening the reviewer. This way, only "bad" ones will be found, and we miss the "better" ones.

A binary system weakens the weight of Awario, and ranking boils down to just two reviewers, one of them being Product Hunt. As far as issues with Reach are concerned, I guess reducing the weight would help.

I really don't see it as weakening as it makes it much more fair and the results clear. The point of the reviewer was to encourage applications to market themselves earlier on and to begin considering user acquisition and awareness.

I hear what @Walterion1 and @vishnuraghunathan are trying to say, and it's a very valid point that completely taking Reach out of the score in fact weakens Awario as an App Reviewer. The purpose of a scoring mechanism in any ranking system is to introduce a distribution curve so that participants can be ranked against each other according to their efforts. A binary scoring system in its current form (dry run) will just result in two or three segments of apps, and the main challenge would be ranking apps within those segments. It reduces Awario to an eligibility criterion rather than a score. If that happens ultimately the rankings would be almost dependent on PH and TMUI. In my opinion, our best foot forward would be a mix of Reach, Growth, and binary scoring, taking a z-score in each of these categories.

Walterion01 commented 5 years ago

If that happens ultimately the rankings would be almost dependent on PH and TMUI.

And we may also have a problem with apps getting scores because of fake PH votes. #134

ViniciusBP commented 5 years ago

It seems @Walterion1 changed his mind.

On issue #130 he complained about Awario reach of Pden:

"Mentioning a big name for support, not awareness, and not answering back: https://twitter.com/buffer/status/1143153674040029184 ."

Now, he is doing exactly the same with his account: https://twitter.com/RockstarSupport/status/1156583214649155584

@Walterion1, is this the reason why you are now supporting the use of Twitter Impressions instead of Awario Reach? I'm pretty sure Twitter Impressions will be big for this tweet as well. So basically you are suggesting something that would not prevent what you just did, right?

Walterion01 commented 5 years ago

Nice of you to care and check the tweets, thanks. And I didn't change my mind. As I mentioned in that issue, I don't like it, but I'm not against it, and neither is the community. I don't want to see such behavior, and that is why I care to analyze it, but after all, I should play by the rules, and this tweet is to make the point that everyone can do stuff like this. I'm sure you will notice that our Twitter is one of the most active accounts, so no matter what changes, we will be in a good position. But as a person who worked hard to make a good app, like many others here, I will try not to let it decay into buying votes, Reach, or whatever.

About your second question: No.

sdsantos commented 5 years ago

I'm leaning towards 2), Awario with policing, but with harsher rules.

Spam, scams, and paid bots not only harm the app that uses them, but also the whole Blockstack ecosystem at this stage. We can't force apps not to do it, but we can avoid incentivising and paying them to do so.

I propose we maintain a list of bad practices, and if one app gets caught doing them, they're out of App Mining. First strike they get a warning, second strike they're out of the program. The community can keep an eye on each other, from what it seems. The App Mining team just needs to confirm the bad practice and issue the warnings.

Suggestion of practices to blacklist:

Want to keep doing those practices? No problem, you're just out of the App Mining program. This way we can keep incentivising apps to do marketing and get more reach, while keeping the gaming incentive low.

cuevasm commented 5 years ago

I propose we maintain a list of bad practices, and if one app gets caught doing them, they're out of App Mining. First strike they get a warning, second strike they're out of the program. The community can keep an eye on each other, from what it seems. The App Mining team just needs to confirm the bad practice and issue the warnings.

There's a huge problem with this: I've run contests before with similar rules, and we had folks 'cheat' on behalf of others, meaning they went and bought Twitter bot impressions for another brand in the contest to make it seem as though that brand had cheated. How could anyone possibly know who paid for what? And then we would be banning people based on that? I don't think that works.

Also, as I mentioned on the call, after a month of policing and talking with some of the Miners involved in activity others want policed, I don't think I can ever keep up with a list of disallowed tactics, as they will just come up with more ways to game it (and have promised to do so). It's basically a race toward who can cheat best, and that's not what anyone wants.

sdsantos commented 5 years ago

@cuevasm OK then, if we're assuming the worst, only the binary option remains, although it's measuring something very different. But then shouldn't we be applying the same reasoning to Product Hunt? Doesn't it suffer from the same issues?

polluterofminds commented 5 years ago

100% in support of option 3.

jyudkin1 commented 5 years ago

I think option 3 makes the most sense as well and will scale.

ViniciusBP commented 5 years ago

Is there any decision regarding this? I would also point out that some App Miners are launching several apps using the same brand/domain, and it would be great to have this clarified to avoid future problems. I just want to make sure they won't get the same results with 1/n of the effort.

stackatron commented 5 years ago

Mitchell has been researching alternative methods and tools to utilize. He is writing up a new proposal now.

cuevasm commented 5 years ago

It seems that we have support for a binary method here, but it's not overwhelming. After a lot of discussions with you all, the team here, Awario, etc., we've come up with a combination of the binary method with the Reach method we think maintains the best spirit of the App Reviewer and doesn't suffer from the same gameability issues we've seen so far.

Here it is, I am dubbing it Blended Awareness:

Binary scoring on social networks, i.e. you get a number out of 5 max (1 for each network: Facebook, Twitter, YouTube, Reddit, and Instagram). To get 5 points, all you need to do is register a Mention on each network in that month.

plus

A Reach score for any Mentions in the News/Blog section. The Reach will have log10 scoring applied to it.

plus

Growth score as we've been doing: this takes your last month's Reach and calculates the Growth % for the current month.

Why does this work better?

  1. It solves the issue of social network spamming/paying/having to police, etc. Teams are not incentivized or rewarded for doing it, and it eliminates the time-consuming and never-ending battle of enforcing etiquette that's not wholly agreed on. If teams decide to do it, it's their own prerogative and will not benefit them in mining.
  2. It maintains the 'strength' of the App Reviewer by still allowing teams to differentiate themselves with the Awario score; it will just have to be done through Reach generated in the News/Blog section. Since teams don't have control over those sites, it's nearly impossible for them to game the Reach score the way social media Mentions can easily be gamed.
  3. It encourages teams to find valuable content partnerships, usually with publications that are actually relevant to their target market. More often than not, sites like these have editorial guidelines and regularly publish the kinds of things their audience actually engages with. It mostly prevents posts to irrelevant audiences made simply because there is Reach to be gained there. Last, building relationships with publications to get ongoing placements is one of the most valuable content marketing tactics out there, and this scoring method values that type of quality engagement much more highly than simply blasting social media, which the current model unintentionally and disproportionately rewards.
  4. It maintains an incentive to be on social media, which we mostly all agree is an important early channel for free/cheap user acquisition and awareness building, while eliminating the arms race for paid or spam content. We've seen teams become much more active in their marketing efforts since introducing Awario, in large part on social media, so that is something we want to continue incentivizing without it becoming an ugly spam war. While this may lead to a slight drop in that activity, we think it's still incentive enough that teams will execute on it.
  5. SEO value. Getting placements and backlinks from sites like these is a longer-term value play for the app and the website/brand, one we are excited to incentivize more heavily than the short-term Reach spikes we've seen the current model reward too heavily.

Potential caveats

Still to solve: the exact weight of each piece of the score; I'll leave this to @hstove to propose. My general proposal is that each portion be 1/3 of your overall score: Social + Reach + Growth.
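To make the pieces concrete, here's a rough sketch of the three components (the data shapes are illustrative, and how the components are normalized and weighted is exactly the open question above):

```python
import math

SOCIAL_NETWORKS = ["facebook", "twitter", "youtube", "reddit", "instagram"]

def blended_awareness_components(mentions_by_network, news_reach, last_month_news_reach):
    """Sketch of the three Blended Awareness components for one app and one month."""
    # 1) Binary social score: 1 point per network with at least one Mention, max 5.
    social = sum(1 for n in SOCIAL_NETWORKS if mentions_by_network.get(n, 0) > 0)
    # 2) News/Blog Reach with log10 applied to dampen huge spikes.
    news = math.log10(news_reach) if news_reach > 0 else 0.0
    # 3) Growth: % change in Reach over last month (assumed here to use the
    #    News/Blog Reach, since that is the Reach figure that remains).
    growth = ((news_reach - last_month_news_reach) / last_month_news_reach * 100
              if last_month_news_reach > 0 else 0.0)
    return {"social": social, "news_log10": news, "growth_pct": growth}

print(blended_awareness_components({"twitter": 8, "reddit": 2},
                                   news_reach=25000, last_month_news_reach=20000))
# -> social: 2, news_log10: ~4.4, growth_pct: 25.0
```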

Next steps: Show up to the App Mining call on Thursday (the 22nd) with any questions or concerns; we should try to ratify this proposal then so that we can give you time to change up your methods before the month rolls over into September. We'd like to change over to this method as quickly as possible so we can end the current policing approach.

cuevasm commented 5 years ago

Is there any decision regarding this? I would also point out that some App Miners are launching several apps using the same brand/domain, and it would be great to have this clarified to avoid future problems. I just want to make sure they won't get the same results with 1/n of the effort.

Different apps can't be named exactly the same. They can use, say, Arcane Docs vs. Arcane Sheets. The Awario query can easily differentiate them. We have already been doing this successfully with OI Chat, OI Timesheet, etc.

Please see the proposal and next steps above.

friedger commented 5 years ago

Blended Awareness sounds like a good compromise. @cuevasm, for Growth, please include #129.

cuevasm commented 5 years ago

https://github.com/blockstack/app-mining/issues/135#issuecomment-523136444

There were no major objections to this on the call today; please post yours here if you have them. We can proceed with this scoring method right away or do a dry run first: please use 👀 to vote for doing a dry run first, or 🚀 to vote for switching right away and getting your October score (data from Sept 1-30) with the new method. The current scoring method will be used to score September regardless, because we're already in that data period (August 1-31).

Basically, you are deciding whether you're comfortable adapting your strategy to this new scoring method by September 1, OR whether you want to wait until Oct 1 to change the way you do awareness building.

sdsantos commented 5 years ago

@cuevasm is it just me, or can you not react with a 🐪? I used 👀 instead for the dry run.

cuevasm commented 5 years ago

Ah, strange, didn't realize you couldn't. Sorry, let's do the eyes! Updating.

Walterion01 commented 5 years ago

@cuevasm I think there should be a dry run because:

hstove commented 5 years ago

we don't have a test result for this month with exact results so we can estimate how it will play out

Well, you do have access to Awario, right? So you could see what your apps might look like.

I'm confused, Mitchell said that if we went fast, you'd get the new score in October, but wouldn't moving fast mean we'd use the new score in September, using August data? Maybe I read that wrong.

Walterion01 commented 5 years ago

Mitchell:

Still to solve: the exact weight of each piece of the score; I'll leave this to @hstove to propose. My general proposal is that each portion be 1/3 of your overall score: Social + Reach + Growth.

Well, you do have access to Awario, right? So you could see what your apps might look like.

Not accurately, as I don't know the exact weight of each one.

sdsantos commented 5 years ago

@hstove we haven't got our Awario access yet, and we have already participated for 2 months now.

stackatron commented 5 years ago

@cuevasm sounds like some app miners still don't have Awario access, please follow up.

friedger commented 5 years ago

Current state: 7 vs. 4 for fast adoption (October rewards).

I don't see any reason why the adoption should not happen now. We are all in the same boat, it is fair, and it can only get better.

stackatron commented 5 years ago

Thanks for all the discussion. The default protocol is to use a dry run to avoid any complications and allow App Miners to prepare. Next month we will continue with Awario as is, and will prepare a dry run for the blended proposal described here: https://github.com/blockstack/app-mining/issues/135#issuecomment-523136444

Assuming the dry run results work well, App Miners should begin preparing their news Reach for September live results.

friedger commented 5 years ago

I don't see why the dry run on August data will be different than on September data.

App publishers should continue to publish their apps in the best way they can; only the rewards will be distributed in a fairer way.

cuevasm commented 5 years ago

@hstove we haven't got our Awario access yet, and we have already participated for 2 months now.

All Miners were provided Awario access weeks ago; please look for an invite from Awario by searching your inbox. You'll have been invited at the email address where you receive regular communication from us.

cuevasm commented 5 years ago

Based on the voting, we'll do a dry run first, so I'll provide that data alongside the regular score for the upcoming rankings.

cuevasm commented 5 years ago

I don't see why the dry run on August data will be different than on September data.

App publishers should continue to publish their apps in the best way they can; only the rewards will be distributed in a fairer way.

Should, sure. However, I think it would be pretty different given where we're starting from. If we change to this method, different things are rewarded and we'd expect behavior to change. If we are rewarding press mentions, then you need time to get press mentions. Instead of focusing on differentiating on social media, as most are now because that's the easiest/best place to do so, App Miners would focus on differentiating by getting more play in the press, and that takes some time.

friedger commented 5 years ago

Still, the dry run data will be the same for August and September, because in both months differentiating on social media is rewarded. Hence, the reason not to change sooner is to let App Miners prepare to get better results. For me, that is not reason enough. We should adapt the algorithms as soon as a fix for a flaw is identified.

cuevasm commented 5 years ago

I can appreciate that, but we've received consistent feedback that changing the rules in the middle of a work period is not desirable.

friedger commented 5 years ago

We just missed the September period.

cuevasm commented 5 years ago

Hey everyone, just to put a bow on this, here's the scoring timeline for Awario going forward. We'll be switching to the Blended Awareness model shortly.

- September Run (data from August): Current Method
- October Run (data from September): Current Method
- November Run (data from October): New Method (Blended Awareness)

We'll remind you and cover this in the next call, as well as by email and in the changelog.

The takeaway from this message is that in 2 weeks we move into October, the data from which will be judged as outlined above for your eventual November score. You may decide to change your strategy or you may not, but the update is coming, so please be aware!

friedger commented 5 years ago

@cuevasm Could you please consider #129 when implementing Blended Awareness?