I'll forgo the format given there are numerous tickets on this already. In continuing with Awario, we have 3 options (of course, we can iterate on any of these going forward).

Related:

1) Awario with no policing whatsoever. Let people do what they like in terms of marketing, spamming, influencer-based stuff, etc. Do this knowing that it will be an increasingly expensive and ultimately unsustainable model for them, given the Growth score factor, though they can manage a few months of pumped numbers on their score from the Awario result.

2) Awario with policing. This would require a concrete set of rules (which seems difficult to create in and of itself, given the different perspectives in the community) and some kind of scalable audit process by which volunteers mark and remove Mentions according to the rules. Do this knowing that some are committed to staying just ahead of the rules in order to game the system; it would be something of a game of whack-a-mole and would get increasingly time-consuming, with things inevitably slipping through.

3) Awario with binary network scores. This would leverage the same tracking system, and instead of keying off the Reach generated, we would simply look to see whether the app was doing at least a small amount of public activity online and on social networks, thereby encouraging that activity but not incentivizing Reach pumping. The dry-run data for this is available here.

Once the community has decided on one of these, we'll then need to decide when we want it to start. There are competing schools of thought here: some say we shouldn't make a change mid-period, others want to agree and iterate faster. The data is all being captured regardless, so I can spin back both versions next week (as July will be over).

In any case, let's settle on the one we want to go with for now and go from there.
@cuevasm Can you provide a bit more information on how the binary network scores work?
The model is here in the ticket linked above - https://github.com/blockstack/app-mining/issues/120#issuecomment-507804626
Pasting for convenience:
> Thinking about an Awario model wherein we don't measure Reach, but just give out zeroes and ones for each network that's tracked. It's like: there's activity or there's not on Reddit, Twitter, blogs/news. No one would be encouraged to cheat for volume anymore, it would still encourage a baseline of good activity on these vital networks, and you could keep getting useful Awario data.
>
> Full list of networks that can be tracked with Awario:
>
> - Twitter
> - Facebook
> - Instagram
> - YouTube
> - Reddit
> - News/blogs
> - Websites (since the Reach calculation was the issue with this one, we could bring it back and track it as one or zero)
>
> The end result for now is that you get some number out of 6 (1 possible for each network). As noted, we could add back in 'Web' mentions since the only reason we excluded it was the wonky Reach scores, which wouldn't be a factor in this model.
Yep, this sounds good, and we would convert this to a z-score as your final "Awario" score. The likely result is that most apps end up getting a "perfect score" here, which would neither improve nor hurt your score, even if you did 6/6. In my opinion, this is a great result, as it means you need to be active on different platforms as a baseline for app mining.
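For reference, a minimal sketch of that z-score conversion (illustrative only; the function name and the zero-variance handling are assumptions, not the actual scoring pipeline):

```python
from statistics import mean, pstdev

def awario_z_scores(network_counts):
    """Convert each app's binary network total (0-6) into a z-score.

    network_counts: dict mapping app name -> number of networks
    with at least one Mention that month.
    """
    values = list(network_counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Everyone scored the same (e.g. all 6/6): z = 0 for all,
        # so this reviewer neither helps nor hurts anyone.
        return {app: 0.0 for app in network_counts}
    return {app: (count - mu) / sigma for app, count in network_counts.items()}

# Example: most apps hit 6/6, one app is inactive on half the networks.
print(awario_z_scores({"A": 6, "B": 6, "C": 6, "D": 3}))
```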
> The likely result is that most apps end up getting a "perfect score" here, which would neither improve nor hurt your score
I call this weakening the reviewer. This way, only "bad" ones will be found, and we miss the "better" ones. It's like saying that for PH, just launching gives you a 1; there's no need to make a good app that seems useful, has a good UX, or a nice design. What about being better, and what is the difference between apps that are better or work harder? If PBC wants to promote keeping a base standard rather than being better or best, I understand. My questions are to find out what the target is so we can hit it better.
If we can come up with a good definition of "better in reach", then we should adapt the scoring. PBC would be happy! So far, there is no feasible proposal.
We also have binary scores for NIL because they give a good starting point for evaluating the right protection. For PH, we trust that PH indeed gives a better score to better apps. For Awario, this is not the case at the moment.
There are more proposals, like #98, that try to define more criteria for better apps, and we as a community should continue to improve this.
Maybe get impressions from Twitter itself? A tool could be built if PBC gets an enterprise account from Twitter.
The problem arises with one component. My suggestion: let's reduce the weight given to Reach and also add the new model as described by Mitchell. For example: 0.75 × Reach (current methodology) + 0.25 × new model. Completely removing Reach means it becomes another thing one looks at only once a month and doesn't do much about, which I feel will not be good in the long term. A binary system weakens the weight of Awario, and ranking boils down to just two reviewers, one of them being Product Hunt. As far as issues in Reach are concerned, I guess reducing the weight would help.
I really don't see it as weakening as it makes it much more fair and the results clear. The point of the reviewer was to encourage applications to market themselves earlier on and to begin considering user acquisition and awareness. Taking Reach out and measuring this way still accomplishes that. The only real difference is that one app can't massively differentiate itself through this score, but I'm starting to look at that as a positive in the long run. I want to keep things accessible and exciting for new apps that, as time goes on, would never be able to compete on Reach with the well-established apps. The fact of the matter is, if all apps did a small amount of marketing on each network, this would be a big net positive for the community and each individual app. I also think too many are assuming everyone is going to get 6/6, and that will quickly be shown to not be true.
And RE: an enterprise Twitter account: any tool, even one pulled directly off Twitter, will be gameable in the same way as Awario is currently being gamed (in fact, it would be worse, because the API is just an unfiltered firehose, whereas Awario has some built-in filter logic already). Worse, it dictates that you focus your marketing efforts on Twitter, which may not even be the right choice for your business. You might be better suited to focus on Reddit, for example. This model wouldn't punish you for choosing to focus where you are getting the most lift, assuming you make the time to do a few posts on the other networks, which should be easy. I like that the binary score is agnostic to network, because each app's audience is unique and you should each be emphasizing some networks more than others based on results.
And Friedger, I love the spirit of what you're saying, but personally, I think 'better reach' is near-impossible to measure, and even if we did, we've shown we wouldn't agree on what better is.
Last, this binary model would make Awario, in my opinion, the most objective App Reviewer. I think with so many subjective pieces we've accepted in other App Reviewers, having one that is purely based on measurable, reliable outputs, is very positive.
You are on point about keeping things exciting for new apps. I totally disagree about the Twitter enterprise account: as you can check with your own tweets, Twitter only counts real views. Otherwise, your Twitter timeline would be filled with chatter from bots. After all, they have been in the social business for years. The suggested way will be easily gameable by scheduling repetitive posts in Hootsuite every month across all platforms.
Let me meet you halfway with a new proposal: we can use Awario reach as a ranking value, use this rank as the score for each platform separately, and then average them all. That way, each app can compete on its most beneficial platform (e.g., a fashion app on Instagram and a dev tool on Twitter). Extra: you could even add a factor for each network separately (e.g., 1.3x for Twitter and 0.9x for web).
Why not just add the followers/following ratio when calculating the Reach of tweets? It is less than 1 for all bots and fake profiles.
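A minimal sketch of what that heuristic might look like (purely illustrative; the field names, threshold, and `adjusted_reach` helper are assumptions, not part of Awario):

```python
def adjusted_reach(mentions):
    """Sum tweet Reach, skipping accounts whose followers/following
    ratio suggests a bot (illustrative heuristic, not Awario's logic)."""
    total = 0
    for m in mentions:
        # Guard against division by zero for accounts following no one.
        ratio = m["followers"] / max(m["following"], 1)
        if ratio >= 1.0:  # proposed cutoff: bots tend to sit below 1
            total += m["reach"]
    return total

# Hypothetical mentions: one organic account, one bot-like account.
print(adjusted_reach([
    {"followers": 5000, "following": 300, "reach": 12000},  # kept
    {"followers": 50, "following": 4900, "reach": 80000},   # filtered out
]))
```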
"Real views" on twitter is not a thing @Walterion1 - Awario is pulling those scores off the API and it's shown to be very fallible. It's too easy to trick with the kinds of things that have already been done to game the system.
> Why not just add the followers/following ratio when calculating the Reach of tweets? It is less than 1 for all bots and fake profiles.
This isn't a very reliable way of determining whether an account is legitimate. All that needs to be done is pay a more expensive service for bots or 'users' that have good ratios.
> Let me meet you halfway with a new proposal: we can use Awario reach as a ranking value, use this rank as the score for each platform separately, and then average them all. That way, each app can compete on its most beneficial platform (e.g., a fashion app on Instagram and a dev tool on Twitter). Extra: you could even add a factor for each network separately (e.g., 1.3x for Twitter and 0.9x for web).
Could you describe this further? I'm not quite following.
"Real views" on twitter is not a thing
Can I ask if you know what the "Impressions" count is when you look at a tweet's "Tweet Analytics"? Twitter provides this value per tweet, and it is not the follower count of the author, like Awario uses.
Sure, let's go with an example:
App | Platform1 Reach | Platform1 Rank | Platform2 Reach | Platform2 Rank | Result |
---|---|---|---|---|---|
A1 | 13,000 | 2 | 4,900 | 3 | 1/(2+3)=0.2 |
A2 | 7,000 | 3 | 10,000 | 1 | 1/(3+1)=0.25 |
A3 | 20,000 | 1 | 5,000 | 2 | 1/(1+2)=0.33 |
So A3 is the best app overall. It can be made more complicated, but I think it gives you the idea of how we can use the rank instead of the reach or a binary score.
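A minimal sketch of that rank-sum idea, assuming per-platform Reach totals as input (the function name and data layout are illustrative):

```python
def rank_scores(reach_by_app):
    """Score apps by summed per-platform rank: score = 1 / sum(ranks).

    reach_by_app: dict mapping app -> list of Reach values, one per platform.
    """
    apps = list(reach_by_app)
    n_platforms = len(next(iter(reach_by_app.values())))
    rank_sums = {app: 0 for app in apps}
    for p in range(n_platforms):
        # Rank 1 goes to the highest Reach on this platform.
        ordered = sorted(apps, key=lambda a: reach_by_app[a][p], reverse=True)
        for rank, app in enumerate(ordered, start=1):
            rank_sums[app] += rank
    return {app: 1 / total for app, total in rank_sums.items()}

# Reproduces the table above: A3 comes out on top with 1/(1+2) ≈ 0.33.
print(rank_scores({"A1": [13000, 4900], "A2": [7000, 10000], "A3": [20000, 5000]}))
```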
I don't see why any of our Blockstack apps need to have an account on Facebook/YouTube/Instagram. Just because we can track something with Awario doesn't mean we should.
> I call this weakening the reviewer. This way, only "bad" ones will be found, and we miss the "better" ones.

> A binary system weakens the weight of Awario, and ranking boils down to just two reviewers, one of them being Product Hunt. As far as issues in Reach are concerned, I guess reducing the weight would help.

> I really don't see it as weakening as it makes it much more fair and the results clear. The point of the reviewer was to encourage applications to market themselves earlier on and to begin considering user acquisition and awareness.
I hear what @Walterion1 and @vishnuraghunathan are trying to say, and it's a very valid point that completely taking Reach out of the score in fact weakens Awario as an app reviewer. The purpose of a scoring mechanism in any ranking system is to introduce a distribution curve so that participants can be ranked against each other corresponding to their efforts. A binary scoring system in its current form (dry run) will just result in two or three segments of apps, and the main challenge would be to rank apps within these segments. It just reduces Awario to an eligibility criterion rather than a score. If that happens, the rankings would ultimately be almost entirely dependent on PH and TMUI. In my opinion, our best foot forward would be a mix of reach, growth, and binary scoring, taking a z-score in each of these categories.
> If that happens, the rankings would ultimately be almost entirely dependent on PH and TMUI.
And we may also have a problem with apps getting scores from fake PH votes. #134
It seems @Walterion1 changed his mind.
On issue #130 he complained about the Awario reach of Pden:
"Mentioning a big name for support, not awareness, and not answering back: https://twitter.com/buffer/status/1143153674040029184 ."
Now, he is doing exactly the same with his account: https://twitter.com/RockstarSupport/status/1156583214649155584
@Walterion1, is this the reason why you are now supporting the use of Twitter Impressions instead of Awario reach? I'm pretty sure Twitter impressions will be big for this tweet as well. So basically, you are suggesting something that would not solve what you just did, right?
Nice of you to care and check the tweets, thanks. And I didn't change my mind. As I mentioned in that issue, I don't like it, but I'm not against it, and neither is the community. I don't want to see such behavior, and that is why I take care to analyze it, but after all, I should play by the rules, and this tweet is to show that everyone can do stuff like this. I'm sure you will notice that our Twitter is one of the most active accounts, so no matter what changes, we will be in a good position. But as a person who worked hard to make a good app, like many here, I will try not to let it decay through bought votes, reach, or whatsoever.
About your second question: No.
I'm leaning towards 2), Awario with policing, but with harsher rules.
Spam, scams and paid bots not only harm the app that does it, but also the whole Blockstack ecosystem at this stage. We can't force apps not to do it, but we can avoid incentivising and paying them to do so.
I propose we maintain a list of bad practices, and if one app gets caught doing them, they're out of App Mining. First strike they get a warning, second strike they're out of the program. The community can keep an eye on each other, from what it seems. The App Mining team just needs to confirm the bad practice and issue the warnings.
Suggestion of practices to blacklist:
Want to keep doing those practices? No problem, you're just out of the App Mining program. This way we can keep incentivising apps to do marketing and get more reach, while keeping the gaming incentive low.
> I propose we maintain a list of bad practices, and if one app gets caught doing them, they're out of App Mining. First strike they get a warning, second strike they're out of the program. The community can keep an eye on each other, from what it seems. The App Mining team just needs to
There's a huge problem with this. I've run contests before with similar rules, and we had folks 'cheat' on behalf of others. Meaning, they went and bought Twitter bot impressions for another brand in the contest to make it seem as though that brand cheated. How could anyone possibly know who paid for what? And then we would be banning people based on that? I don't think that works.
Also, as I mentioned on the call, after a month of policing and talking with some of the Miners involved in activity others want policed, I don't think I can ever keep up with the list of disallowed stuff, as they will just come up with more ways to game it (and have promised to do so). It's basically a race toward who can cheat best, and that's not what anyone wants.
@cuevasm ok then, if we're assuming the worst, then only the binary option remains, although it's measuring something very different. But shouldn't we then be applying the same reasoning to Product Hunt? Doesn't it suffer from the same issues?
100% in support of option 3.
I think option 3 makes the most sense as well and will scale.
Is there any decision regarding this? I would also point out that some app miners are launching several apps using the same brand/domain, and it would be great to have this clarified to avoid future problems. I just want to make sure they will not have a 1/n effort to get the same results.
Mitchell has been researching alternative methods and tools to utilize. He is writing up a new proposal now.
It seems that we have support for a binary method here, but it's not overwhelming. After a lot of discussions with you all, the team here, Awario, etc., we've come up with a combination of the binary method with the Reach method we think maintains the best spirit of the App Reviewer and doesn't suffer from the same gameability issues we've seen so far.
Here it is, I am dubbing it Blended Awareness:
Binary scoring on social networks, i.e. you get a number out of 5 max (1 for each network: Facebook, Twitter, YouTube, Reddit, and Instagram). To get 5 points, all you need to do is register a Mention in that month on each network.

plus

A Reach score for any Mentions in the News/Blog section. The Reach will have log10 scoring applied to it.

plus

A Growth score as we've been doing; this will take your last month's Reach and calculate the Growth % over the current month.
Why does this work better?
Potential caveats
Still to solve: the exact weight of each piece of the score; I'll leave this to @hstove to propose. My general proposal is that each portion be 1/3 of your overall score: Social + Reach + Growth.
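To make the three pieces concrete, here is a minimal sketch of how they might combine under the proposed 1/3 weights (function and field names are illustrative, and the normalization step is an assumption, not the ratified method):

```python
import math

def blended_awareness(app, last_month_reach):
    """Illustrative Blended Awareness score for one app (equal 1/3 weights).

    app: dict with
      "networks":   set of social networks with at least one Mention this month
      "news_reach": total Reach of News/Blog Mentions this month
    last_month_reach: previous month's Reach, used for the Growth %.
    """
    SOCIAL = {"Facebook", "Twitter", "YouTube", "Reddit", "Instagram"}

    # 1) Binary social score: one point per network with any Mention, out of 5.
    social = len(app["networks"] & SOCIAL) / 5

    # 2) News/Blog Reach with log10 scoring to damp huge spikes.
    reach = math.log10(app["news_reach"] + 1)

    # 3) Growth % of Reach over last month.
    growth = (app["news_reach"] - last_month_reach) / max(last_month_reach, 1)

    # Equal thirds per the proposal; in practice each component would be
    # normalized across all apps (e.g. z-scored) before weighting.
    return (social + reach + growth) / 3

print(blended_awareness(
    {"networks": {"Twitter", "Reddit", "YouTube"}, "news_reach": 25000},
    last_month_reach=10000,
))
```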
Next steps: show up to the App Mining call on Thursday (the 22nd) with any questions or concerns; we should try to ratify this proposal then so that we can give you time to change up your methods before the month clicks over into September. We'd like to change over to this method as quickly as possible so we can end the current policing method.
> Is there any decision regarding this? I would also point out that some app miners are launching several apps using the same brand/domain, and it would be great to have this clarified to avoid future problems. I just want to make sure they will not have a 1/n effort to get the same results.
Different apps can't have the exact same name. They can use, say, Arcane Docs vs. Arcane Sheets. The Awario query is easily able to differentiate them. We have already been doing this successfully with OI Chat, OI Timesheet, etc.
Please see the proposal and next steps above.
Blended Awareness sounds like a good compromise. @cuevasm For Growth, please include #129.
https://github.com/blockstack/app-mining/issues/135#issuecomment-523136444
There were no major objections to this on the call today, please put yours here if you have them. We can proceed with this scoring method right away or do a dry run first, please use 👀 to vote for doing a dry run first, use 🚀 to vote for switching right away and getting your October (data from Sept 1-30) score with the new method. The current scoring method will be used to score for September regardless because we're already in the data period (August 1-31).
Basically, you are deciding if you're comfortable adapting your strategy to this new scoring method by September 1 OR you want to wait to change the way you do awareness building until Oct 1.
@cuevasm is it just me, or can't you react with a 🐪? I used the 👀 instead for the Dry Run.
Ah strange, didn't realize you couldn't, sorry, let's do the eyes! Updating
@cuevasm I think there should be a dry run because:
we don't have a test result for this month with exact numbers, so we can't estimate how it will play out
Well, you do have access to Awario, right? So you could see what your apps might look like.
I'm confused, Mitchell said that if we went fast, you'd get the new score in October, but wouldn't moving fast mean we'd use the new score in September, using August data? Maybe I read that wrong.
Mitchell:
> Still to solve: the exact weight of each piece of the score; I'll leave this to @hstove to propose. My general proposal is that each portion be 1/3 of your overall score: Social + Reach + Growth.
> Well, you do have access to Awario, right? So you could see what your apps might look like.
Not accurately, as I don't know the exact weight for each one.
@hstove we haven't got our Awario access yet, and we have already been participating for 2 months now.
@cuevasm sounds like some app miners still don't have Awario access, please follow up.
Current state: 7 vs. 4 for fast adoption (October rewards).
I don't see any reason why the adoption should not happen now. We are all in the same boat, it is fair, and it can only get better.
Thanks for all the discussion. The default protocol is to use a dry run to avoid any complications and allow App Miners to prepare. Next month we will continue with Awario as is, and will prepare a dry run for the blended proposal described here: https://github.com/blockstack/app-mining/issues/135#issuecomment-523136444
Assuming the dry run results work well, App Miners should begin preparing their news reach for September's live results.
I don't see why a dry run on August data would be different from one on September data.
App publishers should continue to publish their apps in the best way they can; only the rewards will be distributed in a fairer way.
> @hstove we haven't got our Awario access yet, and we have already been participating for 2 months now.
All Miners were provided Awario access weeks ago; please look for an invite from Awario by searching your inbox. You'll have been invited using the email at which you receive regular communication from us.
Based on the voting, we'll do a dry run first, so I'll provide that data alongside the regular score for the upcoming rankings.
> I don't see why a dry run on August data would be different from one on September data.

> App publishers should continue to publish their apps in the best way they can; only the rewards will be distributed in a fairer way.
Should, sure. However, I think it would be pretty different given where we're starting from. If we change to this method, different things are rewarded and we'd expect behavior to change. If we are rewarding press mentions, then you need time to get press mentions. Instead of focusing on differentiating on social media as most are now because that's the easiest/best place to do so, App Miners would be focusing on differentiating by getting more play in the press - this takes some time.
Still, the dry run data will be the same for August and September, because in both months differentiating on social media is rewarded. Hence, the only reason not to change sooner is to let app miners prepare to get better results. For me, that is not reason enough. We should adapt the algorithms as soon as a fix for a flaw is identified.
I can appreciate that, but we've received consistent feedback that changing the rules in the middle of a work period is not desirable.
We just missed the September period.
Hey everyone, just to put a bow on this, here's the scoring timeline for Awario going forward. We'll be switching to the Blended Awareness model shortly.
- September Run (data from August): Current Method
- October Run (data from September): Current Method
- November Run (data from October): New Method (Blended Awareness)
We'll remind you and cover in the next call as well as by email and in the changelog.
The takeaway from this message is that in 2 weeks we move into October, the data for which will be judged as outlined above for your eventual November score. You may decide to change your strategy, you may not, but the update is coming, so please be aware!
@cuevasm Could you please consider #129 when implementing Blended Awareness?