All apps reviewed by all reviewers

stackatron commented 5 years ago

What is the problem you are seeing? Please describe. Apps that cannot be tested by the current reviewers have an unfair advantage. For example, Gladys cannot be tested by TryMyUI or NIL without hardware. Reviewing Gladys is unfair to the app builder. Not reviewing Gladys is unfair to the 60 other apps that can be reviewed.

How is this problem misaligned with goals of app mining? It simply seems unfair and implicitly incentivizes apps that cannot be tested.

What is the explicit recommendation you’re looking to propose? All apps reviewed by all reviewers.

Describe your long term considerations in proposing this change. Please include the ways you can predict this recommendation could go wrong and possible ways mitigate.

We've already tried excluding Gladys from the incompatible reviewers – the current result is 100% of Gladys' ranking is composed of inputs (PH and DE) that have little to do with the functionality of the app. Other solutions:
Forcing app reviewers to test with hardware. I think this breaks TryMyUI's model. For NIL it might be possible.
Breaking App Mining into categories. Seems like this could work, but making all apps compete in specialized categories to accommodate this one outlier seems like a poor cost/benefit trade off.

Additional context Glady's is awesome. Not trying to penalize Gladys. Apologies. Just trying to surface this challenge and start a discussion about it.

cuevasm commented 5 years ago

Agreed, I would propose something like either a max number of App Reviewers you can't be eligible for, for now that should be 1. If you qualify for the others, you can still enter. Or perhaps better long term, a minimum percentage of the reviewers you have to qualify for, something like 90%. The problem here is you might have teams optimizing for which score not to qualify for, but maybe there's a point at which there are enough, and broad enough reviewers, that we are throwing out worst or zero scores anyway? There are problems with that too, but I'm trying to work toward allowing awesome projects like Gladys to be able to stay in - I believe it's important that the ecosystem have a diversity of dapps. We also can't ignore hardware.

Pierre-Gilles commented 5 years ago

Hey! I'm the founder of Gladys.

It simply seems unfair and implicitly incentivizes apps that cannot be tested.

I agree with you on that. That's not easy to rank on the same scale apps that have completely different properties (Browser, mobile, hardware), and right now the solution can seems unfair for other app developer.

Forcing app reviewers to test with hardware. I think this breaks TryMyUI's model. For NIL it might be possible.

I suggested in another GitHub Issues to host a static version of Gladys UI online so that TryMyUI reviewers could rank it. But again this is not exactly fair, at it will be something static and not the real Gladys product...

alvesjtiago commented 5 years ago

I have no intention of penalising Gladys and nothing against the app, I'm sorry if what I'm about to comment may sound negative @Pierre-Gilles but it's nothing against you or Gladys.

The way the average was calculated on this App Mining round for the hardware apps was in my opinion very demotivating for the other apps. Since the Digital Rights score, for example, can only hurt one's score (on this round it was at most 0,57) it doesn't make sense to have a normal average completely disregarding this. If the average was to be calculated with the top value for each reviewer that couldn't review it, despite still not being fair to the other apps, it would at least have been more realistic.

My suggestion for apps that can't be tested by a certain reviewer is to calculate the score with a weighted average that takes into account a certain standard deviation from the scores that could be calculated. Happy to elaborate more on this but it was very disappointing to see the tremendous impact that leaving out reviewers had on the ranking.

Pierre-Gilles commented 5 years ago

@alvesjtiago No worries :) I completely understand that the actual solution is frustrating for non-hardware app. Let's find a solution.

For the Digital Right score, I think we can definitely evaluate Gladys. It's going to be a different process that pure web apps, but it's completely doable.
For TryMyUI, either we find a way to evaluate non web app (with a demo version hosted online for example), or we just use a weighted average like you said. The big downside of the weighted average is that apps which can't be evaluated by a reviewer will be impacted in a negative way as they'll perform averagely and without having any ways to improve. It'll create the opposite effect: making less attractive to build a certain types of apps, and I'm not sure it's good.

I think the biggest problem now is that there are so few reviewers for now that removing just 2 of them make it instantly unfair. When the app mining challenge will have more reviewers, it'll have less impact.

Two more long term solutions:

There was a proposal that apps should be evaluated by at least > 80/90% of reviewers.
Categories could solve the issue (but for now it would be unfair to have a hardware category because Gladys is the only hardware app)

friedger commented 5 years ago

In addition to the percentage I would also add a flag to reviewers indicating whether they are essential or not. If an essential reviewer can't review the app then the app is not eligible.

I would see New Internet Lab reviewer as essential.

stackatron commented 5 years ago

PBC discussed and decided the only fair decision was if each app is reviewed by each reviewer.

Pierre-Gilles commented 5 years ago

@jeffdomke Seems fair! Can we get in touch to see how we get make Gladys evaluated by TryMyUI and Digital Rights? I don't think it's a big issue.

@GinaAbrams Are you available to discuss that on a call?

GinaAbrams commented 5 years ago

Hey @Pierre-Gilles apologies for the delay here, yes happy to, will follow up via email! 👍

Pierre-Gilles commented 5 years ago

@larrysalibra I just had Gina on the phone, and she said I should get in touch with you to see how Digital Rights reviewer could review Gladys.

To me, you don't necessarily need a Raspberry Pi to test Gladys and its Blockstack integration.

We have a Docker image that you can pull, so you can simply execute Gladys on any machine for this review process.
Even better, as I'm currently working on the next big release of Gladys, I deployed a hosted demo version, which is deployed on Netlify automatically at each push on master. It's a demo so real world controls are not enabled, but as Blockstack code only happens in the browser, Blockstack integration works fully in this version!

You'll find this hosted version there:

https://demo.gladysassistant.com/signup

Let me know what you think of that, if you want to jump on a call so I can show you quickly all that, just email me :)

friedger commented 5 years ago

@Pierre-Gilles dockerization looks nice. Do you have a roadmap when this will become available to the public (on their PIs)?

friedger commented 5 years ago

@jeffdomke With https://github.com/blockstack/app-mining/issues/91#issuecomment-487995449 this can be closed.

And should be added in the changelog.

Pierre-Gilles commented 5 years ago

@friedger Developer preview is coming soon :)

GinaAbrams commented 5 years ago

Thanks @friedger, its been added to the changelog and I am closing now.

larrysalibra commented 5 years ago

@Pierre-Gilles Sorry about the delay in reply. We can't change the rules during the review period.

There are a couple of issues here:

1) We review apps from the URL that was submitted at start of app mining. In the past I've received requests from app developers to review new versions of apps or other pages of apps not deployed at the URL they submitted and declined their requests.

Because of that, we're unable to review the docker or demo/preview versions that you shared in this cycle.

2.) With regards to reviewing a demo version or a docker version: What we're trying to do is give users some sort of certainty that the apps they're using have specific characteristics: that they let the user sign in with Blockstack ID and store data in their gaia hubs. It's important for us to review these apps in the same way and environment that the user would use them in real life. Evaluating demo versions of apps doesn't seem fair to other developers that have had their apps reviewed in the same environment in which users actually use it. Reviewing an app in an entirely different environment/form factor and hardware than users are intended to use it doesn't seem incredibly useful to me.

I'm open to discussion around 2. if you want to open another github issue and start a discussion.

Pierre-Gilles commented 5 years ago

@larrysalibra I replied here => https://github.com/blockstack/app-mining/issues/107

stacks-archive / app-mining

All apps reviewed by all reviewers #91