stacks-archive / app-mining

For App Mining landing page development and App Mining operations.
https://app.co/mining
MIT License

Digital Rights Reviewer: Incorrect results for March Mockup #62

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

What is the problem you are seeing? Please describe. The numbers and calculations used in the "March 2019 Results w/ Digital Rights Reviewer" sheet do not line up with those in the "March 2019" sheet.

For instance, "TMUI Average":
March: =if(P2="Ineligible", "Ineligible", average(P2:S2))
March w/ DRR: =if(P2="skipped", "skipped", (P2-average(P:P))/stdev(P:P))
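For anyone comparing the two formulas outside Sheets, here is a minimal Python sketch of the difference, with made-up scores; the column layout is only an assumption read off the formulas above. The first formula takes a plain average of one app's sub-scores, while the second standardizes a single score into a z-score across all apps, which is a different quantity on a different scale.

```python
from statistics import mean, stdev

# Hypothetical sub-scores for one app from the TMUI reviewer,
# standing in for spreadsheet cells P2:S2.
app_subscores = [3.8, 4.1, 3.5, 4.0]

# What "March 2019" computes: average(P2:S2).
tmui_average = mean(app_subscores)

# What "March w/ DRR" computes instead: a z-score of P2 against
# column P across all apps, i.e. (P2 - average(P:P)) / stdev(P:P).
column_p = [3.8, 2.9, 4.4, 3.1, 3.6]  # hypothetical column P, one score per app
tmui_z_score = (column_p[0] - mean(column_p)) / stdev(column_p)

print(tmui_average)   # stays on the reviewer's own scale (here 3.85)
print(tmui_z_score)   # standardized value centered on 0 across apps
```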

"PH Upvotes | PH Credible Upvotes" (whatever makes a "Credible Upvote") Sigle March: 134 | 104 SIgle March w/ DRR: 0 | 0

I've attempted to fix this in my own copy of the spreadsheet, but unless I made a copy-and-paste error, things still don't look right in the fixed version: https://docs.google.com/spreadsheets/d/11X_REUogwP_mWOX4L9qSWXLJ7dNYTMXGM_GfMxAN4Wg/edit?usp=sharing

For instance, Air Text ends up beating Encrypt My Photos despite the two having equal DR scores. I don't intend to read the 17-page algorithm paper; I'd imagine someone in charge here can see that things aren't adding up and can at least fix the incorrect calculations and numbers.

How is this problem misaligned with the goals of app mining? The insights into the March Mockup genuinely help show transparency in future reviews that affect all of us, but these numbers do not add up, so that transparency is misinforming us.

What is the explicit recommendation you're looking to propose? Fix the numbers/calculations and alert the community to the updated results if DR was made a reviewer for the month of March.

Describe your long term considerations in proposing this change. Please include the ways you can predict this recommendation could go wrong and possible ways to mitigate them. Triple-check the numbers and/or allow the community to audit the calculations BEFORE finalizing the results. Sending out reward transactions before mistakes can be found will have adverse effects if/when errors are made in the monthly final scores.

Edit: it looks like https://github.com/blockstack/app-mining/issues/50 should cover the long-term consideration.

hstove commented 5 years ago

Hey @AnthonyRonning, thanks for checking this. I've fixed the issues you caught; would you mind giving the dry run results another look?

https://docs.google.com/spreadsheets/d/1_lR82QaRTzy58p5fItlBPPJ8yJzaZJuRu2nV9EG6hYs/edit#gid=1959028393

ghost commented 5 years ago

Hi @hstove

"TMUI Average" still shows as =if(P2="Ineligible", "Ineligible", (P2-average(P:P))/stdev(P:P)) for March w/ DRR.

"PH Average" still shows a different number. For Graphite it's 2.833812134 vs 2.854970259.

hstove commented 5 years ago

Thanks! There were some small nuances causing that discrepancy; I've fixed both of those.

AirText still beats EncryptMyPhotos, even though it has a lower average this month, because its score last month was higher.
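For anyone else confused by the carry-over: a minimal sketch, assuming the final score blends last month's score with this month's via an exponentially weighted average. The 0.5 weight and all the scores below are made up for illustration, not the actual App Mining parameters.

```python
# Illustrative only: the decay weight and scores are assumptions,
# not the documented App Mining parameters.
DECAY = 0.5

def blended_score(previous: float, current: float) -> float:
    """Carry a fraction of last month's score into this month's ranking."""
    return DECAY * previous + (1 - DECAY) * current

# AirText scored higher last month but lower this month.
airtext = blended_score(previous=1.2, current=0.6)            # 0.9
encrypt_my_photos = blended_score(previous=0.4, current=0.8)  # 0.6

print(airtext > encrypt_my_photos)  # True: last month keeps AirText ahead
```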

ghost commented 5 years ago

Thank you @hstove, the numbers look correct to me. And thanks for explaining the previous-score carry-over; I didn't realize it contributed to the new score.