Closed sdsantos closed 4 years ago
Thanks for the proposal. I've been thinking about this and keep coming back to one main issue. A lagging, three-month TryMyUI score could end up giving more weight to an older, worse version of the app. The idea behind 10 user testing videos every other month is that it provides a month to improve the app, and app makers should be able to see that reflected in their scores almost immediately.
Right now it can take up to 2 months for a change to be reflected in the TryMyUI score, so I think an earlier improvement would be welcome.
And if 3 months is too long for a full new score, what about 5 tests every month? If the original issue was the number of tests, and not the number of apps being tested, this would be an improvement with no downside.
Spoke to TryMyUI yesterday and looks like we can make this shift. The proposal is to do 10 tests for new apps enrolled (new = does not have any TryMyUI data) and 5 tests with a weighted rolled score for apps already enrolled. Can folks thumbs up to make this change?
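A rough sketch of how such a score could be computed, for illustration only. The thread does not state the actual weights or formula, so `new_weight`, the function name `trymyui_score`, and the 0-100 scale are all assumptions, not the adopted methodology:

```python
def trymyui_score(new_tests, previous_score=None, new_weight=0.5):
    """Blend this round's tests with the app's previous score.

    new_tests      -- scores from this round (10 for a new app, 5 otherwise)
    previous_score -- None for an app with no TryMyUI data yet
    new_weight     -- hypothetical weight given to the new round (assumption)
    """
    if not new_tests:
        raise ValueError("at least one test score is required")
    batch_mean = sum(new_tests) / len(new_tests)
    if previous_score is None:
        # New app: the score is just the mean of its first round of tests.
        return batch_mean
    # Enrolled app: weighted mix of the new round and the earlier score.
    return new_weight * batch_mean + (1 - new_weight) * previous_score


# New app, 10 tests that all scored 70.
first = trymyui_score([70.0] * 10)
# Same app next round, 5 tests that all scored 80.
second = trymyui_score([80.0] * 5, previous_score=first)
```

With an equal split, half of the gap between the old score and the new round closes each round. A larger `new_weight` would reward recent improvements faster at the cost of a noisier score.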
@GinaAbrams can you please provide an example of how the result will be calculated?
Thanks for the feedback on the call yesterday. Sounds like we should explore this more. Really happy TryMyUI is a valued reviewer.
Updated with new methodology in App Mining Changelog and via email to app miners. Closing.
What is the problem you are seeing? Please describe.
Since #153, TryMyUI reviews are only done every two months.
How is this problem misaligned with goals of app mining?
This does not incentivise continuous improvement of the apps. An improvement might only be evaluated up to two months after it ships.
What is the explicit recommendation you’re looking to propose?
Instead of doing 10 tests every two months, I suggest:
The score would be calculated based on the last 10 tests of the app.
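A minimal sketch of the "last 10 tests" idea, assuming each test yields one numeric score and that the app score is their plain average. The averaging rule, the 0-100 scale, and the batch size of 5 in the example are assumptions made for illustration:

```python
from collections import deque

WINDOW = 10  # "the last 10 tests of the app"


def add_tests(history, new_scores, window=WINDOW):
    """Return the test history with new scores appended, oldest dropped."""
    updated = deque(history, maxlen=window)
    updated.extend(new_scores)
    return updated


def rolling_score(history):
    """Average of whatever tests are currently in the window."""
    if not history:
        raise ValueError("no test scores yet")
    return sum(history) / len(history)


# An app whose last 10 tests all scored 50 ...
history = add_tests([], [50.0] * 10)
before = rolling_score(history)
# ... ships an improvement, and the next (hypothetical) batch of 5 scores 80.
history = add_tests(history, [80.0] * 5)
after = rolling_score(history)
```

Here the score moves from 50 to 65 after one batch and only reaches 80 once ten post-improvement tests have accumulated, which is the lag the discussion is weighing.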
Describe your long term considerations in proposing this change. Please include the ways you can predict this recommendation could go wrong and possible ways to mitigate.
With this proposal, fewer tests would be run overall, which helps the program scale.
Test scores will take a bit longer to get fully refreshed, so if a significantly bad version was tested, it might take up to 3-4 months to get it fully out of the score sheet. But scores would also start to improve sooner.