drphilmarshall closed 11 years ago
@drphilmarshall We're not set up to control the dud frequency.
Ah - they work differently from the lenses? No matter: Aprajita thinks the duds are actually more important. Can you reduce the number of "Nice catch!" messages for duds, without decreasing their frequency? At least that way there would be less supervision.
Thanks!
Morning! The feedback so far doesn't really contain many hints about the lens frequency, but what I can see seems relatively positive. No-one complaining about too many sims (except a few experienced hunters on the LensHunters list). If we change this halfway through the beta test, and then send a prompting email to tray and get the remaining 100 or so testers onto the site as fresh newcomers, this could be an interesting AB test on the lens frequency. @kapadia can we chat about this and a couple of other things, today? Thanks!
Quick update on this: @kapadia and I are paircoding over skype to enable the Science Team stream design as "Beta-B" in classifier.coffee. The first half of the beta test (up until now) will henceforth be known as "Beta-A" :-)
Really interested to see how this plays out - I think the two frequency schemes bracket the acceptable range, actually: Amit's was "easy" (loads of sims), ours is "hard" (rapid fall off to 1 sim in 20 images after 40 images). We are coding in the concepts of a "Level" and a "Difficulty" - Amit's scheme is Difficulty = 2, ours is Difficulty 5 - the initial sim frequency is 1/Difficulty.
@ttfnrob, I'll email you when the switch is made: if you could send a prompting email to the beta testers that would be great. I'll send you a draft!
OK, code is pushed. Both Amit and I noticed an interesting effect: with the 1/5,1/10,1/20 stream that we tried, we both found that we were feeling lonely after 50 images! It seemed like the site wasn't talking to us any more!
So, we bumped up the lens frequency to a B scheme of 1/5,1/7,1/10,1/14,1/20 - so you only get left really alone if you reach 100 images. For comparison, the A scheme was 1/3,1/4,1/5,1/6,1/7 - a noticeable difference.
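To make the comparison concrete, here is a minimal sketch of the two schemes in Python (not the actual classifier.coffee code). The frequencies are the ones quoted above; the level length of 25 images and the helper names `level_for`, `sim_frequency` and `expected_sims` are assumptions for illustration only, chosen so that level 5 starts at 100 images.

```python
from fractions import Fraction

# Sim frequencies per level, as quoted in this thread.
SCHEME_A = [Fraction(1, n) for n in (3, 4, 5, 6, 7)]
SCHEME_B = [Fraction(1, n) for n in (5, 7, 10, 14, 20)]

# ASSUMPTION: every level lasts the same number of images.
# 25 is a hypothetical value, picked so the last level starts at 100 images.
LEVEL_LENGTH = 25


def level_for(images_seen, n_levels=5, level_length=LEVEL_LENGTH):
    """Return the 1-based level for a user who has already seen `images_seen` images."""
    return min(images_seen // level_length, n_levels - 1) + 1


def sim_frequency(images_seen, scheme, level_length=LEVEL_LENGTH):
    """Return the sim frequency that applies to the user's next image."""
    level = level_for(images_seen, len(scheme), level_length)
    return scheme[level - 1]


def expected_sims(n_images, scheme, level_length=LEVEL_LENGTH):
    """Expected number of sims among the first `n_images` images."""
    return sum(sim_frequency(i, scheme, level_length) for i in range(n_images))


if __name__ == "__main__":
    for name, scheme in (("A", SCHEME_A), ("B", SCHEME_B)):
        print(name, float(expected_sims(50, scheme)), float(expected_sims(100, scheme)))
```

Under these assumptions a user on scheme B sees noticeably fewer sims in the first 50 images than a user on scheme A, which is the difference the AB test is meant to probe.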
This is an educated gamble. With this stream we will have plenty of training data to check purity and infer the crowd's learning rate and expertise, but they will need to see more images to get through the dataset. We are OK with taking that risk though, because we saw the beta testers spending significant time on the site - and it feels to me like it'll be easy for people to get hooked and do 40 or more images.
That addictiveness was achieved through clever design: kudos to the dev team for that :-)
I emailed @ttfnrob a nice draft to send to the testers explaining the AB test, and inviting them to join in - he's going to send it out soon, and then we'll see what happens! :-)
Thanks Phil & Amit!!
Sorry if I missed this elsewhere but is the simulated frequency counter being reinstated too so that users know what to expect?
Aprajita.
Ahh, you are too sharp Aprajita :-) I tried to write about this on the other issue, but I can't find the old closed issues any more. Amit and I are trying to track them down but it's a mystery - some bug to do with milestones, we think.
Anyway, here will do. The other thing I noticed was that once I knew I was on level 5 of the new stream, where only 1 in 20 images has a lens in it, I clicked through the images super fast. And I missed two... So this got me thinking: maybe the lens frequency status will discourage careful searching! It's back to the key scientific issue, completeness vs purity. So I thought, OK, one thing at a time: let's do the AB test on the stream frequency first. What do you think - a sensible strategy?
I don't know how long people will play with the site - but in principle they could come back to it several times over the next few days. So, we could consider turning on the status message on Monday, so that both group A and group B users see it - and then we could do at least some sort of test of its impact on purity. Since the status message was originally there to try and prevent over-clicking, we could also just measure the purity from the beta clicks, and then see if we need to take action. Thoughts?
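For reference, a minimal sketch of how completeness and purity could be estimated from the beta clicks on training images. The function names and the example counts are hypothetical, and the definitions are the standard ones rather than anything confirmed for this project. The purity computed this way applies to the training sample only, since it depends on the sim-to-dud ratio in the stream.

```python
def completeness(sims_marked, sims_shown):
    """Fraction of simulated lenses shown that the user marked."""
    if sims_shown <= 0:
        raise ValueError("no sims shown")
    return sims_marked / sims_shown


def false_positive_rate(duds_marked, duds_shown):
    """Fraction of duds shown that the user wrongly marked as lenses."""
    if duds_shown <= 0:
        raise ValueError("no duds shown")
    return duds_marked / duds_shown


def purity(sims_marked, duds_marked):
    """Fraction of marked training images that really contained a (simulated) lens."""
    marked = sims_marked + duds_marked
    if marked <= 0:
        raise ValueError("nothing marked")
    return sims_marked / marked


if __name__ == "__main__":
    # Hypothetical counts for one user, for illustration only.
    print(completeness(18, 20), false_positive_rate(2, 20), purity(18, 2))
```

Comparing these numbers between group A and group B, and before and after the status message is switched on, would be one way to test its impact.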
Now we have the spotters guide (and the feedback messages), we can start with a lower lens frequency. Can we have the following, please?
Level 1: lens frequency 1 in 5, dud frequency 1 in 5, length 20 subjects
Level 2: lens frequency 1 in 10, dud frequency 1 in 10, length 40 subjects
Level 3: lens frequency 1 in 20, dud frequency 1 in 20, length infinity
In this scheme, a user sees 4 lenses in level 1, and 4 lenses in level 2. Duds are subjects where we know there is no lens - we want to check that the users are not seeing lenses where there are none. After 40 subjects, each user will have seen 12 lenses - enough to meet our completeness estimate requirement. Users doing fewer than 40 subjects will easily meet the sim/subject ratio goal needed for the training/completeness calibration.
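The counts in this scheme can be checked with a short sketch. It assumes one particular reading of the request: each "length" is the number of subjects spent in that level, not a cumulative total. The names `LEVELS` and `expected_counts` are made up for illustration.

```python
from fractions import Fraction

# (lens frequency, dud frequency, level length in subjects; None = unbounded)
# ASSUMPTION: lengths are per-level counts, not cumulative totals.
LEVELS = [
    (Fraction(1, 5), Fraction(1, 5), 20),
    (Fraction(1, 10), Fraction(1, 10), 40),
    (Fraction(1, 20), Fraction(1, 20), None),
]


def expected_counts(n_subjects, levels=LEVELS):
    """Return (expected lenses, expected duds) among the first `n_subjects` subjects."""
    lenses = Fraction(0)
    duds = Fraction(0)
    remaining = n_subjects
    for lens_freq, dud_freq, length in levels:
        if remaining <= 0:
            break
        # Number of subjects this user actually sees in the current level.
        seen = remaining if length is None else min(remaining, length)
        lenses += lens_freq * seen
        duds += dud_freq * seen
        remaining -= seen
    return lenses, duds


if __name__ == "__main__":
    for n in (20, 40, 60, 100):
        lenses, duds = expected_counts(n)
        print(n, float(lenses), float(duds))
```

Under this reading, level 1 and level 2 each contribute 4 expected lenses, and the first 40 subjects contain an expected 6 lenses and 6 duds, i.e. 12 training subjects in total.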