Closed JoyceGross closed 9 years ago
Joyce, when I processed the Calbug record dump back in December/January, my code did full counts of the number of subjects transcribed, and I always found between 4 and 10, never more. However, it is my understanding that the roulette wheel on subjects means that the same transcriber might get the same record more than once. This is likely particularly true as one comes towards the end of a batch of images. I don't think the code is "smart" enough (yet) to check if a transcriber_id has already touched a subject_id, and it's hard because not all our transcribers "log in".
Consider this another "next step" we need to take related to code development, unless Steve or Chris tell me I am wrong about this...
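For anyone who wants to check a dump for this, here is a minimal sketch of the kind of count described above. It assumes each record is a dict with `transcriber_id` and `subject_id` keys; those field names, and the choice to skip records without a transcriber, are assumptions for illustration and not the actual dump format.

```python
from collections import Counter

def find_repeats(records):
    """Return {(transcriber_id, subject_id): count} for pairs seen more than once.

    Records with no transcriber_id (users who are not logged in) are skipped,
    because they cannot be attributed to a single person.
    """
    counts = Counter(
        (r["transcriber_id"], r["subject_id"])
        for r in records
        if r.get("transcriber_id")
    )
    return {pair: n for pair, n in counts.items() if n > 1}

# Hypothetical sample data, not taken from the real dump.
records = [
    {"transcriber_id": "u1", "subject_id": "s1"},
    {"transcriber_id": "u1", "subject_id": "s1"},
    {"transcriber_id": "u2", "subject_id": "s1"},
    {"transcriber_id": None, "subject_id": "s1"},
]
print(find_repeats(records))
```

Anonymous transcriptions drop out of this count entirely, which is the same limitation mentioned above for transcribers who do not log in.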
On Mon, Apr 14, 2014 at 6:46 PM, JoyceGross notifications@github.com wrote:
Users are reporting getting the same CalBug images more than once.
This seems to be true according to the data dump. In a few cases there are several hundred records for ONE user for ONE exact same image!
1) Why are users getting the same image multiple times?
2) What happened to the 4 (or previously 10) transcription limit per image with some of these repeat situations?
3) Are all these duplicate records counting towards a user's total?
Here's a list of images sent to the same user more than once:
http://calbug.berkeley.edu/data/NfN/repeats.html
Here are a few specific examples of the raw data for the images that have been sent out more than once to the same user (sorry it's just the raw data, not nicely formatted, but you can still read it and I find examples are useful).
http://calbug.berkeley.edu/data/NfN/nfn4.txt http://calbug.berkeley.edu/data/NfN/nfn10.txt http://calbug.berkeley.edu/data/NfN/nfn13.txt http://calbug.berkeley.edu/data/NfN/nfn14.txt http://calbug.berkeley.edu/data/NfN/nfn636.txt
As far as I can tell, this has been happening since the start of the project a year ago.
Joyce
Reply to this email directly or view it on GitHub: https://github.com/zooniverse/notesFromNature/issues/334
Are they also getting skipped records more than once?
Rob -- see issue #324. Chris wrote: "Correct, a 'skip' counts as a transcription. I did this to ensure if a user skips an item, they do not get the same one again later."
Yes, skips are different from the random-image-to-transcriber process, I think. Worth getting confirmation from @chrissnyder or @sraden.
Skips are treated no differently than any other transcription.
So a user could get the same image more than once and skip it both times (or all times)?
Theoretically, they should never get the same image more than once, regardless of whether they skipped it. As long as they submit some sort of answer, skip or otherwise, that counts as them having seen that label, and they will not be shown it again.
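As a rough illustration of that rule (not the actual Notes from Nature selection code), the intended behaviour can be sketched as picking at random from the subjects a user has not yet answered. The function name `next_subject` and its arguments are hypothetical.

```python
import random

def next_subject(subject_ids, seen_ids, rng=random):
    """Pick a random subject the user has not answered yet.

    seen_ids holds every subject the user has submitted an answer for,
    skips included. Returns None when nothing is left to show.
    """
    remaining = [s for s in subject_ids if s not in seen_ids]
    return rng.choice(remaining) if remaining else None

# A skip is recorded in seen_ids the same way as a full transcription.
seen = {"s1", "s2"}
print(next_subject(["s1", "s2", "s3"], seen))
```

A filter like this only works if the seen set is tied to a stable user identity, so it would not help for transcribers who are not logged in.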
But in practice they are getting the same image more than once.
Judging from the timestamps, it looks more like the site is sending back multiple classifications from the same user than like a user being served an image more than once. A lot of those timestamps are within a second or three of each other.
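One way to test this reading against the dump is to group each user's records for a single image into bursts by time gap. The sketch below assumes the timestamps have already been parsed into `datetime` objects; the 3-second threshold is an assumption based on the gaps mentioned above, not a known property of the site.

```python
from datetime import datetime, timedelta

def split_bursts(timestamps, max_gap=timedelta(seconds=3)):
    """Group timestamps into bursts where consecutive gaps are <= max_gap.

    Many records in one burst suggest repeated submissions of a single
    classification; several bursts far apart suggest the image was
    actually served to the user again.
    """
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts

# Hypothetical timestamps for one user and one image.
stamps = [
    datetime(2014, 4, 14, 12, 0, 0),
    datetime(2014, 4, 14, 12, 0, 1),
    datetime(2014, 4, 14, 12, 0, 3),
    datetime(2014, 4, 14, 12, 45, 0),
]
print([len(b) for b in split_bursts(stamps)])
```

For the sample data this gives one burst of three records and one isolated record, which would point to both effects happening for that user.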
Several people have also complained about getting repeat images for both SERNEC and Calbug.
But I'm not sure how widespread it is or how often it happens for individuals.
Users insist that this is still an ongoing problem.
An object that a user just today reported getting a second time: http://talk.notesfromnature.org/#/subjects/ANN0003kao
An old talk thread discussing the phenomenon: http://talk.notesfromnature.org/#/boards/BNN0000002/discussions/DNN000023e?page=5&comment_id=544a9d3357123016430001a2
Can we help with this?
@brian-c tells me that @chrissnyder did an update 20 days ago that should have fixed this... Did it get deployed? @chrissnyder, any ideas?
Thank you!!
That update should have fixed it. I'll double-check this because Notes swaps subject groups depending upon what collection you are looking at, which might introduce dupes somewhere.
This is going to be more difficult to track down. I've tried doing a number of classifications, and I can't spot any obvious instances of duplicates appearing in the subject queue.
Keeping the issue open for new reports or sequences of actions that create dupes.