zooniverse-glacier / notesFromNature

https://www.notesfromnature.org/
Apache License 2.0

Users getting repeat images #334

Closed JoyceGross closed 9 years ago

JoyceGross commented 10 years ago

Users are reporting getting the same CalBug images more than once.

This seems to be true according to the data dump. In a few cases there are several hundred records for ONE user for ONE exact same image!

1) Why are users getting the same image multiple times?

2) What happened to the 4 (or previously 10) transcription limit per image with some of these repeat situations?

3) Are all these duplicate records counting towards a user's total?

Here's a list of images sent to the same user more than once:

http://calbug.berkeley.edu/data/NfN/repeats.html
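
For anyone who wants to reproduce a list like this from their own export, here is a minimal sketch. It assumes each record can be reduced to a (user, image) pair; the field names and sample rows are made up for illustration and are not the actual data dump format.

```python
from collections import Counter


def find_repeats(records):
    """Return {(user, image): count} for pairs that occur more than once.

    `records` is an iterable of (user, image) tuples. This layout is a
    hypothetical simplification, not the real dump schema.
    """
    counts = Counter(records)
    return {pair: n for pair, n in counts.items() if n > 1}


# Made-up sample rows, for illustration only.
sample = [
    ("user_a", "img_1"),
    ("user_a", "img_1"),
    ("user_a", "img_2"),
    ("user_b", "img_1"),
    ("user_a", "img_1"),
]
repeats = find_repeats(sample)
print(repeats)
```

Only pairs seen more than once are kept, so a user who transcribed many different images once each does not show up.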

Here are a few specific examples of the raw data for the images that have been sent out more than once to the same user (sorry it's just the raw data, not nicely formatted, but you can still read it and I find examples are useful).

http://calbug.berkeley.edu/data/NfN/nfn4.txt http://calbug.berkeley.edu/data/NfN/nfn10.txt http://calbug.berkeley.edu/data/NfN/nfn13.txt http://calbug.berkeley.edu/data/NfN/nfn14.txt http://calbug.berkeley.edu/data/NfN/nfn636.txt

As far as I can tell, this has been happening since the start of the project a year ago.

Joyce

robgur commented 10 years ago

Joyce, when I processed the Calbug record dump back in December/January, my code did full counts of the number of subjects transcribed, and I always found between 4 and 10, never more. However, it is my understanding that the roulette wheel on subjects means that the same transcriber might get the same record more than once. This is particularly likely as one comes towards the end of a batch of images. I don't think the code is "smart" enough (yet) to check if a transcriber_id has already touched a subject_id, and it's hard because not all our transcribers "log in".
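
To make the idea concrete, here is a rough sketch of what such a check could look like. It is not the project's actual selection code: the function names and the in-memory `seen_by_user` dictionary are hypothetical, and for transcribers who are not logged in some stand-in for `transcriber_id` (a session identifier, say) would have to be assumed.

```python
import random


def pick_subject(subject_ids, seen_by_user, transcriber_id, rng=random):
    """Pick a random subject the transcriber has not already touched.

    Returns None when the transcriber has seen every subject in the batch.
    """
    seen = seen_by_user.get(transcriber_id, set())
    candidates = [s for s in subject_ids if s not in seen]
    if not candidates:
        return None
    return rng.choice(candidates)


def record_classification(seen_by_user, transcriber_id, subject_id):
    """Mark a subject as seen by this transcriber."""
    seen_by_user.setdefault(transcriber_id, set()).add(subject_id)


# Hypothetical usage: serve one transcriber until the batch is exhausted.
seen_by_user = {}
batch = ["s1", "s2", "s3"]
served = []
while True:
    subject = pick_subject(batch, seen_by_user, "t1")
    if subject is None:
        break
    served.append(subject)
    record_classification(seen_by_user, "t1", subject)
```

With this filter in place, the loop serves each subject in the batch once to that transcriber and then stops, even near the end of a batch, where a plain random pick would be most likely to repeat.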

Consider this another "next step" we need to take related to code development, unless Steve or Chris tell me I am wrong about this...

joanball commented 10 years ago

Are they also getting skipped records more than once?

JoyceGross commented 10 years ago

Rob -- see issue #324. Chris wrote: "Correct, a 'skip' counts as a transcription. I did this to ensure if a user skips an item, they do not get the same one again later."

robgur commented 10 years ago

Yes, skips are different from the random-image-to-transcriber process, I think. Worth getting confirmation from @chrissnyder or @sraden.

chrissnyder commented 10 years ago

Skips are treated no differently than any other transcription.

robgur commented 10 years ago

So a user could get the same image more than once and skip it both times (or all times)?

chrissnyder commented 10 years ago

Theoretically, they should never get the same image more than once, regardless of whether they skipped it. As long as they submit some sort of answer, skip or otherwise, that counts as them having seen that label, and they will not be shown it again.

JoyceGross commented 10 years ago

But in practice they are getting the same image more than once.

chrissnyder commented 10 years ago

Judging from the timestamps, it looks to be more like the site is sending back multiple classifications from the same user, rather than a user getting an image more than once. A lot of those timestamps are within a second or three of each other.
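
One way to test this reading against the data is to separate rapid resubmissions from repeats that arrive much later. The sketch below assumes records can be reduced to (user, subject, timestamp in seconds); that layout, the 3-second window, and the sample rows are assumptions for illustration, not the real export format.

```python
from collections import defaultdict


def classify_repeats(records, window_seconds=3.0):
    """Split repeat records into rapid resubmissions and later repeats.

    `records` is an iterable of (user, subject, timestamp_seconds) tuples.
    For each record after the first of a (user, subject) pair, count it as
    "rapid" if it arrived within `window_seconds` of the previous record
    for that pair, otherwise as "later". Pairs with no repeats are omitted.
    """
    by_pair = defaultdict(list)
    for user, subject, ts in records:
        by_pair[(user, subject)].append(ts)

    result = {}
    for pair, stamps in by_pair.items():
        stamps.sort()
        rapid = 0
        later = 0
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev <= window_seconds:
                rapid += 1
            else:
                later += 1
        if rapid or later:
            result[pair] = {"rapid": rapid, "later": later}
    return result


# Made-up sample rows, for illustration only.
sample = [
    ("user_a", "img_1", 100.0),
    ("user_a", "img_1", 101.0),
    ("user_a", "img_1", 102.5),
    ("user_a", "img_1", 5000.0),
    ("user_b", "img_1", 200.0),
]
summary = classify_repeats(sample)
```

A high "rapid" count would point toward the site sending the same classification several times, while "later" repeats would be more consistent with the image actually being served to the user again.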

joanball commented 10 years ago

Several people have also complained about getting repeat images for both SERNEC and Calbug.

joanball commented 10 years ago

But not sure how widespread it is or how often it happens for individuals.

DarrenMcRoy commented 9 years ago

Users insist that this is still an ongoing problem.

An object that a user just today reported getting a second time: http://talk.notesfromnature.org/#/subjects/ANN0003kao

An old talk thread discussing the phenomenon: http://talk.notesfromnature.org/#/boards/BNN0000002/discussions/DNN000023e?page=5&comment_id=544a9d3357123016430001a2

Can we help with this?

DarrenMcRoy commented 9 years ago

@brian-c tells me that @chrissnyder did an update 20 days ago that should have fixed this... Did it get deployed? @chrissnyder, any ideas?

Thank you!!

chrissnyder commented 9 years ago

That update should have fixed it. I'll double-check this because Notes swaps subject groups depending upon what collection you are looking at, which might introduce dupes somewhere.

chrissnyder commented 9 years ago

This is going to be more difficult to track down. I've tried doing a number of classifications, and I can't spot any obvious instances of duplicates appearing in the subject queue.

Keeping the issue open for new reports or sequences of actions that create dupes.