zooniverse-glacier / notesFromNature

https://www.notesfromnature.org/
Apache License 2.0

Transcription absolute numbers completed and remaining #292

Closed robgur closed 11 years ago

robgur commented 11 years ago

One comment on our recent blog post was that it would be nice to see how many transcriptions are left in each collection, not just the grand total transcribed or a percentage. Could we add, on the Calbug and SERNEC main collection pages, a line just below "percent complete" that lists # of records transcribed / # of total records? For example, something like: "24,245 of 79,240 Calbug records complete"

I think this would help people know the scope of the missions as well as just the raw percents. This strikes me as a nice way for people to know the "finish line" much better than they do now.
@arfon @chrissnyder @parrish
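
A minimal sketch of the requested line, assuming the two counts are already available as integers. The function name `progress_line` is hypothetical and not taken from the Notes from Nature codebase; the numbers are the ones from the example above.

```python
def progress_line(done: int, total: int, collection: str) -> str:
    """Format an absolute-count progress line with thousands separators."""
    # The ',' format specifier inserts a comma every three digits.
    return f"{done:,} of {total:,} {collection} records complete"


print(progress_line(24245, 79240, "Calbug"))
# prints: 24,245 of 79,240 Calbug records complete
```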

chrissnyder commented 11 years ago

A good idea. Added in 68360a1408efbbe34d08fb305f6ed247d0f2808a

It is important to understand where the numbers come from. The total is simple enough: it is just the number of records in the system. The "completed" number is computed from the classification count, assuming 10 transcriptions per record. So if that number says 1,000 records complete, it doesn't necessarily follow that there are, in fact, 1,000 records complete, because those classifications are likely spread out over many more records.

edit: also, I made the numbers pretty subtle. Open to opinions to change that.
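
To illustrate the caveat above, here is a rough sketch of the estimate, assuming it is the classification count divided by 10 and rounded down (the rounding is a guess, and the function name and data are hypothetical, not the code from the commit).

```python
def estimated_complete(classification_count: int, per_record: int = 10) -> int:
    """Estimate completed records from the overall classification count."""
    # Assumption: integer division; the actual implementation may round differently.
    return classification_count // per_record


# Hypothetical data: 10,000 classifications spread evenly over 5,000 records.
per_record_counts = {f"record-{i}": 2 for i in range(5000)}

total_classifications = sum(per_record_counts.values())
estimate = estimated_complete(total_classifications)
truly_complete = sum(1 for n in per_record_counts.values() if n >= 10)

print(estimate)        # 1000 records "complete" by the estimate
print(truly_complete)  # 0 records have actually reached 10 transcriptions
```

With the classifications spread this thinly, the estimate reports 1,000 complete records while none has reached the threshold.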

robgur commented 11 years ago

Chris, thanks! This short email explains something that has been a mystery to me since the beginning of the project. We expected that there would be 3 transcriptions per specimen, not 10. The number of transcriptions and the number of records now make MUCH more sense to me! Whew. I think we can just tally the number of presumed transcriptions we need (which I do believe is 3 per specimen, not 10) and the number complete, and use that as our metric. Sound good to everyone?


chrissnyder commented 11 years ago

This actually becomes slightly trickier to calculate now that the number of transcriptions per record has been (or likely will be) modified. I'll probably change this calculation to just be number of complete records / number of total records.

The reason I didn't do that before is because that very much is a lagging indicator for the "completeness" of a project. A project can have quite a few transcriptions before records start to be completed. A side effect of reducing the number of needed transcriptions is that this number isn't as "off" as before.
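
A small sketch of the strict complete / total ratio and why it lags, assuming a record counts as complete once it reaches a required number of transcriptions. The per-record counts and both thresholds here are made up for illustration.

```python
def strict_ratio(counts: list[int], needed: int) -> float:
    """Fraction of records that have reached the required transcription count."""
    complete = sum(1 for n in counts if n >= needed)
    return complete / len(counts)


# Hypothetical transcription counts for ten records (30 transcriptions in total).
counts = [3, 4, 2, 5, 1, 4, 3, 2, 4, 2]

print(strict_ratio(counts, needed=10))  # 0.0: plenty of work done, nothing complete yet
print(strict_ratio(counts, needed=4))   # 0.4: a lower threshold lets records finish sooner
```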

robgur commented 11 years ago

Yeah I had wondered about the lag effects! I agree decreasing the # of transcriptions per image helps. Thanks much for thinking about this (cuz I have too).


chrissnyder commented 11 years ago

At the moment, using raw complete / total would mean the following:

SERNEC at ~27% (because we "completed" the duplicate images)
Calbug at ~3.2%

edit: I think I'm going to let this wait awhile. The completes will start to rise quickly now that it only takes 4 transcriptions. Soon as it gets up to around where it is now, I'll make the change.

robgur commented 11 years ago

Fine by me. We will need to again explain the numbers and why they have changed but I think we are now at least settled on this for the foreseeable future. -r
