zooniverse / wildcam-gorongosa-education

This is WildCam Labs, the education/exploration-oriented extension of the WildCam Gorongosa project.
https://lab.wildcamgorongosa.org/
Apache License 2.0

CSV species count has non-numbers: "11-50" and "50+" #303

Closed shaunanoordin closed 7 years ago

shaunanoordin commented 7 years ago

Overview

When a Teacher/Student downloads a CSV (from either the Map Explorer or the Assignments page), the column for "number" (i.e. aggregated species count) can contain the range "11-50" and the text "50+" in addition to the integers 1 to 10.

For some users, this is an issue because they expect specific integers in their CSV data, not a numeric range or text.

Notes

This issue was originally reported by Bridget in an e-mail to me on 29 Sep 2016. Of note, the symptoms she described were:

Sorry to bother you again, but another teacher just brought another issue to my attention. In the csv download (see the attached file I downloaded today), if you look at the species_count column and scroll down to row 148, you’ll see that the species count for that image shows up as “Nov-50” instead of a number. This same thing appears in several other rows further down. If you convert the entire column to a “number” format, the date changes from “Nov-50” to 17106.

In this scenario, the app she used (presumably Excel) to open the CSV misinterpreted the numeric range "11-50" as a date, which it displayed as "Nov-50" (most likely November 1950, not a count at all). This is one example of how mixing numbers with numeric ranges/text in a single column can make things difficult for teachers and other users.
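To get a feel for how many rows are affected in a given download, something like the following could be used. This is only a rough sketch, not code from the app: the function name `find_non_integer_counts`, the `species` column, and the sample rows are made up for illustration, and the `species_count` column name is taken from Bridget's e-mail and may differ between exports.

```python
import csv
import io


def find_non_integer_counts(csv_text, column="species_count"):
    """Return (row_number, value) pairs whose count is not a plain integer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    # start=2 because row 1 of the file is the header row
    for row_number, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        # str.isdigit() is False for "11-50", "50+" and empty cells
        if not value.isdigit():
            flagged.append((row_number, value))
    return flagged


# Hypothetical sample data, not taken from a real export
sample = "species,species_count\nimpala,3\nbaboon,11-50\nwaterbuck,50+\n"
print(find_non_integer_counts(sample))  # [(3, '11-50'), (4, '50+')]
```

The row numbers match what a spreadsheet would show, so a flagged row can be looked up directly in the opened CSV.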

Possible Solution

The most straightforward solution would be one of the following:

However, we need a subject expert to weigh in on this, as there are two conflicting goals here: usability (Teachers/Students cannot use an ambiguous range like "11-50" or "50+" in standard numerical calculations and charts) and scientific accuracy (the range 11-50 isn't the same as the mean/estimate of 25).

(As I understand it, the WildCam Classification/Survey task was designed this way for a reason, as excessively large discrete counts (i.e. actually putting in ninety integers from 11 to 100) would not have contributed to the research or survey efforts.)
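For discussion only, one way to keep the range information while still giving users numeric columns would be to split each label into explicit lower and upper bounds. This is a sketch of the idea, not a proposed implementation: `count_bounds` is a hypothetical helper, and reading "50+" as a lower bound of exactly 50 with no upper bound is my assumption about what that label means.

```python
def count_bounds(label):
    """Map a count label to (minimum, maximum); maximum is None when open-ended."""
    text = label.strip()
    if text.isdigit():
        # An exact count such as "7" has identical bounds
        n = int(text)
        return (n, n)
    if text.endswith("+") and text[:-1].isdigit():
        # Assumption: "50+" means "at least 50", with no known upper bound
        return (int(text[:-1]), None)
    low, sep, high = text.partition("-")
    if sep and low.isdigit() and high.isdigit():
        # A closed range such as "11-50"
        return (int(low), int(high))
    raise ValueError(f"unrecognised count label: {label!r}")


for label in ["7", "11-50", "50+"]:
    print(label, count_bounds(label))
```

With two numeric columns, users could still chart or sum the minimum counts, and nothing is replaced by an invented estimate. Whether that is acceptable scientifically is exactly the question for the subject expert.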

@aliburchard do you have any thoughts?