Open sync-by-unito[bot] opened 3 years ago
➤ Shannon Dubay commented:
Data Science ( https://pantheradatascience.atlassian.net/jira/people/team/61a6dbc3-8e98-4200-ab42-bd46246ce00b?ref=jira$&src=issue ) (Shannon Dubay Christopher Marais Lauren Foden Michael Ross Ross Tyzack Pitman Valentine Tawira Isaiah Lekay ) This summarizes what we spoke about in today’s “training”. Please feel free to add any additional information that I might have missed.
We spoke about the need to better manage reference images within IDS. Currently, tagging reference images is almost an afterthought, with no checks for accuracy or completeness. Reference images are extremely important because when we use Reco to ID new images, we rely only on reference images to represent all known individuals in the naming process. Each individual should have 1 reference image per side (across all years). If a known individual is missing a reference image, it will not exist in the next round of IDing.
This is phase one of three.
1. We will re-structure the current Reference Database module to better suit our needs. This module could even be renamed to “Reference Image Management” to better describe its purpose. The user will still specify their site and the species, as they do now. Note: at the moment the user must specify each database separately, but it might make more sense for the user to specify just the site and have IDS automatically pull the databases from all years of that site.
The current table will be removed. Instead, there will be a table with each row representing an individual from that site. The individual’s name will be listed in the first column, followed by a “Left” column and a “Right” column (keeping in mind that these aren’t the only orientations, so perhaps the columns should be determined by the orientations supplied in the loaded databases). The Orientation Columns will inform the user whether each individual has at least 1 reference image tagged per Orientation, taking into consideration that an individual may not have images for all orientations.
So, the logic for each orientation needs to be something like (taking Right side as an example): “Does this individual have any Right side images across any of the databases (any years)? If no, return “NA”. If yes, then does this individual have at least 1 of those images tagged as a reference image? If yes, return “Yes”, else return “No”.” In this way, users will be able to easily tell per individual if 1) they just don’t have any images for that orientation, or 2) they have images for that orientation and one is tagged, or 3) they have images for that orientation but none are tagged (the scary scenario!!).
We could further get the point across by highlighting in red any rows that have a “No” in any of the orientation columns, indicating an issue. Or, these individuals could be listed at the top of the page with a big warning.
We spoke about perhaps reporting a percentage to the user at the top of the page: the percentage of individuals that are “reco-ready” (for each individual, if there are images for an orientation, at least one is tagged as a reference image). Personally, I think a ratio or fraction would be more useful than a percentage, so they can see how many individuals they have, and how many are ready or not.
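The orientation logic and the reco-ready count described above could be sketched roughly as follows. This is only an illustration in Python, not IDS code: the `ImageRecord` shape, the function names, and the example individuals are all hypothetical, and it assumes the image records from every year of the site have already been pooled into one list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical record shape; the real IDS schema is not given in this issue.
    individual: str
    orientation: str   # e.g. "Left", "Right"
    is_reference: bool


def orientation_status(records):
    """Return {individual: {orientation: "NA" | "Yes" | "No"}}."""
    # Columns come from the orientations actually present in the data.
    orientations = sorted({r.orientation for r in records})
    # tagged[individual][orientation] becomes True once a reference image is seen.
    tagged = defaultdict(dict)
    for r in records:
        seen = tagged[r.individual].get(r.orientation, False)
        tagged[r.individual][r.orientation] = seen or r.is_reference
    table = {}
    for individual, by_orientation in tagged.items():
        table[individual] = {
            o: ("NA" if o not in by_orientation
                else "Yes" if by_orientation[o] else "No")
            for o in orientations
        }
    return table


def reco_ready_fraction(table):
    """Return (ready, total); an individual is ready if no column is "No"."""
    ready = sum(1 for row in table.values() if "No" not in row.values())
    return ready, len(table)


# Made-up example data.
records = [
    ImageRecord("Luna", "Left", True),
    ImageRecord("Luna", "Right", False),
    ImageRecord("Simba", "Left", True),
]
table = orientation_status(records)
ready, total = reco_ready_fraction(table)
print(f"{ready} of {total} individuals are reco-ready")
```

Returning the pair `(ready, total)` rather than a percentage keeps both numbers available, so the page can show the fraction and still derive a percentage if wanted.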
2. Checks will also be added to the Pattern-Match tab of Reco, as one of the first steps that happens after the user presses “Run” in that tab. The same logic behind the Reference Database re-vamp can be applied here: Reco would check whether all individuals are “reco-ready” (for each individual, if there are images for an orientation, at least one is tagged as a reference image). Any issues should be reported to the user through an informative popup (maybe even specifying the individual and orientation that needs attention, if possible) that tells them to use the Reference Image Management Module to check reference images and the Manual Identification Module to edit (tag/untag) them.
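A pre-run check along these lines might look like the sketch below. Again this is hypothetical Python rather than Reco code: the function names, the tuple layout `(individual, orientation, is_reference)`, and the popup wording are assumptions made for illustration.

```python
def find_untagged(records):
    """Return sorted (individual, orientation) pairs that have images
    but no image tagged as a reference.

    records: iterable of (individual, orientation, is_reference) tuples
    (a hypothetical layout, not the actual IDS schema).
    """
    has_reference = {}
    for individual, orientation, is_reference in records:
        key = (individual, orientation)
        has_reference[key] = has_reference.get(key, False) or is_reference
    return sorted(key for key, ok in has_reference.items() if not ok)


def build_warning(problems):
    """Return the popup text, or None when every individual is reco-ready."""
    if not problems:
        return None
    lines = [f"- {individual}: {orientation}" for individual, orientation in problems]
    return (
        "These individuals have images but no reference image tagged:\n"
        + "\n".join(lines)
        + "\nCheck them in the Reference Image Management Module and "
        "tag or untag images in the Manual Identification Module."
    )


# Made-up example data.
problems = find_untagged([
    ("Luna", "Left", True),
    ("Luna", "Right", False),
    ("Simba", "Left", False),
    ("Simba", "Left", True),
])
message = build_warning(problems)
if message is not None:
    print(message)
```

Running the check before pattern matching starts means the user sees every affected individual and orientation in one popup, instead of discovering missing individuals after the run.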
Last: We already have checks ensuring that all reference images are supplied for Reco, but we don’t have a check for the opposite situation: extra or unknown images/files. Leftover reference images may be included accidentally (i.e., ones that used to be reference images but have since been replaced), which creates extra unintended work. Most importantly, this check will eliminate situations where images are accidentally copied/pasted into the folder twice (causing the duplicated file to get a new name that includes a space, which breaks Reco in the Validate tab). Informative popup errors should include the file names of the unexpected files and tell the user to remove those files specifically.
^ The above has been created in its own task, https://pantheradatascience.atlassian.net/browse/DV-265 , given its high priority.
For background, if needed (optional):
GOAL: We want one good-quality reference image per side per individual. All individuals should have at least 1 reference image; no individual should have more than 2 reference images. We need a good way to:
┆Issue is synchronized with this Jira Task by Unito ┆Attachments: Screenshot 2021-08-05 at 21.10.09.png