pantheracorp / PantheraIDS_Features

A repository for any feature requests related to PantheraIDS

DV-153 ⁃ Better reference image management, phase 1 #213

Open sync-by-unito[bot] opened 3 years ago

sync-by-unito[bot] commented 3 years ago

We spoke about the need to better manage reference images within IDS. Currently, tagging reference images is almost an afterthought, with no checks for accuracy or completeness. Reference images are extremely important: when we use Reco to ID new images, we rely only on reference images to represent all known individuals in the naming process. Each individual should have 1 reference image per side (across all years). If a known individual is missing a reference image, that individual will not exist in the next round of IDing.

This is phase one of three.

1. We will restructure the current Reference Database module to better suit our needs. This module could even be renamed to “Reference Image Management” to better describe its purpose. The user will still specify their site and the species, as they do now. Note: currently the user must specify each database separately, but it might make more sense for the user to specify just the site and have IDS automatically pull the databases from all years of that site. The current table will be removed. Instead, there will be a table in which each row represents an individual from that site. Each individual's name will be listed in the first column, followed by a “Left” column and a “Right” column (keeping in mind that these aren’t the only orientations, so perhaps the columns should be determined by the orientations supplied in the loaded databases). The orientation columns will inform the user whether each individual has at least 1 reference image tagged per orientation, taking into consideration that an individual may not have images for all orientations. So the logic for each orientation needs to be something like this (taking the Right side as an example): “Does this individual have any Right side images across any of the databases (any years)? If no, return ‘NA’. If yes, does this individual have at least 1 of those images tagged as a reference image? If yes, return ‘Yes’; otherwise return ‘No’.” In this way, users will be able to easily tell, per individual, whether 1) they just don’t have any images for that orientation, 2) they have images for that orientation and one is tagged, or 3) they have images for that orientation but none are tagged (the scary scenario!!). We could further get the point across by highlighting in red any rows that have a “No” in any of the orientation columns, indicating an issue. Or these individuals could be listed at the top of the page with a big warning.
We spoke about perhaps reporting a percentage to the user at the top of the page: the percentage of individuals that are “reco-ready” (for each individual, if there are images for an orientation, at least one is tagged as a reference image). Personally, I think a ratio or fraction would be more useful than a percentage, so users can see how many individuals they have and how many are ready or not.
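To make the NA/Yes/No logic and the “reco-ready” ratio concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the record fields (`individual`, `orientation`, `is_reference`) and the function names are hypothetical and do not reflect the actual IDS database schema or code.

```python
from collections import defaultdict


def orientation_status(images):
    """Build the per-individual table of 'NA' / 'Yes' / 'No' per orientation.

    `images` is an iterable of dicts with the (assumed) keys
    'individual', 'orientation' and 'is_reference', pooled across all years.
    """
    images = list(images)
    # Columns are determined by the orientations present in the loaded data.
    orientations = sorted({img["orientation"] for img in images})
    # (individual, orientation) -> [has any image, has a tagged reference]
    seen = defaultdict(lambda: [False, False])
    individuals = set()
    for img in images:
        individuals.add(img["individual"])
        entry = seen[(img["individual"], img["orientation"])]
        entry[0] = True
        entry[1] = entry[1] or bool(img["is_reference"])

    table = {}
    for ind in sorted(individuals):
        row = {}
        for ori in orientations:
            has_any, has_ref = seen.get((ind, ori), (False, False))
            if not has_any:
                row[ori] = "NA"   # no images for this orientation
            elif has_ref:
                row[ori] = "Yes"  # images exist and one is tagged
            else:
                row[ori] = "No"   # images exist but none are tagged
        table[ind] = row
    return table


def reco_ready_ratio(table):
    """Return (ready, total): individuals with no 'No' in any column."""
    ready = sum(1 for row in table.values() if "No" not in row.values())
    return ready, len(table)


# Hypothetical example data.
images = [
    {"individual": "Female 4", "orientation": "Right", "is_reference": True},
    {"individual": "Female 4", "orientation": "Left", "is_reference": False},
    {"individual": "Male 1", "orientation": "Right", "is_reference": True},
]
table = orientation_status(images)
ready, total = reco_ready_ratio(table)
print(table)
print(f"{ready}/{total} individuals are reco-ready")
```

With this example data, Female 4 gets “No” for Left and “Yes” for Right, Male 1 gets “NA” for Left and “Yes” for Right, and the ratio is 1/2. Rows containing a “No” are the ones that would be highlighted in red.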

2. Checks will also be added to the Pattern-Match tab of Reco, as one of the first steps that happens after the user presses “Run” in that tab. The same logic behind the Reference Database revamp can be applied as a check here: Reco would check whether all individuals are “reco-ready” (for each individual, if there are images for an orientation, at least one is tagged as a reference image). Any issues should be reported to the user through an informative popup (ideally specifying the individual and orientation that need attention, if possible) that tells them to use the Reference Image Management module to check reference images and the Manual Identification module to edit (tag/untag) them.
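A rough sketch of what this pre-run check could look like, again in Python and again with hypothetical field and function names (the real Reco code and popup mechanism are not shown in this issue): it collects every individual/orientation pair that has images but no tagged reference image, and builds the text for the popup.

```python
def find_missing_references(images):
    """Return sorted (individual, orientation) pairs that have images
    but no image tagged as a reference image."""
    has_ref = {}
    for img in images:
        key = (img["individual"], img["orientation"])
        has_ref[key] = has_ref.get(key, False) or bool(img["is_reference"])
    return sorted(key for key, tagged in has_ref.items() if not tagged)


def build_warning(issues):
    """Return popup text naming each problem, or None if all are reco-ready."""
    if not issues:
        return None
    lines = [f"- {ind}: no reference image tagged for {ori}" for ind, ori in issues]
    return (
        "Some individuals are not reco-ready:\n"
        + "\n".join(lines)
        + "\nUse the Reference Image Management module to check reference images,"
        + " and the Manual Identification module to tag or untag them."
    )


# Hypothetical example data.
images = [
    {"individual": "Female 4", "orientation": "Right", "is_reference": True},
    {"individual": "Female 4", "orientation": "Left", "is_reference": False},
    {"individual": "Male 1", "orientation": "Right", "is_reference": True},
]
issues = find_missing_references(images)
message = build_warning(issues)
if message is not None:
    print(message)  # in Reco this would be shown in a popup and the run halted
```

Returning the list of pairs separately from the message keeps the check reusable, so the same function could feed both the popup and the table in the Reference Image Management module.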

Last: We already have checks to ensure that all reference images are supplied for Reco, but we don’t have a check for the opposite situation: extra/unknown images/files. Leftover reference images may be included accidentally (e.g., ones that used to be reference images but have since been replaced), which creates extra unintended work. Most importantly, this check will eliminate situations where images are accidentally copied/pasted into the folder twice (causing the duplicated file to get a new name that includes a space, which breaks Reco in the Validate tab). Informative popup errors should include the file names of the unexpected files and tell the user to remove those files specifically. ^ The above has been created as its own task, https://pantheradatascience.atlassian.net/browse/DV-265 , given its high priority.
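As an illustration of the extra-file check, here is a minimal Python sketch. It assumes Reco already knows the expected reference image file names from the databases and can list the names in the supplied folder. The function names and the example file names are hypothetical.

```python
def find_unexpected_files(expected, found):
    """Return sorted file names that are in the folder but not expected."""
    return sorted(set(found) - set(expected))


def describe_unexpected(extras):
    """Return error text listing the unexpected files, or None if there are none."""
    if not extras:
        return None
    lines = []
    for name in extras:
        note = ""
        if " " in name:
            # A space in the name suggests a copy/paste duplicate
            # (" copy" on Mac, " (1)" on Windows).
            note = " (contains a space, possibly a duplicated file)"
        lines.append(f"- {name}{note}")
    return "Unexpected files found. Please remove them:\n" + "\n".join(lines)


# Hypothetical example: expected names come from the databases,
# found names from the reference image folder.
expected = ["F4_right.JPG", "M1_right.JPG"]
found = ["F4_right.JPG", "F4_right copy.JPG", "M1_right.JPG", "M2_left.JPG"]
extras = find_unexpected_files(expected, found)
error_text = describe_unexpected(extras)
if error_text is not None:
    print(error_text)
```

Here `extras` contains `F4_right copy.JPG` (a likely duplicate) and `M2_left.JPG` (a likely leftover reference image), and both are named explicitly in the message.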

For background, if needed (optional):

GOAL : We want one good-quality reference image per side per individual. All individuals should have at least 1 reference image; no individual should have more than 2 reference images. We need a good way to:

  1. check that current reference images are tagged (one per side per individual, across all years that the individual was seen), AND that the images are good quality (so that poor-quality images can be replaced with better ones in following years).
    1. include info such as: when that reference image was tagged, who tagged it, what year the image is from
    2. currently, checking the number of reference images tagged can be done through the Reference Database module, but this doesn't show quality of reference images; checking the number and quality of reference images can be done through Manual Identification but is slow and tedious, particularly for sites with high numbers of individuals
    3. check that any existing batch of prepped reference images (used in previous years) are accurate
    4. Reco now checks that all reference images (according to the databases) are provided, BUT doesn't give a warning when extra images are included. This could be, e.g., an old reference image that needs to be removed because it was replaced, OR a copy/paste error whereby the user pastes the image into the same folder twice, causing the duplicated image to have a file name that ends in " copy" on Mac or " (1)" on Windows. Both file names contain a space and cause issues when validating. This often isn't noticed until a lot of work has been done and Reco needs to be redone.
  2. moving forward, ensure that reference images are tagged in a controlled/forced/more accessible way. Currently, tagging reference images in the Manual Identification module is a complete afterthought (oftentimes reference images are only tagged for a dataset once the next year's data needs to be reco'ed), and it is completely up to the user to tag and track/maintain the appropriate number and quality of images. Perhaps this needs to be done within Reco itself?
    1. IDS: "hey, you previously only had the right side of Female 4 tagged, but in 2020 the left side showed up; how about you go and tag that left flank before you reco 2021"
    2. IDS: "hey, your previous tag was shite quality but now you might have a better one; best you change your tags honey"
  3. manage the reference image files (.JPG files provided to Reco) so that images don't have to be re-exported and re-cropped every time a new year is reco'ed - while also making sure any existing batch of prepped reference images are accurate

┆Issue is synchronized with this Jira Task by Unito ┆Attachments: Screenshot 2021-08-05 at 21.10.09.png

sync-by-unito[bot] commented 3 years ago

➤ Shannon Dubay commented:

Data Science ( https://pantheradatascience.atlassian.net/jira/people/team/61a6dbc3-8e98-4200-ab42-bd46246ce00b?ref=jira$&src=issue ) (Shannon Dubay Christopher Marais Lauren Foden Michael Ross Ross Tyzack Pitman Valentine Tawira Isaiah Lekay) From what we spoke about in today’s “training”. Please feel free to add any additional information that I might have missed.