mbari-org / vars-gridview

VARS GridView is a tool for reviewing and correcting VARS localizations in bulk.
MIT License
1 stars 0 forks source link

Major: when I change label on one animal in framegrab, it changes label for all animals in that framegrab that have the same label #72

Open kwalz opened 6 months ago

kwalz commented 6 months ago

This is a really hard one to describe. I changed one animal to E similis in this attached screenshot and it changed all the animals to E similis. Let me know if you want to talk this one over. I think it is because the load of these to vars db inserted one line of annotation with multiple box associations so when the concept changes for one it changes for them all. But they are not all mysida. Screenshot 2023-12-20 at 12 37 14 PM

kwalz commented 6 months ago

Screenshot 2023-12-20 at 1 35 53 PM @lonnylundsten @KSchlining This is a major issue. I found this because I was cleaning up mysida and noticed ones that I cleaned up with Giovanna last week when I was showing her how to use new Gridview. She had changed a few of them to E similis during our training but then today they were all mysida again. I changed a few of them to E similis again today and all were changed to E similis (but two are mysida and two are E similis). So gridview is not making a new line of annotation for the new E similis ones, it's simply going back and forth changing the one annotation line to mysida or E similis based on when the last person edited it!

kevinsbarnard commented 6 months ago

I think it is because the load of these to vars db inserted one line of annotation with multiple box associations so when the concept changes for one it changes for them all

@kwalz I think you're right that this is what's happening here; the bounding box does not have a concept label -- the annotation does. When you label any box within a given annotation, all other boxes within that annotation are "relabeled" along with it. This is a limitation of how we're structuring bounding boxes in VARS.

We went back and forth a couple times on this back in 2019/2020 as to whether or not we should force a 1-to-1 relationship between bounding box associations and annotations. As a result, VARS Localize was implemented to allow multiple bounding boxes per annotation, but since then we've been maintaining the 1 bounding box per annotation as a best practice.

That said, there are still 16,570 annotations in VARS that have more than 1 bounding box. Among these, there are 39,739 "excess" bounding box associations (total number beyond the first box). SQL query for reference:

SELECT 
    COUNT(DISTINCT sub.observation_uuid) AS num_multibox_observations, 
    SUM(sub.num_boxes) - COUNT(DISTINCT sub.observation_uuid) AS num_excess_boxes
FROM (
    SELECT
        o.uuid AS observation_uuid, 
        COUNT(a.uuid) AS num_boxes
    FROM
        observations o
    INNER JOIN
        associations a ON a.observation_uuid = o.uuid
    WHERE
        a.link_name = 'bounding box'
    GROUP BY
        o.uuid
    HAVING
        COUNT(a.uuid) > 1
) AS sub

My initial thought here would be to create a new annotation for each excess bounding box, and "move" the bounding box association to that new annotation. So instead of having:

graph BT
    anno_1
    box_1
    box_2
    box_3

    box_1 --> anno_1
    box_2 --> anno_1
    box_3 --> anno_1

we'd create new annotations anno_2 and anno_3 and point the excess bounding box associations to them:

graph BT
    anno_1
    anno_2
    anno_3
    box_1
    box_2
    box_3

    box_1 --> anno_1
    box_2 --> anno_2
    box_3 --> anno_3

I'm curious what folks think about this. This would result in 39,739 new annotations in VARS, so I'd also want to be cautious in case that would negatively impact summary statistics (e.g. total counts), which I believe was the hangup that led us to allow several bounding boxes per annotation originally.

kwalz commented 6 months ago

Let me think on it. Things have been done differently with these rect label and vars localize loads in vars db and I don't want to further confuse the situation by considering each scenario for other types of localizations. It's unfortunate that there are so many mis-IDs that we are struggling with when they batch labeled these in one framegrab...I cannot count how many times I have changed a few localizations on one framegrab within Beroe, pyrosome, and many many other queries the past few weeks and now those have changed everything else in the same frame with the same label, ugh.

hohonuuli commented 6 months ago

My 2 cents.

Yes, there should be a 1-to-1 relation between a bounding box and an observation.

Before we do any bulk creation, we'll need to look at the annotations involved. If they have associations like identity-reference or population* then a bulk creation can mess things up.

kwalz commented 6 months ago

Most of these did have population quantity 999, I think.

kevinsbarnard commented 6 months ago

@hohonuuli @kwalz It seems that a good number of these multi-box annotations do have those sorts of associations. Here's what I'm getting:

from the query:

SELECT
    COUNT(DISTINCT sub.observation_uuid) AS num_multibox_observations_with_either,
    COUNT(DISTINCT CASE WHEN sub.has_identity_reference = 1 THEN sub.observation_uuid END) AS num_multibox_observations_with_identity_reference,
    COUNT(DISTINCT CASE WHEN sub.has_population_association = 1 THEN sub.observation_uuid END) AS num_multibox_observations_with_population
FROM (
    SELECT
        o.uuid AS observation_uuid, 
        COUNT(DISTINCT a_bbox.uuid) AS num_boxes,
        MAX(CASE WHEN a_alt.link_name = 'identity-reference' THEN 1 ELSE 0 END) AS has_identity_reference,
        MAX(CASE WHEN a_alt.link_name LIKE 'population%' THEN 1 ELSE 0 END) AS has_population_association
    FROM
        observations o
    INNER JOIN
        associations a_bbox ON a_bbox.observation_uuid = o.uuid
    INNER JOIN
        associations a_alt ON a_alt.observation_uuid = o.uuid
    WHERE
        a_bbox.link_name = 'bounding box' AND
        (
            a_alt.link_name = 'identity-reference' OR
            a_alt.link_name LIKE 'population%'
        )
    GROUP BY
        o.uuid
    HAVING
        COUNT(DISTINCT a_bbox.uuid) > 1
) AS sub
kwalz commented 6 months ago

@kevinsbarnard Can you send me a csv file with the results from the above sql query?

kwalz commented 6 months ago

There are thousands of annotations with 2 (or 3) boxes and those may be largely the parts localizations because they are Bathochordaeus and a large number of siphonophores where Fathomnet team annotated the parts of those animals (for siphs they did nectosome and the whole animal, Bathos they did inner filter, outer filter, and animal). We definitely don't want 1->1 with those parts ones. This issue may have to be a larger discussion with the whole group in the new year. I agree with Brian that we don't want to do a bulk change on this many without understanding the implications of what we are changing and any unforeseen problems we would create with a bulk change that we can't undo.

kwalz commented 6 months ago

More complications with 1->1, I have found one animal with multiple boxes on it, looks like it was two different imports into vars db of same framegrab: observation uuid: 71277E26-D63C-40CF-9D0A-269727DE72D1. Not sure how persistent this is for the ones with 2 localizations that are not bathos or siphs. See attached screenshot 1. In gridview it has two boxes, one with jedi-knight and one with padawan (screenshot 2). These we would not want 1->1 because there is only 1.

screenshot1 screenshot2

kwalz commented 6 months ago

Another odd scenario, two framegrabs associated with the same observation uuid: 9ECA09DE-7B57-4013-8F44-EC12A9EC24E6, in screenshot 1, but in vars db it is two lines of annotations/framegrabs for Gonatus with a number of bounding boxes on each, screenshot 2. (Kevin, I'm putting these all here for my record next year, no need to investigate these all :)

screen1 screen2