jbrzusto / TO_DO

sensorgnome / motus TODO list for jbrzusto
0 stars 0 forks source link

Ambiguous tags in .motus data #127

Open zcrysler opened 6 years ago

zcrysler commented 6 years ago

I know you dealt with this in the plots you create, but in the detection data, is there a way to mark tags as ambiguous so that users know to look into detections for certain tags. In this case tags for project 64 are overlapping with 57. SELVA tags so it's pretty obvious for James, but it may not be for others down the road.

jbrzusto commented 6 years ago

The XXX_for_tag_project items in the current data API are not returning ambiguous detection, so they're not ending up in tag project .motus files. (In contrast, ambiguous detections are returned by the XXX_for_recvitems, and so **are** in receiver.motus` files). This is because ambiguous detections definitely belong to the receiver owner, but we don't know which tag owner they belong to.

This is a big reason why tag project datasets are incomplete. Below is the number of registered tags by project, where there's at least one detection that could be the tag, but is ambiguous. For each of those tags, there will be some possible detections that have not been downloaded into tag project .motus files, because we don't know for sure which project the detections belong to.

Obviously fixing this is high priority.

+-----------+--------------+
| projectID | numAmbigTags |
+-----------+--------------+
|         1 |            4 |
|         3 |            1 |
|         9 |            3 |
|        10 |            2 |
|        11 |            8 |
|        14 |           34 |
|        15 |           28 |
|        19 |           20 |
|        20 |           20 |
|        28 |            3 |
|        29 |            2 |
|        32 |            6 |
|        37 |            4 |
|        38 |           40 |
|        40 |            8 |
|        42 |            1 |
|        45 |           19 |
|        46 |            1 |
|        47 |           41 |
|        48 |            6 |
|        49 |            2 |
|        50 |           15 |
|        57 |           92 |
|        63 |            2 |
|        64 |           33 |
|        65 |           13 |
|        67 |           12 |
|        74 |            2 |
|        78 |           20 |
|        81 |           19 |
|        82 |           33 |
|        87 |            2 |
|        88 |            1 |
|        90 |           11 |
|        91 |            2 |
|        92 |           37 |
|        95 |            1 |
|        97 |            1 |
|       100 |            5 |
|       103 |           19 |
|       109 |            2 |
|       111 |            1 |
|       124 |            6 |
|       140 |            2 |
|       146 |           33 |
|       153 |            6 |
|       154 |            3 |
+-----------+--------------+
jbrzusto commented 6 years ago

Resolution

To quickly get data for project P, we currently use this scaffolding:

To extend this to ambiguous tag detections, where we don't know the tagDepProjectID, we can generate new, negative projectIDs. Each represents a unique set of up to say 6 real projects (to match the up to 6 possible tags in an ambiguous tag record). An ambiguous project ID is generated for each set of projectIDs for which there's a tag ambiguity connecting those projects. That way, when an ambiguous tag is detected (negative motusTagID), we assign can assign a corresponding ambiguous projectID.

Moreover, the same optimized queries that fetch batches, hits, and runs by projectID will work unchanged for these new, negative projectIDs.

We need a new table to manage ambiguous project IDs

CREATE TABLE projAmbig (
   ambigProjectID SMALL INT UNIQUE PRIMARY KEY NOT NULL, -- unique negative ID
   projectID1 SMALL INT,  -- motus project ID #1
   projectID2 SMALL INT,  -- motus project ID #2; projectID2 > projectID1
   projectID3 SMALL INT,
   projectID4 SMALL INT,  -- ...                  projectID(n+1) > projectID(n)
   projectID5 SMALL INT,
   projectID6 SMALL INT,  -- motus project ID #6  projectID6 > projectID5
   UNIQUE PRIMARY KEY (projectID1, projectID2, projectID3, projectID4, projectID5, projectID6)
   );
CREATE INDEX projAmbig_ambigProjectID on projAmbig(ambigProjectID);

Also, table tagAmbig needs a new column ambigProjectID which will link to table projAmbig. That will allow us to look up tagDepProjectID for ambiguous runs by looking up motusTagID in table tagAmbig.

when a hit or run is assigned a negative tagDepProjectID, that means the hit or run could belong to any of projectID1, ..., projectID6 listed in the record where ambigProjectID = tagDepProjectID (< 0).

We also need a new API entry: ambiguous_projects_for_tag_project (P) that returns the list of negative projectIDs of which P is a member. The client-side code then needs to grab all batches, hits, runs for each of those ambiguous project IDs, merging locally. The only complication with merging is that a batch record might be fetched for two different project IDs, in which case the numHits field should be summed, but otherwise nothing changes.

The rest of the code should work as-is, but of course on the client side, we'll need to do some finagling to mark ambiguous detections, and/or lookup metadata by ambiguous project.

zcrysler commented 6 years ago

So detections showing up as motusTagID "-42" from receiver Lotek 223 for example, are ambiguous tags? They have no info for tagProjID, id, tagType, mfg, tagModel, tagBI, and fullID is "?proj?-0#:0.0"

StuMackenzie commented 6 years ago

FYI. Not sure if this will help or not in the Ambiguous arena, but Margaret Eng's project - # 146 is comprised primarily of ambigous tags - duplicates with RBrown American Woodcock #82 and James Bay - #38. On hourly plots (from Old Cut) most of her detections are labelled as one or the other, but many are not.

zcrysler commented 6 years ago

Some more details on Margaret's project 146: