Open talmo opened 1 year ago
Hi @talmo , thanks for tracking this issue.
In our use case, we have a slightly more complex problem, as the 2 .slp files contain different types of data (perhaps this is why SLEAP is not de-duplicating properly).
1. slp file 1 comes from the human annotation over 10 vids and contains 1000 human-labeled frames out of 300,000. Ideally, we'd like to keep just the 1000 human-labeled frames and make a "subsampled human .slp" file (to be merged below).
DONE (to post code)
2. slp file 2 comes from predictions on several videos. The colab you generated for us a few days ago does work by looping over these .slp files extracting only frames we want (from a previously computed list) and then making a "subsampled predicted .slp" file which contains only the predictions we want.
DONE (Talmo's code)
I hope this makes sense; let us know otherwise. Thanks so much, catubc
Currently it looks like we're not appropriately handling duplicate tracks when merging.
From the GUI, this crops up when we merge labels.
From the API, the entrypoint is when you do `sleap.load_file(..., match_to=base_labels)`. This should be handled downstream here somewhere: https://github.com/talmolab/sleap/blob/ebd2e1ec8f062efaf2588884bb46e92a8030ddcd/sleap/io/format/labels_json.py#L401-L402
The only tricky part is that we may have tracks with the same name that should actually be different tracks.
For identity/appearance-based models, we use the track name to identify that it's the same animal, which is the use case described in #1080.
At a minimum, one fix would be to discard any new empty tracks after a merge operation.
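A minimal sketch of that cleanup step, using plain stand-in classes rather than SLEAP's own types. The `Track` and `Instance` classes, the `drop_empty_tracks` helper, and its `preexisting` parameter are all hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(eq=False)  # compare by identity so same-named tracks stay distinct
class Track:
    name: str


@dataclass
class Instance:
    track: Optional[Track] = None


def drop_empty_tracks(
    tracks: List[Track],
    instances: Iterable[Instance],
    preexisting: Iterable[Track] = (),
) -> List[Track]:
    """Return tracks that are used by an instance or existed before the merge."""
    # Tracks referenced by at least one instance after the merge.
    used = {id(inst.track) for inst in instances if inst.track is not None}
    # Tracks that were already present before the merge are never discarded,
    # since only *new* empty tracks should be removed.
    kept = {id(track) for track in preexisting}
    return [t for t in tracks if id(t) in used or id(t) in kept]
```

Here a track counts as empty when no instance references it; tracks passed as `preexisting` survive even if unused, so only tracks introduced by the merge are removed.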
Merging tracks with the same name might be tricky though, so we might want to think about edge cases:
- `track_0` in one video and a different track named `track_0` in a separate video probably shouldn't be merged
- `male_adult` in one video and a different track named `male_adult` should probably be merged, though technically it won't matter downstream for ID models if they're different objects since we just use the name to match them
- `track_0` might refer to a different animal in each run, so we might want to do instance-level matching to resolve the differences?

I think the bigger version of this fix would involve adding a new attribute to `Track`s that specifies whether it's a "class" or unique track.

In the meantime, maybe we can add a flag to `sleap.load_file` (+ a `Labels.merge_tracks(by_name: bool = True)` instance method) and a GUI option that allows the user to specify whether to merge tracks by name. This could be done post-merge via a menu item in the Tracks menu, but we could also add a convenience checkbox in the merge resolution window.

For reference, I wrote a Colab that does the merging by track name (but also some other reindexing): https://colab.research.google.com/drive/13DAiPiLq4_8suZOlPoD67ReIJzjN8oCW?usp=sharing
The core logic for merging by name is something like:
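A minimal sketch of that idea, using plain stand-in classes rather than SLEAP's own `Track` and `Instance` types. The classes and the `merge_tracks_by_name` helper below are hypothetical, and this is not the code from the Colab:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(eq=False)  # compare by identity so same-named tracks stay distinct
class Track:
    name: str


@dataclass
class Instance:
    track: Optional[Track] = None


def merge_tracks_by_name(
    tracks: Iterable[Track], instances: Iterable[Instance]
) -> List[Track]:
    """Collapse same-named tracks into one object and repoint instances to it."""
    # The first track seen with a given name becomes the canonical one.
    canonical: Dict[str, Track] = {}
    for track in tracks:
        canonical.setdefault(track.name, track)

    # Reassign every tracked instance to the canonical track for its name.
    # setdefault also covers instances whose track was missing from `tracks`.
    for inst in instances:
        if inst.track is not None:
            inst.track = canonical.setdefault(inst.track.name, inst.track)

    # Dict insertion order preserves the order in which names first appeared.
    return list(canonical.values())
```

This merges unconditionally by name, so it suits the `male_adult` case but would also merge unrelated `track_0` tracks from separate videos; a real implementation would presumably need the per-track "class" flag or a user option to decide when merging is appropriate.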
Discussed in https://github.com/talmolab/sleap/discussions/1080