sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

Anchored ROI Request #213

Open Toshea111 opened 1 year ago

Toshea111 commented 1 year ago

Is your feature request related to a problem? Please describe. I am interested in quantifying interactions between several tracked individuals that occur during close contact, with potentially many such interactions occurring at once. This makes it difficult for the behavioural classifier to accurately detect single interactions.

Describe the solution you'd like Including ROIs that move with an individual or body part (i.e., are 'anchored' to them) would allow such interactions to be quantified reliably, either by detecting ROI overlaps or by detecting other individuals' body parts within an ROI.

Describe alternatives you've considered I have tried training a classifier, but the issue is that interactions are variable at any one time.

Additional context I am working with groups of termites tracked in SLEAP; attached is an example video of the tracked data, for reference.

https://user-images.githubusercontent.com/109351104/200376535-4e47a3f7-626e-43df-bd99-081c327ead94.mp4

sronilsson commented 1 year ago

Hi @Toshea111! Makes sense. Just a couple questions.

How do you see the anchored ROIs being defined - is it enough to specify a diameter/width for a circle/rectangle, or do you ever see yourself using polygon anchored ROIs (where the size is a little trickier to define in a single entry box)?

As you say - for each anchored ROI and frame, we can find which other anchored ROIs and which other animal key-points overlap with it. Is that all the info you need? Those outputs will be in string format, not numerical, and won't immediately fit into any downstream ML algorithm. For ML, I guess they would have to be transformed into counts or a sparse table of categoricals.
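To illustrate the transformation mentioned above, here is a minimal sketch of turning per-frame overlap strings into numeric counts with pandas. The column and category names are hypothetical, not SimBA's actual output format:

```python
import pandas as pd

# One row per frame; each cell names which other animal's key-point overlapped
# animal 1's anchored ROI ("none" when nothing did). Labels are illustrative.
frames = pd.DataFrame({
    "animal_1_overlap": ["none", "animal_2_head", "animal_2_head",
                         "animal_3_tail", "none"]
})

# One-hot encode the categorical column, then sum to get per-category counts
# that fit into a downstream ML feature table.
counts = pd.get_dummies(frames["animal_1_overlap"]).sum()
print(int(counts["animal_2_head"]))  # → 2
```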

Toshea111 commented 1 year ago

Hello Simon,

Thank you for the rapid response, to answer your questions:

  1. Circle or rectangle ROIs would suffice; attached is an example superimposed on a frame from the previous video. By specifying the dimensions of a rectangle and the body part to which it is anchored, you could in theory limit detection to head-to-head or head-to-tail overlaps as needed.
  2. It would also be useful to have a time measure indicating the number of frames that ROIs remain overlapped. This would allow discrimination between chance overlaps and actual interactions, by setting a user-defined time threshold for overlaps to qualify as an interaction.
  3. The key outputs would be overlap IDs, a count of these to quantify the number of interactions between each ID pair, and perhaps a measure of overlap event durations and timings. As you mention, most downstream analyses would work with counts and pair overlap frequencies.

I am happy to discuss in more detail if any of the above points need further clarification.

Anchored ROIs
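Point 2 above can be sketched in a few lines: collapse a per-frame overlap flag into bouts and keep only runs at least `min_frames` long, so single-frame chance overlaps don't count as interactions. The function name and inputs are illustrative, not SimBA's API:

```python
from itertools import groupby

def overlap_bouts(flags, min_frames):
    """Return (start_frame, length) for each overlap run >= min_frames."""
    bouts, idx = [], 0
    for value, run in groupby(flags):
        length = len(list(run))
        if value and length >= min_frames:
            bouts.append((idx, length))
        idx += length
    return bouts

# Frames 2-4 overlap (a 3-frame bout); frame 7 is a one-frame chance overlap.
flags = [0, 0, 1, 1, 1, 0, 0, 1, 0]
print(overlap_bouts(flags, min_frames=2))  # → [(2, 3)]
```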

sronilsson commented 1 year ago

Got it - thanks @Toshea111. To find the boundary boxes, the most straightforward approach might be to use four key-points (anterior/posterior and the laterals), or two key-points (anterior/posterior) plus some user-defined extra metric space, to get the boxes. Once done, we could do a few revisions after your feedback once you've tried it - I can see how this could be useful for others.

Alternatives that could work for you are extracting the dark blobs from the white background (e.g., cv2.findContours), finding the animal boundaries through motion (e.g., cv2.calcOpticalFlowPyrLK), or finding the animals with object segmentation (YOLOv5 is nice). I'm not sure these would generalize to other setups, but they are a few things to try in case the key-point approach doesn't work out.

Toshea111 commented 1 year ago

Much appreciated, either of those initial setups would work well for what I am after, and I would be happy to 'beta test' any additions and provide feedback. I have some limited experience with YOLOv5s; however, this system would allow me to feed my data into an established pipeline.

Let me know if you have any additional questions in the meantime.

sronilsson commented 1 year ago

Sounds good, typed something up based on shapely polygons with use-defined buffers thats relative quick and seems to work. Not sure if I am missing something for non-shape shifters like your species where all the body-parts are parallel line like though.. any chance you could share a raw video and some pose-estimation data, just a snippet, don't need a lot.

image
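My guess at the shapely approach described above, sketched minimally: connect an animal's key-points into a `LineString` and buffer it by a user-defined margin, which produces an anchored ROI polygon that follows an elongated, line-like body. Coordinates and buffer size are made up:

```python
from shapely.geometry import LineString

# Head -> thorax -> abdomen key-points of one termite (pixel coordinates).
body = LineString([(10, 10), (30, 12), (50, 15)])

# Buffer by 8 px to form the anchored ROI around the body axis.
roi = body.buffer(8)

print(roi.geom_type)  # → Polygon
# roi.intersects(other_roi) / roi.contains(point) then give the overlap tests.
```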

Toshea111 commented 1 year ago

Excellent, attached is a short video and the corresponding pose-estimation data in '.HDF5' format. I can also provide the data in '.csv' or '.slp' format, if preferred.

If you need a longer video, I have another with ~8000 frames.

Termite Test.zip

sronilsson commented 1 year ago

That works, cheers. Draft classes for finding, visualizing, and calculating stats for "anchored" ROIs live here. I still need to clean them up, get them into the GUI, document them, and finalize the method for aggregate stats - maybe some time next week. Speed depends mainly on CPU count and the number of animals, but hopefully it's doable..

Toshea111 commented 1 year ago

I look forward to trying it. I will potentially have access to a modelling computer next week, so I should be able to test it out on both that and my normal machine.

sronilsson commented 1 year ago

@Toshea111 - do you have a csv version of the termites test.h5 as well to share?

Toshea111 commented 1 year ago

Yes, apologies for the delay, see attached.

Termite Test Tracks.csv

Toshea111 commented 1 year ago

Perfect, I really appreciate the rapid turnaround with this.

The user-defined settings will be quite important for termites in particular, as they engage in distinct types of trophallaxis at both ends, so to speak.

I also work with ants and bees, thus I should be able to provide some feedback in terms of performance across species.

sronilsson commented 1 year ago

@Toshea111 - I've updated the SimBA pip package to include the class calls in the GUI - you should see it if you do pip install simba-uw-tf-dev --upgrade. I typed up a first-pass doc tutorial here: https://github.com/sgoldenlab/simba/blob/master/docs/anchored_rois.md

Not sure if you have gotten your slp data imported into SimBA, but that needs to happen for this to work. I am also not sure if I have overlooked anything that prevents this from scaling to large datasets, but I will stress test it. If you see anything useful/necessary missing in the docs, let me know and we can see how to get it in.

One thing missing is probably a visualization validating the output statistics (e.g., showing intersecting bounding boxes / key-points in alternative salient colors) - like when another animal's ROI or body-part is inside an animal's ROI, the region's lines go thicker and change color or something.

https://user-images.githubusercontent.com/34761092/201708169-fa8718f4-3ab0-4942-8950-967cbcc5ba98.mp4
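The highlight idea above boils down to choosing a line colour and thickness per frame based on an intersection test. A hedged sketch with shapely (the actual drawing call, e.g. `cv2.polylines`, is assumed; ROI shapes and colours are illustrative):

```python
from shapely.geometry import box

roi_a = box(0, 0, 50, 50)    # animal 1's anchored ROI
roi_b = box(40, 40, 90, 90)  # animal 2's anchored ROI (overlapping roi_a)

def roi_style(a, b):
    """Return (BGR colour, line thickness): red + thick when ROIs intersect."""
    return ((0, 0, 255), 3) if a.intersects(b) else ((255, 255, 255), 1)

print(roi_style(roi_a, roi_b))  # → ((0, 0, 255), 3)
```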

Toshea111 commented 1 year ago

Very impressive, I will conduct a comprehensive trial of the system over the weekend, and let you know if I have any suggestions or feedback.

I can fork and edit the tutorial document as I go, or provide feedback separately here, whichever would be more convenient for you.

sronilsson commented 1 year ago

Thanks! It wasn't much work - the people developing shapely have written most of what is needed. Whatever works best for you, I don't mind!

Toshea111 commented 1 year ago

Having had a decent run through the new anchored ROI features, they work very well for my use-case. I thought it would make sense to divide my feedback into separate lists for issues and errors that I encountered, and suggestions for additions.

Issues:

  1. I experience an error when attempting to calculate aggregate boundary statistics with the minimum bout length set to any value above 0. This returns the error message ‘TypeError: unsupported operand type(s) for /: 'str' and 'float'’. If I leave the minimum bout length field empty, the process works as intended.
  2. This is more of a compatibility issue than a problem with the anchored ROIs, but SimBA only appears to work with ‘.slp’ files, rather than the main SLEAP export format of ‘.h5’. The result is that the user has to export files from their ‘model predictions’ folder in SLEAP, rather than creating an export package as intended.
  3. If a video file is a frame or more longer than the tracking data, this leads to a segmentation of the output visualisations. It would be useful to return a warning that the video is longer than the tracking data in such cases (which can happen in SLEAP), as this would notify the user of the specific issue.
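The TypeError in issue 1 above is consistent with a GUI entry value arriving as a string and then being used in arithmetic with a float. A hedged sketch of the kind of cast that avoids it (function and parameter names are hypothetical, not SimBA's actual code):

```python
def min_bout_frames(entry_value: str, fps: float) -> int:
    """Convert a minimum-bout-length entry (seconds, as text) to frames."""
    # Cast before any arithmetic; treat an empty field as 0 so the
    # "leave it blank" workaround and an explicit 0 behave the same.
    seconds = float(entry_value or 0)
    return int(seconds * fps)

print(min_bout_frames("0.5", 30.0))  # → 15
print(min_bout_frames("", 30.0))     # → 0
```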

Suggestions:

  1. For the aggregate statistics, it would be useful to add an option for ‘interaction time per bout’; this would produce a case for each recorded interaction and its corresponding time in seconds. Such a feature would effectively allow an output of the raw data at the resolution of individual interactions, and users could then aggregate or analyse it as needed.
  2. When only the ROIs overlap, it would be more informative to label the ‘animal 2 key point’ category as ‘ROI only’, or something similar, rather than ‘none’.
  3. I think you may have already touched upon this, but it would be nice to have the range of features available for the ‘ROI’ and ‘Visualizations’ tabs integrated into the anchored ROI tab. Options such as changing the ROI colours and appearance, and producing basic summary visualisations would be useful in the longer-term.

I will go through the tutorial document and make any suggestions or edits that I think will help to clarify things for users. Other than that, I really appreciate you putting this together, as it is already a tractable and robust system for generating interaction data.

sronilsson commented 1 year ago

Fantastic thank you!!!

All of these are quick fixes, except the sleap import, which is a little more involved. A very brief background:

When I first worked with SLEAP, I only found one inference/output file per video - a .slp extension file (which is an h5 object). I thought it was a little odd, as this wasn't a user-friendly file: it contained multiple dataframes and dictionaries with the track identities and body-part coordinates in different tables, missing tracks were not easily observed, and I had to jump through hoops to get it into an interpretable format.

I have long suspected that there must be an alternative output. Then you sent me the multi-index CSV, which is more in line with what I would expect... Can you send me an example of the main SLEAP export format of .h5 together with an associated video, and I will write a function to import it? This week is packed, but I will get this done asap.

Toshea111 commented 1 year ago

That makes sense - I generated the ‘.csv’ file from an ‘.h5’ output using the code provided in this Google Colab notebook: SLEAP-IO - Convert to CSV.ipynb - Colaboratory (google.com).

For reference, an outline of the SLEAP ‘.h5’ output format can be found here: Export Data For Analysis — SLEAP (v1.2.9).

Attached is an ‘.h5’ output file and the associated video, I have used the same one that I sent previously, for familiarity. Let me know if you experience any issues, or have further questions.

Termite Test.zip

sronilsson commented 1 year ago

@Toshea111 fyi, if you upgrade through pip, I inserted an option to get detailed data on each interaction bout (DETAILED INTERACTION TABLE, example here), fixed the int/str mix-up, replaced the None with ROI_ONLY, and you can now specify the color/size of each animal's ROI in the visualization. The CSV/H5 import will come!

Toshea111 commented 1 year ago

Excellent, I'll have a go with the new features later this week, and let you know if any issues arise.

Toshea111 commented 1 year ago

Another potential issue that I have encountered is when using tracking data in which the number of tracks varies over time, due to individuals leaving and re-entering the frame. As the config file requires the total number of individuals to be defined, it then returns an error when the number of tracks in frame deviates from this value.

It would be useful to have an option that allowed for such variation in visible tracks, as there are applications where knowing the frequency of interactions or behaviour is useful, even when individual identity cannot be assigned. There may already be a way to accommodate this, but I have not yet found a solution.
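Following on from the (0, 0) placeholder issue mentioned above, one illustrative way to handle variable track counts is to mask out-of-frame coordinates (e.g. set them to NaN) before any overlap analysis, so absent animals don't register spurious interactions at the origin. A minimal numpy sketch with made-up coordinates:

```python
import numpy as np

# frames x (x, y) coordinates for one animal; frames 2-3 are out of frame
# and appear as the (0, 0) placeholder described above.
coords = np.array([[12.0, 40.0], [13.0, 41.0],
                   [0.0, 0.0], [0.0, 0.0], [15.0, 43.0]])

out_of_frame = np.all(coords == 0.0, axis=1)  # True where both x and y are 0
coords[out_of_frame] = np.nan                 # exclude from downstream stats

print(int(out_of_frame.sum()))  # → 2
```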

sronilsson commented 1 year ago

Thanks @Toshea111 - is it throwing the errors at import, or during the anchored ROI methods?

One question: I tried to use the Colab notebook and your h5, but was hitting a lot of errors and eventually had to put it aside. Could you help me by sending the entire CSV for the video you sent, and I will work from that CSV to write the SLEAP CSV import methods?

Toshea111 commented 1 year ago

I tried using another video with variable track numbers and it worked, thus I think the previous issue was my own error rather than anything else. The only suggestion I would have is to remove the tracks for any individuals that are not in frame, because currently they appear as '0,0' coordinates.

Looking back through the Colab notebook, I can see the problem. I have now made an updated branch with edits that should make it straightforward to use:

https://colab.research.google.com/github/Toshea111/simba/blob/master/Converting_SLEAP_Analysis_HDF5_to_CSV_Updated.ipynb

In case you continue to experience issues, I have also attached a folder with the original ‘.h5’ output file, converted '.csv' file, and the associated video. Let me know if you need any additional information during the process.

Termite Test.zip

sronilsson commented 1 year ago

@Toshea111 there is now an option to import csv files from SLEAP in the dropdowns. The caveat is that I haven't had time to challenge it much - it worked on the single file you sent, but that's all I know for now lol. I'll test it a bit more, maybe next week.

Untitled 4

Toshea111 commented 1 year ago

Much appreciated, I'll try inputting some new data to see if I can break it.

Toshea111 commented 1 year ago

I have had a go with the SLEAP '.csv' format option, and I am running into the same error each time, specifically when I try to import the .csv file.

The error message returned is: 'ValueError: Length mismatch: Expected axis has 12 elements, new values have 24 elements'.

A variation of this occurs for different files with different numbers of tracks, although the example above is for a single track. I assume it means that the .csv file I am uploading contains more information or tracks than expected?

One thing to note is that the track is not continuously in frame for the whole video, perhaps that is an issue in itself?
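A hedged way to diagnose the "Length mismatch" above is to compare the CSV's actual column count with what the project configuration implies (animals × body-parts × 3, for x, y, and score). The column names and counts below are illustrative of the SLEAP analysis CSV layout, not SimBA's actual import code:

```python
import io
import pandas as pd

# Toy single-track SLEAP-style CSV: track + frame index, then
# x / y / score triplets per body-part.
csv = io.StringIO(
    "track,frame_idx,head.x,head.y,head.score,tail.x,tail.y,tail.score\n"
    "track_0,0,1.0,2.0,0.9,3.0,4.0,0.8\n"
)
df = pd.read_csv(csv)

n_animals, n_bodyparts = 1, 2
expected = n_animals * n_bodyparts * 3  # x, y, score per body-part
actual = len(df.columns) - 2            # minus the track + frame_idx columns
print(expected == actual)  # → True
```

A mismatch like 12 expected vs 24 found would then point at the config expecting half as many tracks or body-parts as the file contains.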

sronilsson commented 1 year ago

Thanks for testing @Toshea111! I tried to replicate with the file I have (deleting all or some tracks for subsets of frames) but couldn't, would you mind sharing a CSV like the one with a single track that is causing issues?

Also, I noticed in the notebook you shared a while back that the output was transposed relative to the CSV I have been troubleshooting with (attached below). Is the SimBA input data still expected to look like the attached file, or is each animal's body-part represented as three columns without a track index?

termites_1.csv

Toshea111 commented 1 year ago

No problem, attached is a .csv file with a single track like the one that was causing issues, I can also provide a version with all the tracks, if needed. Note that the track is not present in all frames, as the hornet disappears from view several times.

The format I have been using is the same as that of 'termites_1.csv', do you have a link to the version of the notebook that you mentioned? The one that I am currently using does not appear to transpose the data as you describe, and the attached output is directly from this notebook.

Hornet Test Track.csv

sronilsson commented 1 year ago

Thanks @Toshea111, I'll test with this file and let you know - THIS is the notebook I saw.

In this screengrab it looks like there is a single row index, but multiindex headers, while termites_1.csv has the inverse with multiindex rows and a single header.

image
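The orientation mix-up being discussed comes down to whether key-points are rows or columns; if a converted CSV comes out the wrong way around, a single transpose restores the expected layout. A toy pandas sketch (the tiny frame below only illustrates the idea, it is not either party's real data):

```python
import pandas as pd

# "Transposed" data: one column per frame, one row per coordinate field.
transposed = pd.DataFrame({0: [10.0, 20.0], 1: [11.0, 21.0]},
                          index=["head.x", "head.y"])

df = transposed.T  # now one row per frame, one column per coordinate field

print(list(df.columns))  # → ['head.x', 'head.y']
print(len(df))           # → 2 frames
```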

Toshea111 commented 1 year ago

That's strange, as you say it looks to be transposed, which is not the output type that I have been working with.

To stay on the safe side, here is a link to the updated version of the notebook that I am currently using:

https://github.com/Toshea111/sleap/blob/develop/docs/notebooks/Convert_HDF5_to_CSV_updated.ipynb

Let me know if any further issues arise.

sronilsson commented 1 year ago

Hi @Toshea111 - sorry for the super late reply. I fixed the method so it works with a single animal - the data doesn't have to be transposed in those cases, and I hadn't thought of that. When you've got a chance, could you try it on your end again? If it doesn't work, could you please send me the file it fails on?

sronilsson commented 1 year ago

Hi again @Toshea111, I have a question. My sleap->df function is slow, so I was looking to take some pointers from your code instead. However, your h5 files have different keys than mine; this is what I see in my sleap test files:

frames
instances
metadata
points
pred_points
suggestions_json
tracks_json
videos_json

Do you know if they have changed names, or they have been completely revamped since I created my tracking files?
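For comparing the two file structures, the top-level keys of any `.h5` (or `.slp`, which is HDF5 underneath) file can be listed with h5py. An in-memory file with keys mirroring the newer SLEAP analysis export stands in for a real file here, so the snippet is self-contained:

```python
import h5py

# Throwaway in-memory HDF5 file with SLEAP-analysis-style top-level keys
# (placeholder data; only the key names matter for the comparison).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    for key in ("node_names", "track_names", "track_occupancy", "tracks"):
        f.create_dataset(key, data=[0])
    keys = sorted(f.keys())

print(keys)  # → ['node_names', 'track_names', 'track_occupancy', 'tracks']
```

Running the same `sorted(f.keys())` on both files makes the old-vs-new structural difference obvious at a glance.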

Toshea111 commented 1 year ago

No problem, I'll have a try and let you know if it works.

Regarding your '.h5' files, could you send me an example?

I want to try having a detailed look in HDFView, to compare the two.

sronilsson commented 1 year ago

Thanks @Toshea111! These are the files I have - I can't see any node_names etc. But I created these soon after SLEAP was released; chances are they have changed the structure since.

Testing_Video_2.slp.zip

Toshea111 commented 1 year ago

It looks like those are in the old '.slp' format rather than the new '.h5' export format, which as you say is likely a result of using an early version.

To answer your question, I think the system has been revised rather than just renamed, as the data structures are not exactly the same between the two file types.

The website details the process for exporting '.h5' files here: https://sleap.ai/tutorials/analysis.html.

sronilsson commented 1 year ago

Thanks, super helpful!