Closed · yochiawu closed this issue 1 month ago
Hi,
First, since you already have an annotated dataset, the simplest way to try to increase accuracy is the following:
1. If you use any preprocessing steps like auto-orient or resize in Roboflow, remove them.
2. When performing augmentation on images like the ones you showed, don't do "vertical flipping", because an upside-down mouse would never happen in a real scenario; only do "horizontal flipping". You may also add "brightness", since the contrast between mouse and background is low and it might increase the detection accuracy of the trained Detector (a rough offline sketch of these settings follows below).
3. Go 8000-10000 iterations when training the Detector, or until the "total loss" in training drops below 0.1.
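In case it helps to see those augmentation settings concretely, here is a minimal offline sketch of the same idea (horizontal flip only, mild brightness jitter). This is not Roboflow's or LabGym's actual pipeline; the folder names are placeholders, and flipped copies would also need their annotations mirrored:

```python
# Rough offline equivalent of the suggested augmentation settings:
# horizontal flip + ~15% brightness jitter, and NO vertical flip.
import cv2, glob, os, random

src_dir = "frames"            # placeholder: folder of annotated frames
out_dir = "frames_augmented"  # placeholder: output folder
os.makedirs(out_dir, exist_ok=True)

for path in glob.glob(os.path.join(src_dir, "*.png")):
    img = cv2.imread(path)
    name = os.path.splitext(os.path.basename(path))[0]

    # horizontal flip only -- an upside-down mouse never occurs in these recordings
    cv2.imwrite(os.path.join(out_dir, f"{name}_hflip.png"), cv2.flip(img, 1))

    # brightness jitter of roughly +/-15% to compensate for low mouse/background contrast
    beta = random.uniform(-0.15, 0.15) * 255
    cv2.imwrite(os.path.join(out_dir, f"{name}_bright.png"),
                cv2.convertScaleAbs(img, alpha=1.0, beta=beta))
```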
If this doesn't work, you may need to refine your annotation. First, if possible, increase the contrast of the video; you can use LabGym's preprocessing module to achieve this. Resize your videos to 960 (1280 may not be necessary; a rough OpenCV sketch of this step follows below). Keep the frame size of all videos at 960, use 960 as the inferencing frame size when training the Detector, and go 8000-10000 iterations. Generate images at a frame size of 960, and select more colliding/occlusion scenarios. I don't know what the proportions are in your current 176-292 images, or whether that number is before or after augmentation. For complex scenarios like low contrast and frequent occlusions, we generally recommend 200-300 annotated images before augmentation. Importantly, make colliding/occlusion scenarios 97-99% and well-separated scenarios 1-3% of the total annotated images.
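If you'd rather not use LabGym's preprocessing module, the resize-to-960 plus contrast boost could be approximated with OpenCV along these lines (a sketch only; the file names are placeholders and the CLAHE settings are assumptions to tune by eye):

```python
# Resize a video to 960-px width and boost local contrast so the mice stand out.
import cv2

cap = cv2.VideoCapture("two_mice.mp4")            # placeholder input video
fps = cap.get(cv2.CAP_PROP_FPS)
target_w = 960                                    # keep every video at the same frame width

ret, frame = cap.read()
h, w = frame.shape[:2]
target_h = int(h * target_w / w)
writer = cv2.VideoWriter("two_mice_960.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (target_w, target_h))
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

while ret:
    frame = cv2.resize(frame, (target_w, target_h))
    # apply CLAHE on the lightness channel only, so colors are preserved
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    frame = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
    writer.write(frame)
    ret, frame = cap.read()

cap.release()
writer.release()
```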
If there is still ID switching after you do all of the above, you may need to either identify the frames in which the ID switches happen and include them in the training image set, record the videos with higher resolution and contrast, or use overhead cameras so that scenarios in which one mouse is completely occluded by another happen less frequently.
Thanks a lot for your advice!
Recently, I have been using video recorded by an overhead camera to train the Detector. The frame size is 960, the training set is 203 annotated images (6 of them well-separated) before augmentation, and the iteration count is 10000 (it seems I can't set it higher). But the total loss stays above 0.1 (around 0.12). Does that matter?
The Detector works well most of the time, but there is still one ID switch which I really don't like, and the training set actually contained some of these frames. The scenario is shown below: one mouse is on top of the other, so the mouse on the bottom is divided into two parts. I am wondering if there is anything wrong with my annotation.
MY ANNOTATION
ID SWITCH SCENARIO (frames from behavior example)
Thanks again!
Hi,
When the mouse on the bottom is divided into two parts, like in the first image, how did you annotate it? Did you annotate the separate parts as one mouse? And how did you perform augmentations?
Looking at the behavior example that has ID switching, the borders that distinguish the overlapping mice are not visible. Even a human may not be able to tell where the borders are just by watching these frames. In scenarios like this (uniform color without detectable edges), no cues can help the Detector find the edges and boundaries (which are not actually present in the images) that separate the two mice, so you need to force the Detector to 'remember' where the 'artificial' borders are. That means you have to include the exact frames in which this ID switching occurs in your annotation, and probably annotate just one part (the larger part) of the body of the bottom mouse. Annotating two visually separated parts as one mouse may confuse the Detector: when the Detector performs segmentation, it tries to find connected parts as one individual, and visually separated parts are naturally considered different individuals. In our practice, if we include the exact frames in which the ID switching occurs in annotation and retrain the Detector, the ID switching always disappears. The annotated image you showed does not seem to be the exact frame in which the ID switching occurs.
Hi,
Yes, I annotated the separate parts as one mouse. The augmentations were: Flip: horizontal, vertical; Brightness: between -15% and +15%; Exposure: between -10% and +10%.
Thanks for explaining that to me. If I annotate just one part of the bottom mouse, should I always annotate the head, always the rear, or always the larger part? Because the top mouse moves around on the bottom one, sometimes the head is covered and sometimes the rear is covered. I don't know whether it will confuse the Detector if I annotate the head in some frames and the rear in others.
Here's a trick to increase the amount and diversity of your training dataset in Roboflow. With a Roboflow free account, augmentation expands the number of images to only around 2x the originals, so if you choose multiple augmentation methods (>2) at once, not every method gets applied to every image. To maximize the chance of each augmentation method being applied to each image, select one method at a time, export the dataset, upload it to create a new project, apply another method, and so on. This way, with those three methods you end up with a dataset roughly 8x the original (around 1600 images), which could potentially benefit your training and increase the detection accuracy of your Detectors.
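To make the expected counts concrete (the exact multiplier depends on Roboflow's settings, so treat this as an approximation), the chained-export trick compounds roughly like this:

```python
# Each single-method Roboflow free-tier pass roughly doubles the dataset,
# so chaining one method per exported project compounds the expansion.
originals = 203                                   # annotated images before augmentation
methods = ["horizontal_flip", "brightness", "exposure"]

count = originals
for m in methods:
    count *= 2                                    # ~2x per pass
    print(f"after {m}: ~{count} images")
# after horizontal_flip: ~406 images
# after brightness: ~812 images
# after exposure: ~1624 images   -> the "roughly 8x / ~1600" figure above
```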
As for the annotation, you can always annotate the larger part. When a mouse is occluded, its body isn't intact anyway, so the goal for the Detector in such scenarios is not to be precise but to be sensitive, so that it still detects something that lets it maintain the correct ID.
Thanks a lot! I tried what you suggested and the accuracy of the Detector has improved!
But now I am stuck on the accuracy of the Categorizer. I am wondering whether you could help me with that?
I am training a Categorizer in 'interactive advanced' mode. But actually, I am focusing on 2 types of interactive behavior and 2 types of non-interactive behavior. So the first question is: should I train 2 different modes (one interactive advanced and one non-interactive) to categorize them?
When I trained the interactive advanced mode, each behavior example contained 15 frames. Should I reduce this to something like 10 frames to improve accuracy, or does it not matter?
Focusing just on the interactive behaviors, sniffing isn't categorized well. Sniffing is one mouse sniffing another, and it involves physical contact. There are 2 major mistakes. The first is that even when an animal is just walking around without any contact, it gets labeled as sniffing. I thought this might relate to the social distance, but if I set it smaller, to 1, when generating the behavior examples, many sniffing scenarios contain just one animal, which I don't want. So I can only set it to 2, which includes some scenarios where the 2 animals are totally separated. The second is that sometimes the animal being sniffed is labeled as sniffing, but I only sorted the animal performing the sniffing.
I am also wondering whether I should sort the behavior into several different sub-groups. For example, right now I just sort it as sniffing, but there are actually different types of sniffing. Would it help if I sorted it as staying&sniffing, moving&sniffing, staying&besniffed, moving&besniffed?
I know this is a lot of questions. Thanks for your time and any advice would be appreciated!
Hi,
First of all, you only need to train one Categorizer with the 'interactive advanced' mode, because it can categorize both interactive and non-interactive behaviors. Social distance = 2 would be appropriate. Setting the duration of one behavior example to 15 frames at 30 fps would also be good for most mouse behaviors.
Generally speaking, to categorize a behavior well, the behavior examples of that category should be very distinct from other categories. That is, when you watch the behavior examples, you can easily see the differences between that category and others. Besides, the behavior examples in the same category should share some similar features. If the behavior examples in the same category look very different, the Categorizer would not learn well. So if you think the sniffing has multiple types that are very different, it'd be better to divide it into multiple categories.
Specifically, I would imagine that staying&sniffing would look very different from moving&sniffing, so I suggest you keep those as two distinct categories. By the way, how many pairs of behavior examples did you select for each category? Are they diverse enough to cover most of the behavior variants, or are many of them redundant repeats that look identical? If sniffing is sometimes misclassified as being sniffed, you probably need to increase the amount and diversity of the behavior examples in both categories. Also, how many categories did you sort in total? Categorizing the background behaviors well would also help increase the accuracy of categorizing the behaviors of your interest.
Hi,
Thanks for your advice! I sorted 7 behaviors, and the most important behavior contains 600 examples, but I think they look similar. I will try to enrich the dataset. Thanks a lot!
To train a Categorizer that generalizes well, you can sparsely select behavior examples from different animals across different videos, rather than selecting all examples from just one video. For the interactive advanced mode, we recommend 300-500 pairs of behavior examples per category to train a good Categorizer. You may take a look at the sorted behavior examples in the LabGym Zoo to get an idea of how they should look in each category.
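If your generated examples are already organized per video, sparse sampling across videos could look something like the sketch below. The folder layout, the 400-pair target, and the assumption that each pair is an animation (.avi) plus a pattern image (.jpg) with the same name are all placeholders to adjust to your own export:

```python
# Randomly draw a similar number of behavior-example pairs from every video
# instead of sorting everything from a single recording.
import glob, os, random, shutil

video_dirs = glob.glob("behavior_examples/video*/")   # placeholder: one folder per video
target_pairs = 400                                    # within the recommended 300-500 range
out_dir = "sorted_examples/sniffing"                  # placeholder sorting folder
os.makedirs(out_dir, exist_ok=True)

per_video = max(1, target_pairs // max(1, len(video_dirs)))
for d in video_dirs:
    # each pair is assumed to be <name>.avi (animation) + <name>.jpg (pattern image)
    stems = sorted(os.path.splitext(f)[0] for f in os.listdir(d) if f.endswith(".avi"))
    for stem in random.sample(stems, min(per_video, len(stems))):
        for ext in (".avi", ".jpg"):
            src = os.path.join(d, stem + ext)
            if os.path.exists(src):
                shutil.copy(src, out_dir)
```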
Thanks for the examples!
But I still have a concern. Using walking as an example, if I set the social distance to 2, there are two scenarios: one where only one animal is walking, and another where one animal is walking but the other animal also appears in the example. This also happens with other non-interactive behaviors. Should I sort them separately, like walking alone and walking near others?
Hi,
Generally speaking, if you include enough examples of both scenarios, the trained Categorizer can learn to recognize the key pattern of the moving animal (the colored contours) and treat the other mouse (the gray contours) as background noise.
If you take a look at our sorted behavior examples of the two-vole interaction in the LabGym Zoo, you will notice that we also included both scenarios in 'locomotion', and the trained Categorizer is quite accurate at detecting this type of behavior. By the way, the 'investigating' and 'allowing' categories in that dataset are quite similar to the 'sniffing' and 'being sniffed' in your experiment, and that trained Categorizer (also top view, 2 animals, 30 fps, 9 behaviors) may generalize to your data. You could take a look at those examples and try that Categorizer to get an idea of how to sort behavior examples and how to train a Categorizer.
Hi,
I would like to ask how to improve the accuracy of the Detector. I am using LabGym 2.4.5 to detect the behavior of two mice. The video fps is 30 and the frame size is 720*1280. However, there is a lot of circling behavior and overlap in the video, so detection is difficult (like the figure below). The best Detector I have trained used a training set of 176 images, frame size 960, and 1000 iterations, but it still produces two ID switches in 30 s. I tried changing the frame size (from 640 to 1280), the iterations (from 1000 to 6000), and the training set (from 176 to 292 images), but none of these worked.
Could you please give me some advice? Thanks a lot!