hbzhang opened this issue 6 years ago
As I understand, your dataset objects differ only in size? Then you should detect all of them as one class and differentiate them with a simple size threshold.
No, they don't differ in size, they differ in content/appearance
Content = class (cat/dog/horse etc.) Appearance = variance in class (black/red/brown cat)
Is that how you classify those words? :)
We have breast masses, some of them malignant, some of them benign. Our classes then are "malignant" and "benign".
Does that mean you deal with gray-scale pictures, with the content occupying the whole picture area, so that you have to classify the structure of the tissue without detecting compact objects on it? Can you refer us to such pictures?
Yes, they are grayscale images (we have already changed the code for 1 channel). The content usually occupies half the image, so we are also trying to crop it in order to reduce the amount of background. The objects to detect are masses, sometimes compact, sometimes more disperse. Then, from a clinical point of view, according to some characteristics of the masses (borders, density, shape...), they are classified as malignant or benign. Here you have some sample images (resized to 216*416):
These objects (tumors) can be of different sizes, so you shouldn't restrict yourself to 2 anchor sizes, but use as many as possible, that is 9 in our case. If this is redundant, the clustering program will simply yield 9 closely sized anchors; that is not a problem. What is more important, this channel is probably not 8-bit but deeper, and quantizing from 16 bits down to 8 may lose valuable information. Or maybe split the 16-bit data into two different channels - I don't know, but this is an issue to think about...
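One way to try the 16-bit idea above without losing precision is to split each 16-bit pixel into a high byte and a low byte and feed them as two 8-bit channels. A minimal NumPy sketch (shapes and names are illustrative, not from the actual training code):

```python
import numpy as np

# Hypothetical 16-bit grayscale image, e.g. a mammogram crop
img16 = np.array([[256, 65535], [0, 511]], dtype=np.uint16)

high = (img16 >> 8).astype(np.uint8)    # most significant 8 bits
low = (img16 & 0xFF).astype(np.uint8)   # least significant 8 bits

# Stack as a 2-channel input instead of quantizing to a single 8-bit plane
two_channel = np.stack([high, low], axis=-1)  # shape (H, W, 2)

# The original 16-bit values are recoverable exactly, so no information is lost
restored = (two_channel[..., 0].astype(np.uint16) << 8) | two_channel[..., 1]
assert np.array_equal(restored, img16)
```

Whether the network actually benefits from the two planes is an open question; the point is only that the split is lossless, unlike direct 16-to-8-bit quantization.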
Ok, we will try with the 9 anchors. Regarding the 16-bit, we are using tf2 so that's not a problem I think...
Now we are able to detect some masses, but only when we lower the score_threshold in the detection.
So far, what we're doing to determine the size of the boxes is:

1. We run a clustering method on the normalized ground-truth bounding boxes (normalized according to the original size of the image) and get the centroids of the clusters. In our case, we have 2 clusters and the centroids are about (0.087, 0.052) and (0.178, 0.099).
2. Then we rescale the values according to the rescaling we are going to apply to the images during training. We are working with rectangular images of (256, 416), so we get bounding boxes of (22, 22) and (46, 42). Note that we have rounded the values, as we have read that yoloV3 expects actual pixel values.
3. Since we compute anchors at 3 different scales (3 skip connections), the previous anchor values correspond to the large scale (52). The anchors for the other two scales (13 and 26) are calculated by dividing the first anchors by 2 and by 4.
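The three steps above can be sketched roughly as follows. This uses a plain Euclidean k-means on synthetic normalized (w, h) pairs for brevity (YOLO's own anchor script clusters with a 1-IoU distance instead); all names and data here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake normalized ground-truth (w, h) pairs standing in for the real dataset
boxes = rng.uniform(0.03, 0.2, size=(200, 2))

def kmeans(data, k, iters=50):
    """Plain Euclidean k-means; YOLO's calc_anchors uses 1-IoU distance."""
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        # Step 1: assign each box to its nearest centroid
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(k):
            if (labels == i).any():
                centroids[i] = data[labels == i].mean(axis=0)
    return centroids

centroids = kmeans(boxes, k=2)

# Step 2: rescale normalized centroids to the training resolution, then round,
# since the anchors are expected in actual pixels
W, H = 256, 416
anchors_52 = np.rint(centroids * np.array([W, H])).astype(int)

# Step 3: derive the anchors for the coarser scales by dividing by 2 and 4
anchors_26 = anchors_52 // 2
anchors_13 = anchors_52 // 4
```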
First of all, sorry to join the party late. From what I understand, you have two classes, malignant and benign, which are merely the output classes and don't necessarily have to be of the same size (in dimensions of the bounding boxes). Therefore (as @andyrey suggested), I would suggest either using the default number and sizes of anchors, or running k-means on your dataset to obtain the best sizes and number of anchors. I am not sure about the sizes, but you can at least increase the number of anchors, as the boxes might have different ratios (even if the tumours are of the same size, which again might not be the case), and I think that would be favourable for your application.
Are all the input images of fixed dimensions, i.e. (256x416)? You have also suggested two bounding boxes of (22, 22) and (46, 42). Are the bounding boxes always of these dimensions? If so, there might be something wrong: they may start from those values, but they should be able to form a box around the tumours as tightly as possible. I need more clarification.
Although there is a possibility you might get results, I am not quite sure YOLO is the ideal algorithm for non-RGB input. It has been quite some time since I worked with YOLO and read through the scripts and papers, so I am not certain, but I would suggest first training on your dataset without making many changes, and then fine-tuning with changes to gain more accuracy if the first results are promising.
@ameeiyn @andyrey Thanks for clarifying how to get w and h from the predictions and anchor values. I think I have obtained the box w and h successfully using
box_w = anchor_sets[anchor_index] * exp(offset_w) * 32
box_h = anchor_sets[anchor_index+1] * exp(offset_h) * 32
where offset_w and offset_h are the predicted values for w and h. But for the x and y values of the bounding boxes, I am simply multiplying the predicted coordinates (x and y) by the image width and height. I am getting poor predictions as well as dislocated boxes:
Can you guys kindly help?
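A possible cause of the dislocated boxes: in YOLOv3-style decoding, the predicted x and y are offsets *within a grid cell*, so the cell indices (cx, cy) have to be added before scaling to image size; multiplying the raw prediction by the image size alone places the centers incorrectly. A hedged sketch of the full decoding, with all variable names illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, grid_w, grid_h, pw, ph, img_w, img_h):
    """Decode one YOLOv3-style prediction into pixel coordinates.

    tx..th  raw network outputs;  cx, cy  grid-cell indices;
    pw, ph  anchor sizes in network-input pixels (the v3 convention).
    """
    # center: (sigmoid(offset) + cell index) / grid size, scaled to pixels
    bx = (sigmoid(tx) + cx) / grid_w * img_w
    by = (sigmoid(ty) + cy) / grid_h * img_h
    # size: anchor * exp(offset), already in pixels for v3 anchors
    bw = pw * np.exp(tw)
    bh = ph * np.exp(th)
    return bx, by, bw, bh

# Example: cell (6, 5) on a 13x13 grid, anchor (46, 42), 416x416 input
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0, 6, 5, 13, 13, 46, 42, 416, 416)
print(bx, by, bw, bh)  # 208.0 176.0 46.0 42.0
```

Note the `* 32` in the formula quoted above belongs to the v2 convention (anchors in feature-map cells); if the anchors are already in network pixels, applying the stride again will also inflate the boxes.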
I want to learn this too, please.
These word-only explanations are not helping me. Could someone post a few pictures showing how anchors look and how they work, instead of describing them only with words?
This may be fundamental: what if I train the network on an object at location (x, y), but the same object appears at (x+10, y) in a test picture? How can YOLO detect it at the new physical location?
Anchors are initial sizes (width, height), some of which (the closest to the object size) will be resized to the object size, using some outputs from the neural network (final feature map):

- `x[...]` - outputs of the neural network
- `biases[...]` - anchors
- `b.w` and `b.h` - resulting width and height of the bounding box that will be shown on the result image

Thus, the network should not predict the final size of the object, but should only adjust the size of the nearest anchor to the size of the object.

In Yolo v3, anchors (width, height) are sizes of objects on the image resized to the network size (`width=` and `height=` in the cfg-file). In Yolo v2, anchors (width, height) are sizes of objects relative to the final feature map (32 times smaller than in Yolo v3 for default cfg-files).
Hi @AlexeyAB, I understand that Yolo v3 anchors are sizes of objects on the image resized to the network size, while in Yolo v2 anchors are sizes of objects relative to the final feature map. But my question is: the way of calculating width and height is the same for both Yolo v3 and Yolo v2, for example width = e^(tw) * pw and height = e^(th) * ph. Then why does Yolo v3 use anchors at network size while Yolo v2 uses anchors at final-feature-map size?
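The v2/v3 unit question above can be checked numerically: the decoding formula b = p * e^t is the same in both versions, only the units of the anchor p differ, and the extra multiplication or division by the stride compensates. A small sketch, assuming a 416x416 input and a final feature map with stride 32 (13x13):

```python
import numpy as np

stride = 32  # network input is 32x larger than the final feature map

# The same physical anchor expressed in both conventions:
anchor_v3 = (116.0, 90.0)                            # network-input pixels (v3)
anchor_v2 = (116.0 / stride, 90.0 / stride)          # feature-map cells (v2)

# Identical formula b = p * exp(t); only the unit conversion differs
tw = 0.2
bw_v3 = anchor_v3[0] * np.exp(tw)             # already in pixels
bw_v2 = anchor_v2[0] * np.exp(tw) * stride    # cells -> pixels
assert np.isclose(bw_v3, bw_v2)
```

So the choice of units is a bookkeeping convention, not a difference in the math.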
How many anchor boxes are needed in yolov4?
So far, reading through this whole three-year-long thread, I've concluded that it's probably best just to re-read the papers. There are diagrams in the papers. Both here and, for the most part, in the papers, it is not made clear whether the anchor boxes are (x, y, w, h) in the input image or in the output feature layers (plural because of the skip connections).
I see no connection made between the input and the output of the network at all. Literally everything else, from batch normalization to internal covariate shift, makes sense to me. The anchor boxes don't. It would really help to have a better summary.
I know this might be too simple for many of you, but I cannot seem to find good literature illustrating clearly and definitively the idea and concept of anchor boxes in Yolo (v1, v2, and v3). Thanks!