mnslarcher opened this issue 4 years ago
Hi,
If someone wants to give it a try: like https://github.com/Cli98/anchor_computation_tool, I also adapted the code of https://github.com/zhouyuangan/K-Means-Anchors to obtain anchor ratios using K-Means.
The main differences are that my implementation is faster (fewer for loops), takes a .json file as input (not an xml) and has some additional settings (but the Changlin Li repo has more visualization).
https://github.com/mnslarcher/kmeans-anchors-ratios
If you try my code on COCO you will obtain anchor ratios in line with this repo's (but for your dataset they can be different, and this is why it's useful).
Any feedback is appreciated.
UPDATE: I added the commands to get the optimal anchor ratios for COCO. The returned ratios are [(0.6, 1.5), (1.0, 1.0), (1.4, 0.7)], very near to [(0.7, 1.4), (1.0, 1.0), (1.4, 0.7)] (the default values of this repo).
@mnslarcher Good to see this.
Thanks @Cli98 !
@mnslarcher lol you claimed that your repo is faster than mine. I don't agree with that, since I see no reason to justify it.
Hi @Cli98 :)! I'm saying this because you use the implementation of https://github.com/zhouyuangan/K-Means-Anchors. From that implementation I removed this:
for row in range(rows):
    distances[row] = 1 - iou(boxes[row], clusters)
by vectorizing the iou calculation.
From your notebook:
rows = boxes.shape[0]
distances = np.empty((rows, k))
last_clusters = np.zeros((rows,))

np.random.seed()

# the Forgy method will fail if the whole array contains the same rows
clusters = boxes[np.random.choice(rows, k, replace=False)]

while True:
    for row in range(rows):
        distances[row] = 1 - iou(boxes[row], clusters)

    nearest_clusters = np.argmin(distances, axis=1)

    if (last_clusters == nearest_clusters).all():
        break

    for cluster in range(k):
        clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

    last_clusters = nearest_clusters

return clusters
This is mine:
np.random.seed(seed)

last_nearest_centroids = np.ones(boxes.shape[0]) * -1
# the Forgy method will fail if the whole array contains the same rows
centroids = boxes[np.random.choice(boxes.shape[0], num_clusters, replace=False)]

i = 0
while True:
    if i >= max_iter:
        logger.warning(
            f"[{datetime.now().strftime('%m/%d %H:%M:%S')}] "
            "Maximum number of iterations reached, increase max_iter to do more "
            f"iterations (max_iter = {max_iter})"
        )
        break

    nearest_centroids = np.argmax(iou(boxes, centroids), axis=1)

    for centroid in range(num_clusters):
        centroids[centroid] = centroid_calc_fn(
            boxes[nearest_centroids == centroid], axis=0
        )

    if (nearest_centroids == last_nearest_centroids).all():
        break

    last_nearest_centroids = nearest_centroids
    i += 1

return centroids
This makes the implementation faster, as one would expect. I have an option to run K-Means more than once, but if you set --num-runs=1 you should see the benefit. Obviously, feel free to correct me, but this is the reason for my claim.
@mnslarcher
Where is your vectorized iou code? Are you referring to this?
nearest_centroids = np.argmax(iou(boxes, centroids), axis=1)
In that line I'm using the vectorized iou; this is my implementation:
def iou(boxes, anchors):
    """Calculates the Intersection over Union (IoU) between a numpy array of
    n boxes and an array of k anchors.

    Arguments:
        boxes (array_like): array of shape (n, 2) of boxes' widths and heights.
        anchors (array_like): array of shape (k, 2) of anchors' widths and heights.

    Returns:
        A numpy array of shape (n, k) containing the IoU values for each combination
        of boxes and anchors.
    """
    intersection_width = np.minimum(anchors[:, [0]], boxes[:, 0]).T
    intersection_height = np.minimum(anchors[:, [1]], boxes[:, 1]).T

    if np.any(intersection_width == 0) or np.any(intersection_height == 0):
        raise ValueError("Some boxes have zero size.")

    intersection_area = intersection_width * intersection_height
    boxes_area = np.prod(boxes, axis=1, keepdims=True)
    anchors_area = np.prod(anchors, axis=1, keepdims=True).T
    union_area = boxes_area + anchors_area - intersection_area
    return intersection_area / union_area
This is the original one:
def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i.e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k, 0) where k is the number of clusters
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_
As you can see, one returns an (n, k) array (all boxes against all anchors in a single call), while the other returns only a (k,) vector for a single box.
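To make the difference concrete, here is a quick equivalence check (a sketch; iou is the vectorized function above, and not_vectorized_iou stands for the original single-box version, renamed here to avoid the name clash, as it is also called later in this thread):

import numpy as np

boxes = np.array([[10.0, 20.0], [30.0, 15.0], [8.0, 8.0]])  # (n, 2) = (3, 2)
anchors = np.array([[12.0, 12.0], [25.0, 18.0]])            # (k, 2) = (2, 2)

# Vectorized: the whole (n, k) IoU matrix in a single call.
ious_vectorized = iou(boxes, anchors)                       # shape (3, 2)

# Original: one (k,) vector per box, so n calls inside a Python loop.
ious_looped = np.array([not_vectorized_iou(b, anchors) for b in boxes])

assert ious_vectorized.shape == (3, 2)
assert np.allclose(ious_vectorized, ious_looped)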
@mnslarcher I see. But that won't change much if you only vectorize the iou function. Normally we have ~50 bboxes. That's not a large number. Did you time your kmeans function? Any stats?
@Cli98 , I did it but honestly I don't remember the exact result; maybe I will re-do it this weekend and post it here :)
Why do you say that we have ~50 bboxes? In my dataset I have 100k images with more or less 10 boxes each, so about 1 million bboxes, far more than 50, if I get your point (probably I misunderstood you). Clearly, if your dataset is small, the benefit is basically zero.
In general the speed increase is not a crucial feature of my repo, it is only a plus. Other than that, I added a couple of functionalities that were useful for me and I will continue to add more. I have not done anything extraordinary, but if anyone wants to try it, I will be happy to receive any feedback.
Hi @Cli98 , I did the speed test and the result, unless I've made a mistake, is that my implementation is much faster (70x faster on COCO train 2017).
I saw in a comment (that you later edited) that you didn't find any evidence of this, so I'll post the results here together with how to replicate them. It was a very quick experiment, so feel free to tell me if you spot any mistake; I didn't use timeit only because I didn't want to wait, but feel free to replace %%time with %%timeit.
The results (mine vs. yours):
with 100 bboxes: 3.47 ms vs 10.7 ms
with 1000 bboxes: 4.16 ms vs 104 ms
with 10000 bboxes: 20.8 ms vs 3.65 s
with 100000 bboxes: 531 ms vs 42.5 s
with 859999 bboxes (COCO train 2017): 6.5 s vs 7min 24s
So, as you can see, for COCO mine takes just 6.5 s and yours 7 min 24 s; both return the same result (mine is 70x faster in this experiment).
Get the bboxes:
import json
import os

import numpy as np

INSTANCES_PATH = "./annotations/instances_train2017.json"

if not os.path.exists(INSTANCES_PATH):
    !wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    !unzip annotations_trainval2017.zip

with open(INSTANCES_PATH) as f:
    instances = json.load(f)

bboxes = np.array([i["bbox"][2:] for i in instances["annotations"] if np.prod(i["bbox"][2:]) > 0])
del instances
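As a quick sanity check after loading (the count matches the number quoted in the results above):

print(bboxes.shape)  # expected: (859999, 2) for COCO train 2017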
Define the functions (the not_vectorized_* ones are the original functions that you are also using in your notebook; the others are the ones I defined for my repo):
import logging
from datetime import datetime

logger = logging.getLogger(__name__)


def not_vectorized_iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i.e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k, 0) where k is the number of clusters
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_


def not_vectorized_kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: distance function
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))

    np.random.seed()

    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            distances[row] = 1 - not_vectorized_iou(boxes[row], clusters)

        nearest_clusters = np.argmin(distances, axis=1)

        if (last_clusters == nearest_clusters).all():
            break

        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        last_clusters = nearest_clusters

    return clusters


def iou(boxes, anchors):
    """Calculates the Intersection over Union (IoU) between a numpy array of
    n boxes and an array of k anchors.

    Arguments:
        boxes (array_like): array of shape (n, 2) of boxes' widths and heights.
        anchors (array_like): array of shape (k, 2) of anchors' widths and heights.

    Returns:
        A numpy array of shape (n, k) containing the IoU values for each combination
        of boxes and anchors.
    """
    intersection_width = np.minimum(anchors[:, [0]], boxes[:, 0]).T
    intersection_height = np.minimum(anchors[:, [1]], boxes[:, 1]).T

    if np.any(intersection_width == 0) or np.any(intersection_height == 0):
        raise ValueError("Some boxes have zero size.")

    intersection_area = intersection_width * intersection_height
    boxes_area = np.prod(boxes, axis=1, keepdims=True)
    anchors_area = np.prod(anchors, axis=1, keepdims=True).T
    union_area = boxes_area + anchors_area - intersection_area
    return intersection_area / union_area


def kmeans(boxes, num_clusters=3, max_iter=300, seed=None, centroid_calc_fn=np.median):
    """Calculates K-Means clustering using the Intersection over Union (IoU) metric.

    Arguments:
        boxes (array_like): array of the bounding boxes' heights and widths, with
            shape (n, 2).
        num_clusters (int): the number of clusters to form as well as the number of
            centroids to generate.
        max_iter (int, optional): maximum number of iterations of the K-Means
            algorithm for a single run. Default: 300.
        seed (int, optional): random seed. Default: None.
        centroid_calc_fn (function, optional): function used for calculating
            centroids. Default: numpy.median.

    Returns:
        A numpy array of shape (num_clusters, 2).
    """
    np.random.seed(seed)

    last_nearest_centroids = np.ones(boxes.shape[0]) * -1
    # the Forgy method will fail if the whole array contains the same rows
    centroids = boxes[np.random.choice(boxes.shape[0], num_clusters, replace=False)]

    i = 0
    while True:
        if i >= max_iter:
            logger.warning(
                f"[{datetime.now().strftime('%m/%d %H:%M:%S')}] "
                "Maximum number of iterations reached, increase max_iter to do more "
                f"iterations (max_iter = {max_iter})"
            )
            break

        nearest_centroids = np.argmax(iou(boxes, centroids), axis=1)

        for centroid in range(num_clusters):
            centroids[centroid] = centroid_calc_fn(
                boxes[nearest_centroids == centroid], axis=0
            )

        if (nearest_centroids == last_nearest_centroids).all():
            break

        last_nearest_centroids = nearest_centroids
        i += 1

    return centroids
Run whichever experiment you wish, e.g.:
%%time
kmeans(bboxes, 3)
Result:
CPU times: user 6.49 s, sys: 174 µs, total: 6.49 s
Wall time: 6.5 s
array([[201.49, 218.97],
[ 55.75, 65.25],
[ 15.97, 19.85]])
%%time
not_vectorized_kmeans(bboxes, 3)
Result:
CPU times: user 7min 28s, sys: 8.91 s, total: 7min 36s
Wall time: 7min 24s
array([[ 15.97, 19.85],
[ 55.75, 65.25],
[201.49, 218.97]])
Apologies for the late reply. I'm referring to a single image, not all of them. Normally we do not run K-Means experiments on all bounding boxes, given their large variation, not to mention the class imbalance issue.
Hi @Cli98 , I did the speed test, the result, if I haven't made any mistake, is that my implementation is much faster (70x faster on COCO train 2017).
I saw in a comment that you then edited that you didn't find any evidence of this so I'll post here the results and also how you can replicate them, I did a very fast experiment so feel free to tell me if you spot any mistake, I didn't use timeit just because I didn't want to wait but feel free to replace %%time with %%timeit.
The results (mine vs your): with 100 bboxes: 3.47 ms vs 10.7 ms with 1000 bboxes: 4.16 ms vs 104 ms with 10000 bboxes: 20.8 ms vs 3.65 s with 100000 bboxes: 531 ms vs 42.5 s with 859999 bboxes (coco train 2017): 6.5 s vs 7min 24s
So if you see, for COCO mine take just 6.5s, your 7min 24s, both return the same result (mine is 70x faster in this experiment).
Get the bboxes:
import json import numpy as np import os INSTANCES_PATH = "./annotations/instances_train2017.json" if not os.path.exists(INSTANCES_PATH): !wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip !unzip annotations_trainval2017.zip with open(INSTANCES_PATH) as f: instances = json.load(f) bboxes = np.array([i["bbox"][2:] for i in instances["annotations"] if np.prod(i["bbox"][2:]) > 0]) del instances
Define the functions (not_vectorized are the original fun that you are also using in your notebook, the others are the ones that I defined for my repo):
def not_vectorized_iou(box, clusters): """ Calculates the Intersection over Union (IoU) between a box and k clusters. :param box: tuple or array, shifted to the origin (i. e. width and height) :param clusters: numpy array of shape (k, 2) where k is the number of clusters :return: numpy array of shape (k, 0) where k is the number of clusters """ x = np.minimum(clusters[:, 0], box[0]) y = np.minimum(clusters[:, 1], box[1]) if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0: raise ValueError("Box has no area") intersection = x * y box_area = box[0] * box[1] cluster_area = clusters[:, 0] * clusters[:, 1] iou_ = intersection / (box_area + cluster_area - intersection) return iou_ def not_vectorized_kmeans(boxes, k, dist=np.median): """ Calculates k-means clustering with the Intersection over Union (IoU) metric. :param boxes: numpy array of shape (r, 2), where r is the number of rows :param k: number of clusters :param dist: distance function :return: numpy array of shape (k, 2) """ rows = boxes.shape[0] distances = np.empty((rows, k)) last_clusters = np.zeros((rows,)) np.random.seed() # the Forgy method will fail if the whole array contains the same rows clusters = boxes[np.random.choice(rows, k, replace=False)] while True: for row in range(rows): distances[row] = 1 - not_vectorized_iou(boxes[row], clusters) nearest_clusters = np.argmin(distances, axis=1) if (last_clusters == nearest_clusters).all(): break for cluster in range(k): clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0) last_clusters = nearest_clusters return clusters def iou(boxes, anchors): """Calculates the Intersection over Union (IoU) between a numpy array of n boxes and an array of k anchors. Arguments: boxes (array_like): array of shape (n, 2) of boxes' widths and heights. anchors (array_like): array of shape (k, 2) of anchors' widths and heights. Returns: A numpy array of shape (n, k) containing the IoU values for each combination of boxes and anchors. """ intersection_width = np.minimum(anchors[:, [0]], boxes[:, 0]).T intersection_height = np.minimum(anchors[:, [1]], boxes[:, 1]).T if np.any(intersection_width == 0) or np.any(intersection_height == 0): raise ValueError("Some boxes have zero size.") intersection_area = intersection_width * intersection_height boxes_area = np.prod(boxes, axis=1, keepdims=True) anchors_area = np.prod(anchors, axis=1, keepdims=True).T union_area = boxes_area + anchors_area - intersection_area return intersection_area / union_area def kmeans(boxes, num_clusters=3, max_iter=300, seed=None, centroid_calc_fn=np.median): """Calculates K-Means clustering using the Intersection over Union (IoU) metric. Arguments: boxes (array_like): array of the bounding boxes' heights and widths, with shape (n, 2). num_clusters (int): the number of clusters to form as well as the number of centroids to generate. max_iter (int, optional): maximum number of iterations of the K-Means algorithm for a single run. Default: 300. seed: (int, optional): random seed. Default: None. centroid_calc_fn (function, optional): function used for calculating centroids Default: numpy.median. Returns: A numpy array of shape (num_clusters, 2). 
""" np.random.seed(seed) last_nearest_centroids = np.ones(boxes.shape[0]) * -1 # the Forgy method will fail if the whole array contains the same rows centroids = boxes[np.random.choice(boxes.shape[0], num_clusters, replace=False)] i = 0 while True: if i >= max_iter: logger.warning( f"[{datetime.now().strftime('%m/%d %H:%M:%S')}] " "Maximum number of iterations reached, increase max_inter to do more " f"iterations (max_inter = {max_iter})" ) break nearest_centroids = np.argmax(iou(boxes, centroids), axis=1) for centroid in range(num_clusters): centroids[centroid] = centroid_calc_fn( boxes[nearest_centroids == centroid], axis=0 ) if (nearest_centroids == last_nearest_centroids).all(): break last_nearest_centroids = nearest_centroids i += 1 return centroids
Run the experiment that you wish, es:
%%time kmeans(bboxes, 3)
Result:
CPU times: user 6.49 s, sys: 174 µs, total: 6.49 s Wall time: 6.5 s array([[201.49, 218.97], [ 55.75, 65.25], [ 15.97, 19.85]])
%%time not_vectorized_kmeans(bboxes, 3)
Result:
CPU times: user 7min 28s, sys: 8.91 s, total: 7min 36s Wall time: 7min 24s array([[ 15.97, 19.85], [ 55.75, 65.25], [201.49, 218.97]])
@mnslarcher I actually modified my comment and deleted the second sentence, but I have no idea why you can still see my previous statement.
I actually ran my experiment on all the code snippets and found that, averaging 1000 trials with 100 bboxes, the two give similar performance (38 vs. 57 ms). But you timed it in a different way, which I think also makes sense. So I think that's the end of this discussion, and I will try to do some optimization on my code XD
Mmm, I'm not sure I get your point. I understand that maybe you don't want to run K-Means on all your bounding boxes if your dataset is not balanced (mine is balanced), but I don't understand the utility of running K-Means on just one image to decide the anchor ratios. It's late where I am and I'm tired, so I'll go to sleep now :)...
"running K-Means on just one image to..." Nope, that's not what I mean sir... But just let it pass.
Actually, I think sklearn.cluster.KMeans is much faster, considering it's a C implementation.
Probably, but what we call K-Means here is K-Means with IoU as the distance metric (that is the main contribution of this kind of repo). From my current understanding (at the beginning I checked whether it was possible to use sklearn), it is not straightforward to implement K-Means with an IoU distance metric using sklearn. If someone knows how to do it with that implementation, I would be curious to see it and possibly update my code.
The first problem is that sklearn's implementation doesn't let you choose a distance metric at all. Then there are other things you need to consider; if you look at the K-Means implementation in these repos you should be able to see them 😊.
@mnslarcher I did replace it with sklearn's kmeans.
from sklearn.cluster import KMeans


def kmeans(boxes, num_clusters=3, max_iter=300, seed=None, centroid_calc_fn=np.median):
    """Calculates K-Means clustering using sklearn (Euclidean distance, not IoU).

    Arguments:
        boxes (array_like): array of the bounding boxes' heights and widths, with
            shape (n, 2).
        num_clusters (int): the number of clusters to form as well as the number of
            centroids to generate.
        max_iter (int, optional): maximum number of iterations of the K-Means
            algorithm for a single run. Default: 300.
        seed (int, optional): random seed. Default: None.
        centroid_calc_fn (function, optional): kept only for signature
            compatibility; unused here, sklearn always uses the cluster mean.

    Returns:
        A numpy array of shape (num_clusters, 2).
    """
    kmeans = KMeans(
        n_clusters=num_clusters, max_iter=max_iter, tol=1e-3, random_state=seed
    ).fit(boxes)
    centroids = kmeans.cluster_centers_
    return centroids
But yours is obviously much faster and more efficient, because sklearn uses all available CPUs while yours only takes one.
And if the statistics are right, I think both of them give similarly correct results.
I think using IoU and using Euclidean distance give quite similar results in this kind of situation. But yes, IoU is more straightforward. I just realized this is why IoU loss beats L1 loss in training bbox regression models.
[06/11 15:18:07] Starting the calculation of the optimal anchors ratios
[06/11 15:18:07] Extracting and preprocessing bounding boxes
[06/11 15:18:07] Discarding 0 bounding boxes with size lower or equal to 0
[06/11 15:18:07] K-Means (10 runs): 100%|███████████████| 10/10 [00:04<00:00, 2.07it/s]
[06/11 15:18:12] Runs avg. IoU: 87.90% ± 0.00% (mean ± std. dev. of 10 runs, 0 skipped)
[06/11 15:18:12] Avg. IoU between bboxes and their most similar anchors after normalizing them so that they have the same area (only the anchor ratios matter): 87.90%
[06/11 15:18:12] Avg. IoU between bboxes and their most similar anchors (no normalization, both anchor ratios and sizes matter): 55.65%
[06/11 15:18:13] Num. bboxes with similar anchors (IoU >= 0.5): 79499/124806 (63.70%)
[06/11 15:18:13] Optimal anchors ratios: [(0.78, 1.29), (1.04, 0.96), (1.25, 0.8)]
[06/11 15:18:25] Starting the calculation of the optimal anchors ratios
[06/11 15:18:25] Extracting and preprocessing bounding boxes
[06/11 15:18:25] Discarding 0 bounding boxes with size lower or equal to 0
[06/11 15:18:25] K-Means (10 runs): 100%|███████████████| 10/10 [00:18<00:00, 1.89s/it]
[06/11 15:18:44] Runs avg. IoU: 87.37% ± 0.00% (mean ± std. dev. of 10 runs, 0 skipped)
[06/11 15:18:44] Avg. IoU between bboxes and their most similar anchors after normalizing them so that they have the same area (only the anchor ratios matter): 87.37%
[06/11 15:18:44] Avg. IoU between bboxes and their most similar anchors (no normalization, both anchor ratios and sizes matter): 55.73%
[06/11 15:18:44] Num. bboxes with similar anchors (IoU >= 0.5): 79599/124806 (63.78%)
[06/11 15:18:44] Optimal anchors ratios: [(0.74, 1.36), (1.02, 0.99), (1.28, 0.78)]
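Those logs mention normalizing the boxes so that only the ratios matter. If anyone wants to try the same idea with sklearn, a minimal sketch could look like this (a hypothetical helper, not code from either repo): scale each box to unit area first, so that Euclidean K-Means clusters aspect ratios rather than sizes.

import numpy as np
from sklearn.cluster import KMeans

def euclidean_kmeans_ratios(boxes, num_clusters=3, seed=None):
    # Scale each (w, h) box to unit area so that only its aspect ratio matters.
    boxes = boxes / np.sqrt(np.prod(boxes, axis=1, keepdims=True))
    km = KMeans(n_clusters=num_clusters, random_state=seed).fit(boxes)
    # Renormalize the centroids to unit area and report them as (w, h) ratios.
    centroids = km.cluster_centers_
    centroids = centroids / np.sqrt(np.prod(centroids, axis=1, keepdims=True))
    return sorted(tuple(np.round(c, 2)) for c in centroids)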
Nice! Yes, I think it's true that one can get a decent approximation of K-Means with IoU using regular K-Means with Euclidean distance; in both cases you end up with reasonable anchor ratios.
@mnslarcher Hi, I tried to compare the optimal anchors with the default anchors. Most of the time it helps, but on some datasets it doesn't. Like this:
[06/11 17:08:58] When you use default anchors...
[06/11 17:08:58] Avg. IoU between bboxes and their most similar anchors (no normalization, both anchor ratios and sizes matter): 69.61%
[06/11 17:08:59] Num. bboxes with similar anchors (IoU >= 0.5): 37659/38510 (97.79%)
[06/11 17:08:59] Default anchors ratios: [(0.7, 1.4), (1.0, 1.0), (1.4, 0.7)]
[06/11 17:08:59] Runs avg. IoU: 88.01% ± 0.00% (mean ± std. dev. of 10 runs, 0 skipped)
[06/11 17:08:59] Avg. IoU between bboxes and their most similar anchors after normalizing them so that they have the same area (only the anchor ratios matter): 88.01%
[06/11 17:08:59] Avg. IoU between bboxes and their most similar anchors (no normalization, both anchor ratios and sizes matter): 70.88%
[06/11 17:08:59] Num. bboxes with similar anchors (IoU >= 0.5): 37275/38510 (96.79%)
[06/11 17:08:59] Optimal anchors ratios: [(0.8, 1.3), (1.0, 1.0), (1.2, 0.9)]
[06/11 17:08:59] Improvement: -1.00%
The ratio of bboxes with similar anchors drops by 1% after using the optimal anchors. It'd be nice to have a comparison with the default anchors, so that we can know whether we should apply the new ones or not.
Also, the reason why it sometimes doesn't help after the fitting should be intriguing.
It makes sense: in the end, K-Means is not trying to maximize the coverage, it is optimizing something correlated but not exactly the same. In particular, it is possible that the default anchors have a lower avg. IoU with the bounding boxes but exceed an IoU of 0.5 more frequently, while the K-Means anchors probably have a higher avg. IoU but sacrifice some bounding boxes to get it. At least this is my current intuition.
I think default > K-Means happens almost only when both already have very good coverage after removing the effect of the anchor sizes (like in the case you reported).
To give you an example of the limitations of K-Means: I have a detector that needs to detect electric poles. In almost all the images, every pole's height >> width, but I also have some cases where the helicopter took the photo from a strange perspective and a pole's height << width. In my script you can increase num_anchors_ratios but, given that such cases are rare, this will just produce yet another anchor with height > width; it is a limitation of K-Means.
My solution for these cases is to run the algorithm once with num_anchors_ratios=2 or 3 and then re-run it with num_anchors_ratios=1 or 2 (depending on whether I want to end up with 3, 4 or 5 anchor ratios) on what the tutorial calls instances_without_similar_anchors; see the sketch below. This produces good results in my case and I think it is a reasonable way to increase coverage.
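For concreteness, here is a rough sketch of that two-stage procedure, assuming the vectorized iou() and kmeans() quoted earlier in this thread (the helper name, the unit-area normalization and the threshold are mine, for illustration only):

import numpy as np

def two_stage_anchor_ratios(boxes, first_k=2, second_k=1, iou_threshold=0.5):
    # Hypothetical sketch of the residual heuristic described above.
    # Normalize boxes to unit area so that only the aspect ratios matter.
    norm = boxes / np.sqrt(np.prod(boxes, axis=1, keepdims=True))

    # Stage 1: cluster all boxes.
    anchors = kmeans(norm, num_clusters=first_k)

    # Keep only the boxes whose best anchor has IoU < threshold ("residuals",
    # i.e. instances_without_similar_anchors in the tutorial's terminology).
    best_iou = iou(norm, anchors).max(axis=1)
    residuals = norm[best_iou < iou_threshold]

    # Stage 2: cluster the residuals and append the extra ratios.
    if len(residuals) >= second_k:
        anchors = np.vstack([anchors, kmeans(residuals, num_clusters=second_k)])
    return anchors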
I'll add the comparison with the default anchors to the output of the script, thanks! :)
Quick update:
I did what @zylo117 suggested; now the output looks like this:
[06/13 12:57:38] Reading ./annotations/instances_train2017.json
[06/13 12:57:54] Starting the calculation of the optimal anchors ratios
[06/13 12:57:54] Extracting and preprocessing bounding boxes
[06/13 12:57:57] Discarding 2 bounding boxes with size lower or equal to 0
[06/13 12:57:57] K-Means (3 runs): 100%|██████████████████| 3/3 [00:33<00:00, 11.06s/it]
Runs avg. IoU: 80.48% ± 0.00% (mean ± std. dev. of 3 runs, 0 skipped)
Avg. IoU between bboxes and their most similar anchors after norm. them to make their area equal (only ratios matter):
80.48%
[06/13 12:58:33] Default anchors ratios: [(0.7, 1.4), (1.0, 1.0), (1.4, 0.7)]
Avg. IoU between bboxes and their most similar default anchors, no norm. (both ratios and sizes matter): 55.16%
Num. bboxes without similar default anchors (IoU < 0.5): 253049/860001 (29.42%)
[06/13 12:58:37] K-Means anchors ratios: [(0.6, 1.5), (1.0, 1.0), (1.4, 0.7)]
Avg. IoU between bboxes and their most similar K-Means anchors, no norm. (both ratios and sizes matter): 55.72%
Num. bboxes without similar K-Means anchors (IoU < 0.5): 240788/860001 (28.00%)
[06/13 12:58:37] K-Means anchors have an IoU < 50% with bboxes in 1.43% less cases than the default anchors, you should consider to use them
As you can see, I decided to report the number of bboxes without similar anchors instead of the number of bboxes with similar anchors, because it is easier to interpret when the anchors have high coverage.
@mnslarcher and @zylo117
I don't think this comparison should be suggested, as the result of K-Means is not stable and highly depends on the distribution of the input data.
To prove this, run the experiment multiple times: the avg. IoU may not be the same, and thus it is quite possible that the AP difference flips from -1 to 1. A totally invalid conclusion.
So I do not think there is any way to solve this, even with the solution discussed above. Let me know your thoughts. If you ask me, I would take the one that differs from the default, even if the performance drops, given that the difference is not too large, because the bbox distribution in a customized dataset must differ somewhat from the default.
It's still a trade-off here (number of anchors vs. AP) and thus not optimal. The better solution is to jump out of the box.
To prove this, run the experiment multiple times: the avg. IoU may not be the same, and thus it is quite possible that the AP difference flips from -1 to 1. A totally invalid conclusion.
I don't agree. In my implementation I have a num_runs parameter for this, and in most cases running K-Means many times produces exactly the same result (at least on fairly large datasets). This is because we have just two dimensions and, for reasonable datasets, a lot of samples. In any case, it is not something one has to theorize about: it is enough to use a high value for num_runs and look at the avg. IoU std. dev.
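For example, something along these lines (a sketch reusing the kmeans() and iou() functions from earlier in this thread, with bboxes loaded as in the benchmark above):

import numpy as np

# Run K-Means with several different initializations and compare the avg. IoU;
# a (near-)zero std. dev. means the solution is stable across runs.
avg_ious = []
for seed in range(10):
    centroids = kmeans(bboxes, num_clusters=3, seed=seed)
    avg_ious.append(iou(bboxes, centroids).max(axis=1).mean())

print(f"avg. IoU: {np.mean(avg_ious):.4f} +/- {np.std(avg_ious):.4f}")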
If you ask me, I would take the one that differs from the default, even if the performance drops, given that the difference is not too large, because the bbox distribution in a customized dataset must differ somewhat from the default.
This doesn't make sense to me. It is true that in most cases the default anchors are not the best ones, but there is no logical reason to think some other ratios are better just because they are different (especially if you have a metric telling you the opposite). Following this logic, default anchors + random noise would be better than the default anchors.
There is a proper way to measure how good the anchors are: use a validation set (and in this case I don't think we would ever overfit the training set, so an evaluation on the training set is also OK).
It's still a trade-off here (number of anchors vs. AP) and thus not optimal. The better solution is to jump out of the box.
I don't understand what you mean by "jump out of the box". In any case I agree, it is not optimal; it is just a heuristic I use to work around the limitations of K-Means. This is an example where the heuristic works:
[3, 10] * 1000
[4, 10] * 1000
[5, 10] * 1000
[10, 3] * 100
In this case, with num_anchors_ratios=3, K-Means would find 3 anchor ratios with height > width, because the centroids are driven by the shapes with height > width. This is not optimal if the objective is to maximize coverage. To avoid this, one can run K-Means with num_anchors_ratios=1 or 2, and the result will be 1 or 2 anchors that alone cover all the boxes except the 100 [10, 3] ones. At this point one can run K-Means again on the residuals (which are only the [10, 3] boxes) and the result will be [10, 3]; the set of anchors now has an IoU > 50% with all the boxes. This is obviously a toy example, just to clarify the idea; a sketch follows below.
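To make the toy example concrete, here is a sketch reusing the vectorized iou() from earlier in the thread (the stage-1 anchor values are illustrative, not an actual K-Means output):

import numpy as np

# Toy dataset from the example above: three tall shapes plus a rare wide one.
boxes = np.array(
    [[3, 10]] * 1000 + [[4, 10]] * 1000 + [[5, 10]] * 1000 + [[10, 3]] * 100,
    dtype=float,
)

# Stage 1 (illustrative): anchors shaped like the dominant tall boxes.
tall_anchors = np.array([[3.0, 10.0], [5.0, 10.0]])
coverage = (iou(boxes, tall_anchors).max(axis=1) >= 0.5).mean()
print(coverage)  # ~0.97: the 100 wide [10, 3] boxes have no similar anchor

# Stage 2: add the residual cluster [10, 3]; coverage reaches 100%.
all_anchors = np.vstack([tall_anchors, [[10.0, 3.0]]])
print((iou(boxes, all_anchors).max(axis=1) >= 0.5).mean())  # 1.0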
@mnslarcher Are we on the same page? I am just pointing out my thought that using the difference to decide which is more useful may not lead to a correct conclusion.
On num_runs: interesting to see that you get exactly the same result. If that holds, I admit the decision stands. But K-Means is not a stable algorithm, is it? Is the initialization stable? How can you produce the same result, and by doing what? I did not see you actually use a seed anywhere; have you done it some other way?
Let me ask you a question: for a given customized dataset, suppose I run your algorithm and get the same result as the COCO defaults [(1,1), (1.4,0.7), (0.7,1.4)], while my EDA finds that the average ratio is < 1. If I then get the K-Means result [(1,1), (0.7,1.4)] and find it scores lower than the default, which should I choose?
Noise can be present anywhere in this world. Running K-Means cannot kick all of it out.
Validation is a way, though not effective to me. Your residual-run heuristic is a solution; however, the memory may not be able to hold so many anchors.
Sorry @Cli98 , I'm not sure I'm following you 100%; in any case, just to clarify:
- There is the possibility to set a random seed in my K-Means implementation (you can see it in the code a few messages above), but for obvious reasons I don't use it when I run K-Means multiple times (I run it multiple times exactly to try different initializations and keep the best result).
- What I'm saying is that, even with different initializations, in the special case of using K-Means for this problem, where we have just 2 dimensions, usually many data points and well-defined clusters, K-Means tends to always find the same solution. I'm not saying this is always true for K-Means; it depends on the type of problem (dataset) you apply it to. This is obviously not a rigorous demonstration; I'm reporting the empirical evidence I got from running K-Means many times with different initializations on several datasets.
- I don't understand 100% the rest of your message, but strictly speaking I'm just saying that once you choose a metric you want to maximize (like the number of boxes with IoU > 50%) and find a solution that maximizes it, you should stick with that solution, independently of how you found it. So, if the default values have a higher score on your metric, and you chose your metric well, you should stick with the default values.
I'm not sure whether this answers you, given that I'm not sure I understood your point.
@mnslarcher So here is my point. I am not saying your K-Means algorithm is incorrect. I am just wondering whether using the difference to evaluate the result of K-Means really works. Since the result is not stable, such a comparison may not really make sense. Just my thought.
So I guess your point is that such randomness may not change the conclusion, as the result is to some degree stable. From my perspective this may not always hold; it actually depends on the difficulty of the dataset. And we agree with each other on this. That's enough, I guess.
"I'm reporting the empirical evidence I got from running K-Means many times with different initializations on several datasets."
I'm glad to see any reports in the near future. And this follows your concern #3. My point is: since you judge the result of K-Means by AP, you may see the difference work. But how effective it is to judge by AP remains controversial to me. Let's see if your experiments can justify this.