remydubois / lsnms

Large Scale Non-Maximum Suppression. This project implements an O(n log n) approach to non-max suppression, useful for object detection run on very large images (such as satellite or histology images), where the number of instances to prune becomes very large.

Random Failures #12

Closed by tchott 1 year ago

tchott commented 1 year ago

Hello!

I have been experiencing a random error with your code. If I run the code below multiple times (it was cut from a pytest), it will randomly fail because the len of keep is 0. Here are my fixed values:

import numpy as np
from lsnms import nms  # v0.3.1

nms_test_dict_failure = {
    'scores': np.array([0.03128776, 0.15489164, 0.05489164]),
    'boxes': np.array([[ 623.47991943,  391.94015503,  675.83850098,  445.0836792 ],
                       [  11.48574257,   15.99506855, 1053.84313965, 1074.78381348],
                       [  11.48574257,   15.99506855, 1053.84313965, 1074.78381348]]),
    'class_labels': np.array([ 1, 27, 23]),
    'score_thresh': 0.1,
}

def test_lsnms_failure_scenario():
    keep = nms(
        boxes=nms_test_dict_failure["boxes"],
        scores=nms_test_dict_failure["scores"],
        score_threshold=nms_test_dict_failure["score_thresh"],
        class_ids=nms_test_dict_failure["class_labels"],
    )
    assert len(keep) == 1

Is there some randomness in how the tree is generated? Here is why I am confused. For this specific example:

  1. Two of the bboxes are exactly the same but have distinct class labels.
  2. Each of the three class ids is distinct.

I am wondering if something is off in your offset_bboxes code that would create offset boxes which overlap in these examples. Regardless, based on the chosen score threshold alone, it seems I should always have exactly one detection kept, regardless of the associated bboxes and their categories. It would be weird for that singular detection to make it through the scores check at https://github.com/remydubois/lsnms/blob/62139e68fb94af3ef6c743b750ed3086e8472f53/lsnms/nms.py#L23 and then be suppressed down to 0 detections by NMS.
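As a sanity check of that expectation (assuming the threshold is applied as a strict comparison, as on the linked line), the score filter alone should leave exactly one detection before any box geometry is even considered:

import numpy as np

scores = np.array([0.03128776, 0.15489164, 0.05489164])
score_thresh = 0.1

# Only the second score (0.15489164) exceeds the threshold, so exactly
# one detection should survive the score filter before NMS runs.
assert (scores > score_thresh).sum() == 1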

Do you have any insights into what I might be failing to consider in this scenario?

Thanks!

Taylor Chott.

tchott commented 1 year ago

I have further isolated this: it is purely dependent on the fact that three distinct class ids are used. When all detections share the same class id, the issue does NOT occur and nms returns the correct number of detections.

remydubois commented 1 year ago

Hello,

Thank you for your issue. The code used to offset the bboxes class-wise is quite arbitrary, and some parameters were chosen ad hoc (such as https://github.com/remydubois/lsnms/blob/62139e68fb94af3ef6c743b750ed3086e8472f53/lsnms/util.py#L387). Maybe you can try tweaking that parameter (I have no access to my laptop until Sunday). Let me know.
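For context, the usual class-wise offset trick shifts each class's boxes by a distinct amount large enough that boxes of different classes can never overlap, so a single NMS pass behaves like per-class NMS. A minimal sketch of the idea (not the actual lsnms offset_bboxes implementation, whose offset parameter is the ad-hoc value linked above):

import numpy as np

def offset_boxes_per_class(boxes, class_ids):
    # Shift each box along both axes by class_id * (max coordinate + 1),
    # so boxes with different class ids land in disjoint regions and can
    # never suppress each other.
    offsets = class_ids.astype(boxes.dtype) * (boxes.max() + 1.0)
    return boxes + offsets[:, None]

If the chosen offset is too small, boxes from different classes can still overlap after shifting, which is the kind of failure mode worth checking here.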

remydubois commented 1 year ago

Hello,

Sorry for my previous answer, which was not so on point. I managed to find the bug, even if I fail to completely understand where the randomness comes from (I witnessed it regardless of the class labels). Basically: when masking the boxes based on their score, I forgot to mask the scores as well. This is messy because right after, I order the boxes by their scores, so there was a mismatch between the number of boxes and the number of scores to filter out; see the associated PR.
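The bug pattern, in a simplified sketch (not the exact lsnms source):

import numpy as np

def buggy_prefilter(boxes, scores, score_threshold):
    keep = scores > score_threshold
    boxes = boxes[keep]
    # Bug: scores is NOT masked here, so len(scores) > len(boxes).
    order = np.argsort(-scores)
    return boxes[order]  # indexes boxes out of bounds

def fixed_prefilter(boxes, scores, score_threshold):
    keep = scores > score_threshold
    boxes, scores = boxes[keep], scores[keep]  # mask scores too
    order = np.argsort(-scores)
    return boxes[order]

In plain NumPy the buggy version raises an IndexError; under njit, with bounds checking disabled by default, it silently reads garbage instead, which explains the random results.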

Thank you very much for reporting this issue, which unfortunately flew under my radar!

remydubois commented 1 year ago

I took some time to dig into the randomness, and I have no full explanation of the phenomenon. Basically, Numba seems to return random bits when indexing arrays out of bounds, which is what happened before the fix, since there were more scores than boxes after the score thresholding.

Something as simple as this shows the faulty behaviour:

import numpy as np
from numba import njit

@njit
def f(x, i):
    return x[i]

x = np.arange(10)

>>> f(x, 9)   # Works fine: 9 is the last valid index
>>> f(x, 10)  # Out of bounds: returns random garbage instead of raising IndexError

Without this weird behaviour, lsnms.nms would have raised an IndexError instead of returning random results. The issue is now fixed in lsnms, but it is probably something I should raise with the Numba maintainers.

Edit: this is actually expected behaviour of Numba; see bounds checking.
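For reference, Numba disables bounds checking by default for speed, but it can be enabled per function (or globally via the NUMBA_BOUNDSCHECK=1 environment variable), which turns the garbage reads above into a proper IndexError at some runtime cost:

import numpy as np
from numba import njit

@njit(boundscheck=True)  # opt in to bounds checking
def f(x, i):
    return x[i]

x = np.arange(10)
f(x, 10)  # now raises IndexError instead of returning garbage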

tchott commented 1 year ago

Awesome, thanks for digging in, Remy! Any idea when your next release will be?

remydubois commented 1 year ago

I will try to merge and release 0.3.2 in the coming days. I will also try to do the same for 0.4.0, which should implement compilation caching and bring a significant speedup.

remydubois commented 1 year ago

0.3.2 should fix the issue and is released on PyPI.
Closing the issue, but feel free to re-open if the problem arises again.