orrzohar / PROB

[CVPR 2023] Official Pytorch code for PROB: Probabilistic Objectness for Open World Object Detection
Apache License 2.0

Questions about the paper #28

Closed Rzx520 closed 1 year ago

Rzx520 commented 1 year ago

The paper says, "For class prediction, the learned objectness probability multiplies the classification probabilities to produce the final class predictions." Where is this multiplication reflected in the code? I'm sorry, but I couldn't find it.

Rzx520 commented 1 year ago

I have another question: what basis do you use to determine unknown targets? Is it determined by a threshold? @orrzohar

orrzohar commented 1 year ago

Hi @Rzx520,

  1. The multiplication of the objectness with the object class probabilities is right here: https://github.com/orrzohar/PROB/blob/10b6518f90495e07b7baf0d1bfa353f0e583eb8e/models/prob_deformable_detr.py#L532

  2. If you mean unknown object predictions - no threshold is applied; I pick the top-k (k=100) most confident predictions (known+unknown) per image: https://github.com/orrzohar/PROB/blob/10b6518f90495e07b7baf0d1bfa353f0e583eb8e/models/prob_deformable_detr.py#L534

This is in line with previous/current works and actually originates from D-DETR/DETR; it is common in cases where one wants to evaluate recall (e.g., Recall@100/10 predictions per image; see class-agnostic OD papers like LDET).
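For reference, here is a minimal sketch of what that multiplication does conceptually. The tensor names and shapes below are illustrative assumptions, not the repo's exact code:

```python
import torch

# Illustrative sketch: the learned objectness probability scales the per-class
# probabilities to produce the final class scores used for top-k selection.
cls_logits = torch.randn(2, 100, 81)   # [batch, queries, classes] classification logits (assumed shape)
obj_logits = torch.randn(2, 100)       # [batch, queries] objectness logits (assumed shape)

cls_prob = cls_logits.sigmoid()                 # per-class probabilities
obj_prob = obj_logits.sigmoid().unsqueeze(-1)   # objectness, broadcast over classes
final_scores = obj_prob * cls_prob              # objectness-weighted class predictions
```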

Hope this helps! Orr

Rzx520 commented 1 year ago

Won't the 100 most confident predictions contain a lot of background? @orrzohar

Rzx520 commented 1 year ago

I may have misunderstood you. When you choose the top k (k=100), are there 100 predicted results because the set contains 100 known+unknown predictions? I don't see 100 predictions in your visualization results. How do you determine which is background, which is unknown, and which is known? @orrzohar Thanks

orrzohar commented 1 year ago

Hi @Rzx520,

The background should be suppressed, since obj_prob for background should be quite low; therefore, when selecting the top-k predictions, the selection favors known/unknown objects.

Indeed the model makes 100 predictions per image, but I did not use all 100 for the figures. For a description of how I created Figure 3, please look at issue https://github.com/orrzohar/PROB/issues/11.

If you want to use PROB in a more realistic inference scenario, I would threshold the known and unknown objects separately to get more reasonable results, perhaps with NMS.
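As an example, a rough post-processing sketch along those lines. The threshold values, the unknown-label convention, and the function name here are assumptions for illustration, not code from the repo:

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, labels, unknown_label,
                known_thresh=0.5, unknown_thresh=0.3, iou_thresh=0.5):
    """Illustrative post-processing: separate score thresholds for known and
    unknown predictions, followed by class-agnostic NMS.
    boxes: [N, 4] in (x1, y1, x2, y2); scores: [N]; labels: [N]."""
    is_unknown = labels == unknown_label
    # Keep unknowns above their own (typically lower) threshold, knowns above theirs.
    keep = torch.where(is_unknown, scores > unknown_thresh, scores > known_thresh)
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    kept = nms(boxes, scores, iou_thresh)   # class-agnostic NMS over survivors
    return boxes[kept], scores[kept], labels[kept]
```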

Let me know if you have any additional questions, Orr

Rzx520 commented 1 year ago

I would also like to know how you determined that something is background. Is there a specific obj_prob value below which a prediction is considered background? @orrzohar

Rzx520 commented 1 year ago

I don't quite understand the sentence "cycled through the GT (both known and unknown) objects, and if a model had a prediction of the same class/IoU>0.5, I added that bbox on the image". Can you explain it in detail? Do unknown objects also have GT? How was the GT of an unknown object obtained? @orrzohar

orrzohar commented 1 year ago

hi @Rzx520,

obj_prob is the objectness: it is low for background and high for objects (see the figure attached to this comment).

There is no hard threshold on the obj values, only an implicit one. PROB can make 100*(num_classes+1) predictions per image; when selecting the top 100, the confidence of the 101st prediction can be thought of as a threshold. However, I prefer to think of it as down-weighting predictions that are likely background, so that background is not predicted as a known/unknown object.
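A rough sketch of that implicit threshold (the shapes, names, and flattened top-k style below are assumptions modeled on Deformable-DETR-style post-processing, not the repo's exact code):

```python
import torch

num_queries, num_classes, k = 100, 81, 100       # 80 known + 1 unknown (assumed)
scores = torch.rand(num_queries, num_classes)    # objectness-weighted class scores

flat = scores.flatten()                          # 100 * 81 candidate (query, class) predictions
topk_scores, topk_idx = flat.topk(k)             # keep the 100 most confident candidates
query_idx = topk_idx // num_classes              # which query each pick came from
class_idx = topk_idx % num_classes               # which class each pick is

# The "implicit threshold": anything scoring below the k-th kept score is dropped.
implicit_threshold = topk_scores.min()
```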

Re GT unknown annotations - yes, there are! That is how U-Recall is calculated. I visualized it this way because of the way mAP is calculated: you first sort the predictions by confidence and then check whether they have a high enough IoU with a GT object. For more on how mAP is calculated, please see this. So, for the qualitative and quantitative results to match up well, they should be created in a similar fashion.
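A rough sketch of the matching idea behind both the mAP computation and the Figure 3 procedure quoted above (the function and variable names are illustrative assumptions): for each GT box, known or unknown, keep a sufficiently overlapping prediction of the same class.

```python
import torch
from torchvision.ops import box_iou

def boxes_to_draw(pred_boxes, pred_labels, pred_scores,
                  gt_boxes, gt_labels, iou_thr=0.5):
    """For every GT box, keep the most confident prediction of the same class
    that overlaps it with IoU > iou_thr. Returns indices of kept predictions."""
    drawn = []
    ious = box_iou(pred_boxes, gt_boxes)            # [num_pred, num_gt] pairwise IoU
    for g in range(gt_boxes.shape[0]):
        same_cls = pred_labels == gt_labels[g]
        overlap = ious[:, g] > iou_thr
        cand = same_cls & overlap
        if cand.any():
            best = (pred_scores * cand).argmax()    # most confident matching prediction
            drawn.append(best.item())
    return drawn
```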

The 'unknown' object annotations are those same objects that are hidden from the model at task t. For example, in M-OWODB, COCO is separated into 4 subsets, where each task introduces an additional 20 classes. So in T1, you have 20 known classes in training and 60 unknown classes during evaluation. In T2, you have 40 classes in the fine-tuning dataset, 40 unknown classes in evaluation, and so forth.
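Schematically, the split can be thought of like this (class counts only; the actual class lists live in the repo's dataset configuration, and this layout is an assumption for illustration):

```python
# Cumulative number of known classes per M-OWODB task (illustrative layout).
TASK_SPLITS = {"t1": 20, "t2": 40, "t3": 60, "t4": 80}

def known_unknown(task, total_classes=80):
    """Return (num_known, num_unknown) for a given task."""
    known = TASK_SPLITS[task]
    return known, total_classes - known   # e.g. t1 -> (20, 60), t2 -> (40, 40)
```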

Does this make more sense now? Orr

Rzx520 commented 1 year ago

Thank you very much for your reply. If I understand correctly, the unknown GT is in the COCO dataset: apart from the n trained classes, the other (80-n) classes form the unknown GT, right? @orrzohar

Rzx520 commented 1 year ago

Can I understand it this way? It is equivalent to a generalization task: by learning the objectness of known-class objects, the model gains the ability to detect objects in general (known+unknown), and can then detect both unknown and known objects. I have another question: why did you choose 100 predictions as the output (known+unknown)? Won't that lead to a lot of wasted detections? After all, it is rare for an image to contain 100 objects (known+unknown). Is it just because DETR gives 100 predictions? Thanks @orrzohar

Rzx520 commented 1 year ago

'RuntimeError: Timed out initializing process group in store based barrier on rank: 2, for key: store_based_barrier_key:1 (world_size=3, worker_count=..., timeout=0:30:00)' After training one epoch, training the second epoch reported this error. Do you know how to solve it? @orrzohar

orrzohar commented 1 year ago

Hi @Rzx520,

  1. Yes, you are exactly right! The remaining classes constitute the ‘unknown’ objects (this is the case for ALL OWOD works, as it is the only way to determine which predictions even contain objects).
  2. A. That is definitely a way to think about it — we want to learn from the base classes a general notion of ‘objectness’ in order to detect novel objects.
     B. I chose 100 predictions to remain consistent with prior OWOD work. If I were to increase the number of predictions per image, I could artificially increase U-Recall without actually improving the model's predictions, simply by making more of them. As OW-DETR and prior works all used 100 predictions/image, selecting 100 predictions per image mitigates this.
     C. Actually, D-DETR can make more than 100 predictions per image — it can make 100*number_of_classes predictions per image (because it can predict the same bbox as having different classes).
  3. Re the runtime error - the message suggests a timeout while initializing the process group in a distributed training setup; it occurred on rank 2 at a store-based barrier. PROB does not train in 30 minutes, so I would first increase the timeout threshold (a minimal sketch follows this list).
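A minimal sketch of raising that timeout (the backend, init method, and timeout value below are assumptions; adapt this to wherever PROB's launch code calls init_process_group):

```python
from datetime import timedelta
import torch.distributed as dist

# Raise the default 30-minute store-based-barrier timeout when initializing
# the process group; rank/world_size are read from environment variables here.
dist.init_process_group(
    backend="nccl",
    init_method="env://",
    timeout=timedelta(hours=2),   # default is 30 minutes
)
```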

Also, would you mind opening a separate issue for the runtime? That way it would be easier to find for future users.

Best, Orr