kaiyuyue / nxtp

Object Recognition as Next Token Prediction (CVPR 2024 Highlight)
https://arxiv.org/abs/2312.02142

why filter out samples that have less than 3 objects? #5

Open zimenglan-sysu-512 opened 3 months ago

zimenglan-sysu-512 commented 3 months ago

Hi @kaiyuyue, I see the lines in encoding.py that filter out samples with fewer than 3 objects. Why are these samples dropped? For the ImageNet-1K dataset, many categories have fewer than 3 class names.

kaiyuyue commented 3 months ago

The code is there, but it is not active (see Line 118; the threshold is 0). This is a legacy part from training on LAION, aimed at filtering out noisy labels and making sure each image is trained with enough diverse labels.
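
To illustrate why a threshold of 0 makes such a filter a no-op, here is a minimal sketch. It is not the code from encoding.py: the names `keep_sample`, `filter_samples`, and `min_labels` are hypothetical, and it assumes the filter compares the number of distinct labels against a minimum count.

```python
def keep_sample(labels, min_labels=0):
    """Return True if the sample has at least `min_labels` distinct labels.

    Hypothetical helper; assumes the filter is a simple minimum-count check.
    """
    return len(set(labels)) >= min_labels


def filter_samples(samples, min_labels):
    """Keep only the samples that pass the minimum-label check."""
    return [s for s in samples if keep_sample(s["labels"], min_labels)]


# Toy data: one single-label sample (ImageNet-like), one multi-label sample.
samples = [
    {"image": "a.jpg", "labels": ["dog"]},
    {"image": "b.jpg", "labels": ["cat", "sofa", "lamp"]},
]

# Legacy-style setting: require at least 3 distinct labels per image.
kept_legacy = filter_samples(samples, min_labels=3)

# Threshold of 0: every sample passes, so nothing is filtered out.
kept_current = filter_samples(samples, min_labels=0)
```

Under this assumption, single-label samples are only dropped when the threshold is raised above 1.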

zimenglan-sysu-512 commented 3 months ago

Oh, you're right. By the way, why is weight_loss_cap set to zero?

kaiyuyue commented 3 months ago

The quality of the original captions (alt-text crawled from the internet) is not good, and using them in training hurts performance. If you want to try something new, we have a newly released dataset for this problem, PixelProse, recaptioned by Gemini.
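
As a rough sketch of what a zero caption weight implies, assume the training objective is a weighted sum of an object-label loss and a caption loss. Only `weight_loss_cap` comes from this thread; `total_loss`, `weight_loss_obj`, and the loss values are hypothetical, and the repository's actual loss code may differ.

```python
def total_loss(loss_obj, loss_cap, weight_loss_obj=1.0, weight_loss_cap=0.0):
    """Combine two loss terms as a weighted sum (assumed form, not the repo's code)."""
    return weight_loss_obj * loss_obj + weight_loss_cap * loss_cap


# With weight_loss_cap = 0.0 the caption term contributes nothing,
# so noisy alt-text captions cannot affect the combined loss.
combined = total_loss(loss_obj=0.5, loss_cap=2.0)
```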