I modified your code to use a 2-channel depth-thermal image. I have 800 training images of different body positions. But after a decent amount of training, the predicted bboxes are always the same, no matter the body position in the picture: exact same coordinates, exact same scores. (I use top_k=1 since I have only one object per image.)
Same with top_k = 200:
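To confirm the symptom is real collapse (the network ignoring its input) rather than a decoding bug, I compare the raw predictions for two different images. This is just a minimal sketch, not from the original code; `is_collapsed` and the sample `(score, x1, y1, x2, y2)` predictions are hypothetical:

```python
import numpy as np

def is_collapsed(preds_a, preds_b, tol=1e-6):
    """True if the detector returns numerically identical boxes/scores
    for two *different* input images — a sign it ignores its input."""
    a, b = np.asarray(preds_a, dtype=float), np.asarray(preds_b, dtype=float)
    return a.shape == b.shape and bool(np.allclose(a, b, atol=tol))

# Hypothetical top-1 predictions (score, x1, y1, x2, y2) for two inputs
p1 = [0.91, 0.12, 0.30, 0.55, 0.95]
p2 = [0.91, 0.12, 0.30, 0.55, 0.95]
print(is_collapsed(p1, p2))  # identical output for distinct inputs → collapse
```

If this stays True across many image pairs, the problem is in training (loss/labels), not in the NMS or top_k decoding step.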
I do not use data augmentation, but I think that is a separate problem. It's as if the loss were trapped in a local minimum, but I don't know. Did you have the same problem?
EDIT: I figured out that my predictions follow the class distribution of my dataset:
left dataset: 56 standing, 53 sitting, 73 lying down
right dataset: 56 standing, 108 sitting, 73 lying down
This is just a small dataset for running tests... but I don't understand why the loss is stuck here.
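Since the predictions track the per-class image counts, one thing I'd try is reweighting the classification loss by inverse class frequency. A minimal sketch, assuming the counts above; `inverse_freq_weights` is a hypothetical helper, not part of the repo:

```python
from collections import Counter

def inverse_freq_weights(counts):
    """Inverse-frequency class weights, normalized so the mean weight is 1.0."""
    total = sum(counts.values())
    raw = {cls: total / n for cls, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {cls: w / mean for cls, w in raw.items()}

# Counts from the left split of the dataset above
left = Counter({"standing": 56, "sitting": 53, "lying": 73})
weights = inverse_freq_weights(left)
for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

The resulting weights would then be passed to the classification loss (e.g. the `weight` argument of a cross-entropy loss), so the over-represented "lying down" class contributes less per sample.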