qqwweee / keras-yolo3

A Keras implementation of YOLOv3 (TensorFlow backend)
MIT License

Bug in video prediction? #399

Open AlbertoSabater opened 5 years ago

AlbertoSabater commented 5 years ago

Currently, the model is trained on images loaded with PIL, but in video prediction the frames are read with cv2, wrapped into a PIL image, the prediction is made from that image, and the result is converted back to cv2 to be shown on screen. However, PIL and cv2 use different channel orders: RGB and BGR respectively. So the model is predicting on a color space it hasn't been trained with.
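A minimal sketch of the mismatch (the video path and the `yolo` object are placeholders standing in for the repo's yolo_video.py flow):

    import cv2
    from PIL import Image

    vid = cv2.VideoCapture("video.mp4")  # hypothetical input
    _, frame = vid.read()                # OpenCV returns pixels in BGR order
    image = Image.fromarray(frame)       # PIL wraps the array as-is, so channels stay BGR
    # yolo.detect_image(image) now sees red and blue swapped
    # relative to the RGB images the model was trained on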

gouthamvgk commented 5 years ago

No, the reason for using cv2 during video prediction is to create the video playback file. All the necessary conversions from cv2 to PIL take place (i.e. the height and width dimensions are rearranged).

AlbertoSabater commented 5 years ago

Is there a color space conversion from BGR (cv2) to RGB (PIL)? I see that the frame is loaded with cv2 and then converted to an array to be wrapped by PIL, but it's still in BGR when the prediction takes place.

gouthamvgk commented 5 years ago

You're right, the color space conversion doesn't take place, but I don't know whether it affects prediction. It might. I changed all the PIL code to cv2 and the result I get is more or less similar to the result with this color space error.

AlbertoSabater commented 5 years ago

In my experience, the learned model is robust enough to generalize to the BGR color space (thanks to the data augmentation), but the final performance still improves when this bug is fixed.

fourth-archive commented 5 years ago

@AlbertoSabater @gouthamvgk this YOLOv3 tutorial may help you: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data

The accompanying repository works on macOS, Windows and Linux, includes multi-GPU and multithreading support, and performs inference on images, videos, and webcams, with an iOS app as well. It also tests to slightly higher mAPs than darknet, including on the latest YOLOv3-SPP.weights (60.7 COCO mAP), and offers the ability to train custom datasets from scratch to darknet performance, all using PyTorch :) https://github.com/ultralytics/yolov3



gouthamvgk commented 5 years ago

Yeah, I already saw that repo. I wanted a TF or Keras implementation, so I went for this one, but it seems to have a lot of bugs, especially in the loss calculation.

AlbertoSabater commented 5 years ago

Same here, I chose this repo because of TensorFlow. Is there a bug in the loss calculation? Is there any way to fix it?

tangyeqiu commented 5 years ago

Hey buddy:

https://github.com/qqwweee/keras-yolo3/issues/366#issue-419835336

I raised the same issue, and here is the conversion.

In the while loop:

    return_value, frame = vid.read()
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)    # cv2 reads BGR; convert to RGB before PIL
    image = Image.fromarray(frame)
    image = yolo.detect_image(image)
    result = np.asarray(image)
    result = cv2.cvtColor(result, cv2.COLOR_RGB2BGR)  # back to BGR for cv2 display/writing

In exchange, could you please share the issues you found in the loss calculation? Many thanks.

tangyeqiu commented 5 years ago

> You're right, the color space conversion doesn't take place, but I don't know whether it affects prediction. It might. I changed all the PIL code to cv2 and the result I get is more or less similar to the result with this color space error.

Dude, this bug does affect results. When I use this repo to detect traffic lights, the color space problem hurts recognition. For example, a red light will look blue or purple-ish, and then it won't be recognized as a traffic light, because there is no blue traffic light.

But I didn't encounter any problems with the loss calculation. Could you please give me some info about it?

AlbertoSabater commented 5 years ago

As far as I know, there isn't any bug in the loss calculation. The network is trained properly; you just need to run predictions in the same color space the network was trained on.

gouthamvgk commented 5 years ago

> Dude, this bug does affect results. When I use this repo to detect traffic lights, the color space problem hurts recognition. For example, a red light will look blue or purple-ish, and then it won't be recognized as a traffic light, because there is no blue traffic light.
>
> But I didn't encounter any problems with the loss calculation. Could you please give me some info about it?

Since your problem depends heavily on color, it might have a greater effect there. For standard datasets I think the model is robust enough.

gouthamvgk commented 5 years ago

> As far as I know, there isn't any bug in the loss calculation. The network is trained properly; you just need to run predictions in the same color space the network was trained on.

If you look at the xy coordinate loss, it is calculated as binary crossentropy even though it is a continuous value. I think mean squared error is a better fit. I trained on PASCAL VOC, and changing the loss from binary crossentropy to mean squared error improved the mAP considerably!
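For intuition, here is a small numpy sketch (my own illustration, not from the repo) of why binary crossentropy is an odd fit for a continuous target: it is minimized at y_pred == y_true, but its minimum value is not zero, and its gradient scaling differs from squared error:

    import numpy as np

    def bce(t, p):
        # elementwise binary crossentropy between target t and prediction p
        return -(t * np.log(p) + (1 - t) * np.log(1 - p))

    t = 0.3                           # a continuous xy offset target
    p = np.linspace(0.05, 0.95, 19)   # candidate predictions
    print(p[np.argmin(bce(t, p))])    # minimum is at p == t (0.3)...
    print(bce(t, t))                  # ...but the loss there is ~0.61, not 0
    print((t - t) ** 2)               # squared error at the optimum is exactly 0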

AlbertoSabater commented 5 years ago

Can you provide the code to fix the loss?

gouthamvgk commented 5 years ago

> Can you provide the code to fix the loss?

    # xy and wh regression as mean squared error instead of binary crossentropy
    xy_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_xy - raw_pred[..., 0:2])
    wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
    # objectness: the foreground term is scaled up to counter the foreground/background imbalance
    confidence_loss = (4.546 * object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True)) + ((1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) * ignore_mask)
    class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)

The 4.546 in the foreground part of the confidence loss is a hyperparameter that you can tune for your dataset.

AlbertoSabater commented 5 years ago

Thanks! How can you tune that hyperparameter?

gouthamvgk commented 5 years ago

> Thanks! How can you tune that hyperparameter?

It is the value by which the foreground loss is scaled so that it is not drowned out by the background loss. Since far fewer cells are foreground (contain an object) than background, we manually scale the foreground term to a higher value. The tuning depends on your dataset and the type of classes.
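One hypothetical way to pick a starting value (my own sketch, not from the repo) is to measure the foreground/background imbalance in your labels. Here `y_true_obj` and `foreground_scale` are made-up names; `y_true_obj` stands for the objectness channel of one of the y_true tensors (e.g. as produced by the repo's preprocess_true_boxes):

    import numpy as np

    def foreground_scale(y_true_obj, damping=0.5):
        """Estimate a foreground scaling factor from the label imbalance.

        y_true_obj: objectness mask, shape (batch, grid_h, grid_w, anchors, 1)
        damping: exponent < 1 to avoid over-scaling very rare foregrounds
        """
        pos = y_true_obj.sum()             # cells that contain an object
        neg = y_true_obj.size - pos        # background cells
        return (neg / max(pos, 1.0)) ** damping

Then sweep a few values around that estimate and keep whichever gives the best validation mAP.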

AlbertoSabater commented 5 years ago

Hi! I've modified the loss function and got a worse mAP. This is the set of losses I got during training: [screenshot of training loss curves]

Then I scaled the class loss, obj_loss and noobj_loss to match the scale of the xy_loss and wh_loss, but the final mAP didn't improve. This is the set of losses I got after this second training: [screenshot of training loss curves]

Any idea how to tune these parameters?

gouthamvgk commented 5 years ago

> Hi! I've modified the loss function and got a worse mAP. This is the set of losses I got during training: [screenshot of training loss curves]
>
> Then I scaled the class loss, obj_loss and noobj_loss to match the scale of the xy_loss and wh_loss, but the final mAP didn't improve. This is the set of losses I got after this second training: [screenshot of training loss curves]
>
> Any idea how to tune these parameters?

The xy loss usually has the greatest magnitude, but here it is almost null. What kind of dataset are you training on?

AlbertoSabater commented 5 years ago

> The xy loss usually has the greatest magnitude, but here it is almost null. What kind of dataset are you training on?

I'm working with this dataset.