AlbertoSabater opened this issue 5 years ago

Right now, the model is trained on images loaded with PIL, but during video prediction the frames are loaded with cv2, wrapped into PIL, the prediction is made from that image, and the result is converted back to cv2 to be shown on screen. However, PIL and cv2 use different color spaces, RGB and BGR respectively. So the model is making predictions in a color space it hasn't been trained with.
No, the reason for using cv2 during video prediction is to create the video playback file. All the necessary conversions from cv2 to PIL take place (i.e. the height and width dimensions are handled).
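For context, a minimal sketch of the cv2 side (the paths and codec here are assumptions; the repo's detect_video sets this up in a similar spirit):

```python
import cv2

vid = cv2.VideoCapture("input.mp4")                     # hypothetical input path
fourcc = cv2.VideoWriter_fourcc(*"mp4v")                # hypothetical codec choice
fps = vid.get(cv2.CAP_PROP_FPS)
size = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("output.mp4", fourcc, fps, size)  # the video playback file
```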
Is there a color space conversion from BGR (cv2) to RGB (PIL)? I see that the frame is loaded with cv2, then converted to an array to be loaded by PIL, but it's still in the BGR color space when the prediction takes place.
You are right, the color space conversion doesn't take place, but I don't know whether it affects prediction; it might. I changed all the PIL code to cv2 and the result I get is more or less similar to the result with this color space error.
In my experience, the learned model is robust enough to generalize to the BGR color space (thanks to the data augmentation), but the final performance still improves when you fix this bug.
@AlbertoSabater @gouthamvgk this YOLOv3 tutorial may help you: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data
The accompanying repository works on macOS, Windows and Linux, includes multi-GPU and multithreading support, performs inference on images, videos and webcams, and ships an iOS app. It also tests to slightly higher mAPs than darknet, including on the latest YOLOv3-SPP.weights (60.7 COCO mAP), and offers the ability to train custom datasets from scratch to darknet performance, all using PyTorch :)
https://github.com/ultralytics/yolov3
Yeah, I already saw that repo. I wanted a TF or Keras implementation, so I went for this one, but it seems to have a lot of bugs, especially in the loss calculation.
Same here, I chose this repo because of TensorFlow. Is there a bug in the loss calculation? Is there any way to fix it?
Hey buddy:
https://github.com/qqwweee/keras-yolo3/issues/366#issue-419835336
I raised the same issue there, and here is the conversion, inside the while loop:
```python
import cv2
import numpy as np
from PIL import Image

return_value, frame = vid.read()
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)    # cv2 reads BGR; the model expects RGB
image = Image.fromarray(frame)
image = yolo.detect_image(image)
result = np.asarray(image)
result = cv2.cvtColor(result, cv2.COLOR_RGB2BGR)  # back to BGR for cv2 display/writing
```
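With that round trip, cv2.imshow and cv2.VideoWriter get BGR frames back, so the colors in the displayed and saved video come out correct.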
In exchange, could you please share the issues you found with the loss calculation? Many thanks.
> You are right, the color space conversion doesn't take place, but I don't know whether it affects prediction; it might. I changed all the PIL code to cv2 and the result I get is more or less similar to the result with this color space error.

Dude, this bug does have an effect. When I use this repo to detect traffic lights, the color space problem hurts the recognition. For example, a red light comes out blue or purple-ish, and then it is not recognized as a traffic light because there is no blue traffic light.
But I didn't run into any problems in the loss calculation; could you please give me some info about it?
As far as I know, there isn't any bug in the loss calculation. The NN is trained properly. You just need to get predictions in the same color space that the NN has been trained with.
> Dude, this bug does have an effect. When I use this repo to detect traffic lights, the color space problem hurts the recognition. For example, a red light comes out blue or purple-ish, and then it is not recognized as a traffic light because there is no blue traffic light.
> But I didn't run into any problems in the loss calculation; could you please give me some info about it?
Since your problem depends so heavily on the color space, the bug might have a greater effect in your case. For standard datasets I think the model is robust enough.
> As far as I know, there isn't any bug in the loss calculation. The NN is trained properly. You just need to get predictions in the same color space that the NN has been trained with.
If you look at the xy coordinate loss, it is calculated as binary cross-entropy even though xy is a continuous value. I think mean squared error is a better fit. I trained on Pascal VOC, and changing the loss from binary cross-entropy to mean squared error improved the mAP very much!
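To see why: for a continuous target t in (0, 1), binary cross-entropy -t*log(p) - (1-t)*log(1-p) is still minimized at p = t, but its minimum value is the entropy of t rather than zero, so the loss floor shifts with the target; squared error (t - p)^2 bottoms out at zero for every target, which makes its magnitude easier to compare across cells.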
Can you provide the code to fix the loss?
> Can you provide the code to fix the loss?
```python
# Inside yolo_loss in yolo3/model.py; xy now uses squared error instead of
# binary cross-entropy, and the foreground confidence term is up-weighted.
xy_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_xy - raw_pred[..., 0:2])
wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
confidence_loss = 4.546 * object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
    (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) * ignore_mask
class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)
```
The 4.546 in the foreground part of the confidence loss is a hyperparameter that you can tune according to your dataset.
Thanks! How can you tune that hyperparameter?
> Thanks! How can you tune that hyperparameter?
It is the value by which the foreground loss is scaled so that it is not drowned out by the background loss. Since the number of cells detected as foreground is very small compared to the background, we manually scale it to a higher value. The tuning depends on your dataset and the type of classes.
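One rough heuristic (my own assumption, not something from the paper): start from the background-to-foreground cell ratio in your labels and cap it, since the raw ratio is usually far larger than a value like 4.546:

```python
import numpy as np

def estimate_foreground_weight(y_true_list, cap=10.0):
    """Rough starting value for the foreground confidence-loss scale.

    y_true_list: the y_true arrays fed to yolo_loss, one per output scale,
    shaped (batch, grid_h, grid_w, num_anchors, 5 + num_classes); channel 4
    is the objectness mask.
    """
    fg = sum(float(y[..., 4].sum()) for y in y_true_list)   # foreground cells
    total = sum(y[..., 4].size for y in y_true_list)        # all anchor cells
    return min((total - fg) / max(fg, 1.0), cap)            # capped bg/fg ratio
```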
Hi! I've modified the loss function and I got a worse mAP. This is the set of losses I got during training.
Then I scaled the class loss, the obj loss and the no-obj loss to match the scale of the xy_loss and wh_loss, but the final mAP didn't improve. This is the set of losses I got after this second training:
Any idea about how to tune these parameters?
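For reference, the scaling I mean looks roughly like this at the point where yolo_loss sums the terms (the weight values below are placeholders, not the exact ones I used):

```python
# hypothetical per-term weights, tuned so every term lands on a similar scale
w_xy, w_wh, w_conf, w_cls = 1.0, 1.0, 0.3, 0.3

xy_loss = K.sum(xy_loss) / mf            # mf = batch size, as in the repo
wh_loss = K.sum(wh_loss) / mf
confidence_loss = K.sum(confidence_loss) / mf
class_loss = K.sum(class_loss) / mf
loss += w_xy * xy_loss + w_wh * wh_loss + w_conf * confidence_loss + w_cls * class_loss
```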
> Hi! I've modified the loss function and I got a worse mAP. This is the set of losses I got during training.
> Then I scaled the class loss, the obj loss and the no-obj loss to match the scale of the xy_loss and wh_loss, but the final mAP didn't improve. This is the set of losses I got after this second training:
> Any idea about how to tune these parameters?
The xy loss usually has the greatest magnitude, but here it is almost null. What kind of dataset are you training on?
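If it helps to see the per-term magnitudes while training, you can log them from inside yolo_loss with TF 1.x's tf.Print (if I remember right, the repo even has a print_loss flag that does something like this):

```python
import tensorflow as tf

# inside yolo_loss, after the four terms have been summed and normalized;
# prints the components each step so their relative scales are visible
loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss],
                message='loss components: ')
```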
I'm working with this dataset.