@FrancescoSaverioZuppichini grey image training should work fine. The training data always needs 4 dimensions though, as you see above, so you cannot collapse the channel dimension away; you set it to 1 for greyscale.
Also set channels=1 in your cfg so the first convolution kernel is created with the correct dimensions.
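A minimal sketch of the two points above (illustrative shapes only, not the repo's actual dataloader or model code):

import torch

# A batch of greyscale images still has 4 dimensions: (batch, channels, height, width), with channels=1
imgs = torch.zeros((16, 1, 416, 416))
# With channels=1 in the cfg [net] section, the first convolution is built with 1 input channel
conv0 = torch.nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
print(conv0(imgs).shape)  # torch.Size([16, 32, 416, 416])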
@glenn-jocher Amazing! Thank you, I will try it and let you know. What about adding an example with gray images in the doc? Maybe it can be useful for noobs like me :)
Thank you again
@FrancescoSaverioZuppichini is there a publicly available greyscale dataset in darknet format we could use?
Actually I don't know; my dataset is composed of receipts, so it makes sense to convert them to greyscale.
Actually let me know if it's possible to edit the document and I will be happy to add a page on grayscale training. I am currently working on that.
@FrancescoSaverioZuppichini @FranciscoReveriano the easiest way to do this I think would be to create a tutorial that copies all of the images in coco_img64.txt to greyscale and then runs through all of the settings for single-channel training for this new coco_img64grey.txt dataset.
@FranciscoReveriano if you just create a new issue called GREYSCALE TRAINING EXAMPLE and put your steps there that should work. You could use https://github.com/ultralytics/yolov3/issues/192 as a template.
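A rough sketch of the greyscale-copy step described above (the coco_img64grey output folder and the script details are illustrative; the actual tutorial steps were never written up here):

import os
import cv2

# Read the original image list, write greyscale copies, and emit a new list file
with open('coco_img64.txt') as f:
    paths = [p.strip() for p in f if p.strip()]

os.makedirs('coco_img64grey', exist_ok=True)
with open('coco_img64grey.txt', 'w') as out:
    for p in paths:
        grey = cv2.cvtColor(cv2.imread(p), cv2.COLOR_BGR2GRAY)  # BGR -> single channel
        dst = os.path.join('coco_img64grey', os.path.basename(p))
        cv2.imwrite(dst, grey)
        out.write(dst + '\n')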
Quick question: if I want to train from scratch, would I have to start by training ResNet50 on ImageNet and then train on the COCO dataset on top of that?
@FranciscoReveriano no, you just run train.py to start. See https://github.com/ultralytics/yolov3#reproduce-our-results
I understand that part. But to start from the very beginning, we have to train on ImageNet first, no?
Because when we run train.py we are starting from a set of pre-trained weights, and we then train on the COCO dataset on top of those, correct? This matters if we want to make a pure grayscale model.
Because when I start, I pass --weights weights/yolov3.weights, and yolov3.weights are trained on color. If I understand correctly they are trained on ImageNet, and then we use them for Darknet on COCO.
So how can one train yolov3.weights from the very beginning?
@FranciscoReveriano you don't need any pretrained weights to start. It starts with random weights, which is why the command has --weights ''. The command is simply this, nothing more:
$ python3 train.py --weights '' --cfg yolov3-spp.cfg --epochs 273 --batch 16 --accum 4 --multi --pre
Thanks a lot! I am writing a paper for my university and I am building on your code. Is there a citation or reference you would like me to use?
@FranciscoReveriano of course, you're welcome. Yes, you can use this one: https://github.com/ultralytics/yolov3#citation
One more question, Glenn. When we pass --weights '', your code does not use any pre-trained weights at all? I ask because the original PJ Reddie paper trained on ImageNet first and then on COCO, if I understand correctly. Thanks a lot for the help, and a great program.
@FranciscoReveriano the published yolov3 results (https://arxiv.org/abs/1804.02767) do use an ImageNet-trained backbone, but we do not use one for our results here: https://github.com/ultralytics/yolov3#map
You can train with or without a backbone:
python3 train.py --weights ''
python3 train.py --weights darknet53.conv.74
python3 train.py --weights ultralytics68.pt
Yes, there is. FLIR provides a thermal dataset for object detection in greyscale images, which I am currently working with. You can find it here: https://www.flir.com/oem/adas/adas-dataset-form. There really needs to be some documentation on how to train models on greyscale images.
@Sephirot1st your link is broken. In any case, we need labels already in yolo/darknet format.
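For reference, yolo/darknet format means one *.txt label file per image, with one space-separated row per object and all coordinates normalized to 0-1 (values below are illustrative):

# class x_center y_center width height
0 0.512 0.431 0.120 0.250
2 0.875 0.662 0.064 0.118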
@glenn-jocher I've just added some code additions on https://github.com/ultralytics/yolov3/issues/563#issuecomment-576343886
Are you able to check my code there? I prefer to have your sanity check before continuing on this topic.
@pieterbl86 a pull request would be best to see the changes.
@glenn-jocher yes, you are right! please see #802
I would like to take a look. I tried something similar earlier; the problem I had was that I think some of the image augmentations convert the image to 3 channels.
If @glenn-jocher gives it a thumbs up, I can test it on the 1-channel COCO dataset that I have ready.
@FranciscoReveriano great, yes you can try training on 1-channel COCO. The code in #802 worked on my computer with a 1-channel custom dataset (1 class).
I changed the augmentation function in datasets.py as well (from RGB->HSV conversion to gray->V conversion, see the function augment_v).
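For readers following along, a minimal sketch of what a value-only (brightness) augmentation for greyscale images could look like; augment_v is the function name mentioned above, but the actual implementation in #802 may differ:

import numpy as np

def augment_v(img, vgain=0.5):
    # img: single-channel uint8 image; jitter only its value (brightness),
    # analogous to the V component of the usual HSV augmentation
    r = np.random.uniform(-1, 1) * vgain + 1                     # random gain around 1.0
    lut = np.clip(np.arange(256) * r, 0, 255).astype(np.uint8)   # 256-entry lookup table
    return lut[img]                                              # stays uint8 and single-channel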
@glenn-jocher, @FranciscoReveriano
As you know, I uploaded a grayscale training and inference method in #802. However, I found something I cannot get my head around:
I started to train without any pretrained weights (--weights '') and everything worked fine. Then I started to train with the ultralytics68.pt pretrained weights (--weights ultralytics68.pt). Somehow the training just started and produced similar (marginally higher) performance compared to the option without transfer learning.
My question: how is it possible that an altered network (with 1 instead of 3 channels) is able to start training with pretrained weights that have 3 channels in the first conv-layer? I would expect the code to raise a tensor-mismatch related error.
Is there some catch in the code such that, if the tensors do not match, it automatically starts training with random weights? Or does the code automatically reduce the ultralytics pretrained weights (trained on 3 channels) to 1-channel weights in the first conv-layer? Or does it randomly initialize only the first conv-layer while keeping the pretrained weights in the remaining (deeper) layers?
When I inspect the training curves they still look different (the option without pretrained weights is more spiky), however the best.pt models from both training runs still yield similar performance. Is my dataset too easy? I find it strange...
Training curve (on grayscale images) without pretrained weights (--weights ''):
Training curve (on grayscale images) with ultralytics68.pt pretrained weights (--weights ultralytics68.pt):
This is a great question. When I tried doing this I discovered some problems. I think some of the CV functions re-convert the image to 3 channels. I am interested in seeing what the answer might be.
@pieterbl86 @Sephirot1st @FrancescoSaverioZuppichini @FranciscoReveriano after thinking this over a bit, I think the best strategy for greyscale adaptation is to retain a 3-channel input shape for all images, regardless of their channel count. The fix would simply be 2 lines of code added in https://github.com/ultralytics/yolov3/blob/dd3cf27ececafc17136cce82c8dd502ce4dae6d0/utils/datasets.py#L515
img = cv2.imread(img_path)  # BGR
if img.shape[2] == 1:
    img = np.tile(img, 3)
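For reference, np.tile replicates the single channel across the last axis:

import numpy as np
grey = np.zeros((416, 416, 1), dtype=np.uint8)  # (H, W, 1) single-channel image
print(np.tile(grey, 3).shape)                   # (416, 416, 3)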
The benefits are:
The drawback is of course a very very slight increase in parameter count (about +600). The only affected layer is layer 0. If I instantiate a 3-channel model vs a 1-channel model today I see these two model summaries for example, so the computational overhead created by the change is insignificant (i.e. << 1%).
layer name gradient parameters shape mu sigma
0 0.Conv2d.weight True 864 [32, 3, 3, 3] -8.67e-05 0.112
1 0.BatchNorm2d.weight True 32 [32] 1 0
2 0.BatchNorm2d.bias True 32 [32] 0 0
3 1.Conv2d.weight True 18432 [64, 32, 3, 3] 0.000242 0.034
4 1.BatchNorm2d.weight True 64 [64] 1 0
5 1.BatchNorm2d.bias True 64 [64] 0 0
...
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
0 0.Conv2d.weight True 288 [32, 1, 3, 3] -0.0153 0.191
1 0.BatchNorm2d.weight True 32 [32] 1 0
2 0.BatchNorm2d.bias True 32 [32] 0 0
3 1.Conv2d.weight True 18432 [64, 32, 3, 3] 0.000264 0.034
4 1.BatchNorm2d.weight True 64 [64] 1 0
5 1.BatchNorm2d.bias True 64 [64] 0 0
...
Model Summary: 225 layers, 6.29982e+07 parameters, 6.29982e+07 gradients
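(For reference: layer 0 has 32×3×3×3 = 864 weights in the 3-channel model versus 32×1×3×3 = 288 in the 1-channel model, a difference of 576 parameters.)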
Any thoughts?
All, after checking around a bit, it seems that cv2 is loading greyscale images as 3-channel by default already: https://stackoverflow.com/questions/18870603/in-opencv-python-why-am-i-getting-3-channel-images-from-a-grayscale-image
So it would appear that the current repo is already capable of greyscale training and inference without changes, unless I am missing something?
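A quick way to verify this behaviour (the file path is illustrative):

import cv2
img = cv2.imread('receipt_grey.jpg')                         # default flag is cv2.IMREAD_COLOR
print(img.shape)                                             # (H, W, 3) even for a greyscale file
img1 = cv2.imread('receipt_grey.jpg', cv2.IMREAD_GRAYSCALE)
print(img1.shape)                                            # (H, W): single channel only with this flag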
@pieterbl86 when you load --weights, every layer with matching names and shapes is loaded, so in your case, since you modified the model to input 1-channel images, you are really only modifying layer 0, and thus every layer except layer 0 is populated with ultralytics68.pt pretrained weights, which is why your results appear better behaved.
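A minimal sketch of the kind of shape-matched loading that makes this work (not the repo's exact code; the checkpoint layout with a 'model' key is an assumption):

import torch

def load_matching_weights(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location='cpu')['model']  # assumed checkpoint layout
    state = model.state_dict()
    # keep only tensors whose name and shape both match the current model
    matched = {k: v for k, v in ckpt.items() if k in state and v.shape == state[k].shape}
    state.update(matched)
    model.load_state_dict(state)
    print('transferred %g/%g layers' % (len(matched), len(state)))

So a 1-channel model simply skips the mismatched layer-0 kernel and keeps everything else.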
Yes. OpenCV automatically loads all single-channel images as three channels; it uses a gray-to-color conversion somewhere in the implementation of the functions we are running. I think the idea would be to convert the whole network into a single-channel network, so as to avoid that gray-to-color conversion.
@FranciscoReveriano @pieterbl86 @Sephirot1st ah well, it seems like this issue is much ado about nothing then.
If cv2 is already reading in greyscale images as 3-channel images, then the repo works by default with no changes at all on greyscale datasets (for all functions: training, testing and inference), color datasets, and mixed greyscale and color datasets, no?
Am I missing something here? If you guys are concerned about the extra computation overhead involved with the 2 extra channels, it's completely insignificant: roughly 600 extra parameters out of 60 million, or 6.29987e+07 vs 6.29982e+07.
@glenn-jocher
Yes, you are completely right! No, I am/was not concerned about the computational overhead, just about the correctness of the approach. It would make most sense to leave it like this (as transfer-learning can be fully utilized).
The reason why I worked on this grayscale use-case is that I want to learn how to "scale up" to images with more than 3 channels (RGB-D, RGB-XYZ, or multi/hyperspectral images).
I'm currently working on that, and it seems that it works. One thing is that I lose the first conv-layer transfer-learned weights, as I have to randomly initialize them. @glenn-jocher do you have any tips / workarounds for that?
@pieterbl86 ah yes, abstracting to the n-channel image case may be more difficult, though this is pretty specialized work that may not warrant changes to the baseline repo.
Yes, if you start from a different set of weights like ultralytics68.pt you will not be able to transfer the first layer, and probably not the output layers either, but again that is only 864 parameters in the first layer out of roughly 6e7, as shown in https://github.com/ultralytics/yolov3/issues/625#issuecomment-578907572, so it will have a very mild effect.
Starting from pretrained weights is another topic altogether though. There's evidence both for and against it, and it is not beneficial in all cases. For example, all of the ultralytics COCO mAPs reported in https://github.com/ultralytics/yolov3#map are trained from randomly initialized weights, which outperformed models trained from the darknet53.conv.74 backbone. In general starting from a backbone gives you good results quickly, but training from scratch gives you better results after many more epochs.
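A rough sketch of how the first conv-layer could be adapted to n input channels while keeping a pretrained model (a generic PyTorch pattern, not code from this repo; initializing the extra channels from the mean of the pretrained RGB kernel is one common trick):

import torch
import torch.nn as nn

def adapt_first_conv(conv: nn.Conv2d, in_channels: int) -> nn.Conv2d:
    # build a replacement first conv with the desired number of input channels
    new_conv = nn.Conv2d(in_channels, conv.out_channels, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        mean_w = conv.weight.mean(dim=1, keepdim=True)               # average over the RGB channels
        new_conv.weight.copy_(mean_w.repeat(1, in_channels, 1, 1))   # reuse the averaged kernel
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

For RGB-D this would give a 4-channel layer 0 initialized from the pretrained RGB average, which can then fine-tune from there.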
@glenn-jocher thanks for your answer and support!
Yes, a multi-channel network (>3 channels) is not really something we should want to push here... I'm just going to build that on my local PC.
I find it an interesting observation that training from scratch had better performance than with the weights from the darknet backbone. When training other networks (like Mask-RCNN), I always ran into overfitting when adding more epochs. Don't you think it has to do with the number of images? COCO is a very big dataset that can be trained from scratch properly as there are so many examples (especially when training in batches).
Until now my maximum number of images in a train set was 1000, which is maybe too small to properly train without transfer-learning. Or do you think that training with more epochs will yield a better result even on small datasets?
@pieterbl86 ah that's a good question. In my experience training COCO from scratch and COCO from pretrained weights they both converge to roughly the same point given enough time, though training from scratch typically prevents overtraining for longer.
On smaller datasets though you are right, the best tactics might be different. When we train small datasets we typically do start with the best COCO-trained weights we have, ultralytics68.pt. This leads to the best mAPs, but also begins to overtrain very quickly, sometimes as early as 10 epochs, always in the obj and cls losses, but never in the regression loss.
One question: what is the difference between "ultralytics68.pt", "yolov3-spp.weights", and "ultralytics49.pt"? I saw that they have different weights, and I'd like to understand the difference between each one. Thanks for the help @glenn-jocher
@FranciscoReveriano ultralytics68.pt is trained by this repo (version 68) and is in pytorch .pt format; yolov3-spp.weights is trained by the original darknet and is in darknet .weights format.
ultralytics49.pt is another model trained by this repo (version 49) and should not be used, because it has a worse mAP than ultralytics68.pt.
Of all the models, ultralytics68.pt has the highest mAP. The comparison is at https://github.com/ultralytics/yolov3#map
They all use the same yolov3-spp.cfg.
Ah, cool. But if I'm reading and understanding everything correctly, the ultralytics68.pt weights are trained at image-size 608, correct? Just trying to understand how they compare to the COCO weights that I trained.
@FranciscoReveriano to train ultralytics68.pt I use this command, but with a special --multi-scale of 288-640:
python3 train.py --weights '' --epochs 273 --batch 16 --accum 4 --nosave --multi
There is a docker image that contains everything needed to reproduce the training exactly. You can use this command to pull the docker image and start the training, which will end with (more or less) the same weights as ultralytics68.pt, though be warned it takes quite a long time (4-5 days on a V100, and longer on slower GPUs).
n=59 && t=ultralytics/coco:v$n && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 416 --epochs 273 --batch 16 --accum 4 --weights '' --device 0 --cfg yolov3-spp.cfg --nosave --multi
Haha, you're right, it does take quite a long time. On my RTX 2080 Ti it takes almost six days. I'm going to run it this week, at size 640, with today's version of the code. If you want, I'd be happy to send you the results when that model finishes.
@FranciscoReveriano if you are going to train a new one, you should make two changes first. I haven't committed them yet because I am waiting on results, but it seems that two changes I made recently have made the new results slightly worse. The changes you should make to the current repo are to use red='mean'
here:
https://github.com/ultralytics/yolov3/blob/11bcd0f9885ce548c7c123c611921fe63bebe592/utils/utils.py#L372
and to use tobj[b, a, gj, gi] = 1.0
here:
https://github.com/ultralytics/yolov3/blob/11bcd0f9885ce548c7c123c611921fe63bebe592/utils/utils.py#L404
I am training with both changes now, but it will take a few more days to see the results. I think it will end up a little above ultralytics68.pt (if we're lucky).
@FranciscoReveriano ah, and the third and final change is 'obj': 64.3, # obj loss gain
here:
https://github.com/ultralytics/yolov3/blob/11bcd0f9885ce548c7c123c611921fe63bebe592/train.py#L28
Cool, I'll make the changes. I'm going to train at 640 because that is the original size of my custom dataset. I'll share the results in five days.
@FranciscoReveriano ok! This is the training so far on COCO with the changes. I am training with a 2080 Ti as well; it takes 33 minutes per epoch plus 1 more minute to compute the mAP after each epoch, so only 240 more to go here haha :(
One question: how did you decide on the learning rate scheduler? I'm thinking about playing with it a bit this week, to try to get a better mAP in the first epoch.
The LR scheduler is originally defined in darknet, I think here https://arxiv.org/abs/1804.02767 or possibly in the yolov2 paper. It is simply the original LR multiplied by 0.1 at 80% and 90% of --epochs.
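In PyTorch terms that schedule is roughly equivalent to the following (a sketch only, assuming epochs=273 and an SGD optimizer; the repo's actual scheduler setup may differ in detail):

import torch

epochs = 273
model = torch.nn.Linear(1, 1)                              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# drop the LR by 10x at 80% and 90% of the total number of epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[round(epochs * 0.8), round(epochs * 0.9)], gamma=0.1)
for epoch in range(epochs):
    # ... train one epoch ...
    scheduler.step()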
Dear all,
Apparently it is not possible to train the model with grey images.
Even if I convert the images in the __getitem__ function, the code breaks due to the multiple dependencies on the channel dimension. E.g.
Cheers,
Francesco Saverio