Closed alontrais closed 4 years ago
I am having a very similar issue:
File "train.py", line 111, in <dictcomp>
chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
KeyError: 'module_list.85.Conv2d.weight'
I think something is wrong with my custom .cfg and/or .data file, because when I do a sanity check with the default files I get:
'No labels found. Recommend correcting image and label paths.' AssertionError: No labels found. Recommend correcting image and label paths.
Please see, "Train On Custom Data" - https://github.com/ultralytics/yolov3/issues/621
Did you check the coco.data file? And your .cfg file should have nothing to do with this.
The easiest way to fix this is to make sure that you have a directory called 'labels' inside your data directory. In this directory you place all the labels for both the test and validation sets. Also make sure that the path names of your images are correct. I have found relative paths to work better than full paths.
Nope still broken
Why aren't there instructions on simply running your own images through it, while using coco/yolo, and getting some metrics like mAP and false positives and negatives? I can't believe the docs have made it this hard. I'm willing to rewrite them if I can figure this out.
@alontrais @joehoeller thank you for your interest in our work! Please note that most technical problems are due to:
git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
sudo rm -rf yolov3 # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py # verify detection
python3 train.py # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!
@alontrais I had a similar error before, and I figured it out. The cause on my end was that I used yolov3.cfg as my config but kept the default weights file 'ultralytics49.pt', and the two do not match.
In the case that you want to use the default weight, you can use the yolov3-spp.cfg as a baseline and modify the corresponding filters/num_class as instructed.
@glenn-jocher I followed your instructions:
sudo rm -rf yolov3 # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py # verify detection
python3 train.py # verify training (a few batches only)
I get this when I run train.py:
line 374, in __init__
assert nf > 0, 'No labels found. Recommend correcting image and label paths.'
AssertionError: No labels found. Recommend correcting image and label paths.
python3 detect.py works just fine.
../coco/trainvalno5k.txt
@joehoeller you need the coco dataset to run the training examples:
$ bash yolov3/data/get_coco_dataset_gdrive.sh
Yes, I already did that. Is there something that needs to be done to the labels, other than putting them in /data folder? For example, should they be in the nested folders in which they came from? As @Fransisco stated above. (The labels were copied so their original path is still intact).
@joehoeller nothing needs to be done to the labels. You just git clone the repo, copy the coco dataset and train. You can even follow the notebook, just click play in each cell.
https://colab.research.google.com/drive/1G8T-VFxQkjDe4idzN8F-hbIBqkkkQnxw
I still get the error. Why did you close it?
@joehoeller your error is not reproducible, there's no bug. Follow the steps, everything works properly.
That is false sir, because I did. And I get the error for the labels as shown.
@joehoeller To get started simply run the following in a terminal, or open the notebook and click play on the first cells (same code): https://colab.research.google.com/drive/1G8T-VFxQkjDe4idzN8F-hbIBqkkkQnxw
rm -rf yolov3 coco coco.zip # WARNING: remove existing
git clone https://github.com/ultralytics/yolov3 # clone
bash yolov3/data/get_coco_dataset_gdrive.sh # copy COCO2014 dataset (19GB)
cd yolov3
python3 train.py
How many times do I have to tell you I did that. I’m moving on to build my own solution — which I can do, I was just hoping to save time.
Don't be rude! Instead of complaining, you need to embrace the spirit of collaboration. This is the best PyTorch implementation public. Contribute to making it better.
FYI: if you are not on a notebook and you want to run this, I would advise that you follow the setup that is made by bash get_coco_dataset.sh. There you will get the perfect structure.
Let me make this more clear for you since you do not understand:
I followed the steps exactly as stated. Then I got the error message about the labels, which still persists.
So no it’s not the best. I’m making my own so I don’t waste any more time. I just thought I could save time using this, and clearly I was wrong.
@joehoeller if the default code I sent you works in your environment, then use that as a starting point for your own development efforts. You simply mimic the coco data format with your own data. All of the info, including step by step directions and code to reproduce are in the custom training example in the wiki. https://github.com/ultralytics/yolov3/wiki
It does not for the last time. How many times do I have to tell you. Scroll up and read the label error. Because that’s what I get after I performed the command line cmd’s as given per your instructions.
Actually don’t bother because I’m already hooking up analytics and metrics to my own solution I’ve built in Torch w Tensorboard.
I got the same error,
Traceback (most recent call last):
File "train.py", line 444, in <module>
train() # train normally
File "train.py", line 111, in train
chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
File "train.py", line 111, in <dictcomp>
chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
KeyError: 'module_list.85.Conv2d.weight'
I have tried the suggested steps, but nothing worked out. https://github.com/ultralytics/yolov3/issues/650#issuecomment-557939734
so sad! the same error:
File "train.py", line 444, in
@Samjith888 @inspire-lts @joehoeller see https://github.com/ultralytics/yolov3/issues/657
This error is caused by a user supplying incompatible --weights and --cfg arguments. To solve this you must specify no weights (i.e. random initialization of the model) using --weights '' and any --cfg, or use a --cfg that is compatible with your --weights. If none are specified, the defaults are --weights ultralytics49.pt and --cfg cfg/yolov3-spp.cfg.
Examples of compatible combinations are:
python3 train.py --weights yolov3.pt --cfg cfg/yolov3.cfg
python3 train.py --weights yolov3.weights --cfg cfg/yolov3.cfg
python3 train.py --weights yolov3-spp.pt --cfg cfg/yolov3-spp.cfg
python3 train.py --weights ultralytics49.pt --cfg cfg/yolov3-spp.cfg
python3 train.py --weights '' --cfg cfg/*.cfg # any cfg will work here
ultralytics49.pt is currently the highest performing YOLOv3 model (trained from scratch using this repo) available at the default img-size of 416 (see https://github.com/ultralytics/yolov3/issues/310), which is the reason it is used as the default backbone.
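To illustrate why a mismatched --weights/--cfg pair ends in a KeyError, here is a minimal sketch of the filtering step shown in the traceback. It is not the repository's code: FakeTensor is a stand-in for a torch tensor (only numel() is needed), filter_checkpoint is a hypothetical helper, and the layer sizes are made up. The membership guard avoids the exception but silently skips unmatched layers, so choosing a compatible cfg as described above is still the actual fix.

```python
class FakeTensor:
    """Stand-in for a torch.Tensor; only numel() is needed for this sketch."""

    def __init__(self, n):
        self._n = n

    def numel(self):
        return self._n


def filter_checkpoint(ckpt_state, model_state):
    """Keep checkpoint entries whose key exists in the model and whose size matches.

    Indexing model_state[k] without the `k in model_state` guard raises KeyError
    as soon as the checkpoint contains a layer the model (built from a different
    cfg) does not have.
    """
    kept, skipped = {}, []
    for k, v in ckpt_state.items():
        if k in model_state and model_state[k].numel() == v.numel():
            kept[k] = v
        else:
            skipped.append(k)
    return kept, skipped


# Hypothetical state dicts: the checkpoint has a layer the model lacks.
ckpt = {
    'module_list.0.Conv2d.weight': FakeTensor(864),
    'module_list.85.Conv2d.weight': FakeTensor(1024),
}
model = {'module_list.0.Conv2d.weight': FakeTensor(864)}

kept, skipped = filter_checkpoint(ckpt, model)
print('loaded:', sorted(kept))
print('skipped:', skipped)
```

If many layers end up in the skipped list, that is a strong hint that the weights and cfg describe different architectures.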
So for the last time, what does this mean and how do I fix it:
assert nf > 0, 'No labels found. Recommend correcting image and label paths.'
A lot of this could be resolved if there were better docs and tutorials, with some minor improvements in the code. How can we work together to make this happen?
They were pretty clear in their meanings. But it took me a second try to get my labels working. I am going to write a Medium article on how to use the Ultralytics model to train better. And write a wiki page on distributive computing. It is a great model. With some amazing work. But I feel that if we all contribute it can be the top Yolov3 model.
For sure, I have some ideas and some more ppl in CV space willing to help as well.
Send me a message and we can collaborate on the article. Or add me on LinkedIn, which is on my profile.
Will do. Thanks!
I just updated the mAP section of the README with the latest results. We're making good progress on training. Earlier in the year we were behind darknet, now we are ahead in most metrics, using the same yolov3-spp.cfg architecture. The best results now are from ultralytics68.pt, which I should have up on the Google Drive folder soon.
https://github.com/ultralytics/yolov3#map
| | 320 mAP@0.5:0.95 | 416 mAP@0.5:0.95 | 608 mAP@0.5:0.95 |
|---|---|---|---|
| darknet YOLOv3-tiny | 14.0 | 16.0 | 16.6 |
| darknet YOLOv3 | 28.7 | 31.1 | 33.0 |
| darknet YOLOv3-SPP | 30.5 | 33.9 | 37.0 |
| ultralytics YOLOv3-SPP | 35.2 | 38.8 | 40.4 |
Yes a medium article and better docs would be great! I don't have much time unfortunately though, between running different trainings and developing/debugging.
How do we correct image/label paths? I have them but it is not clear as to where to set those up at.
AssertionError: No labels found. Recommend correcting image and label paths.
Send me a message and we can collaborate on the article. Or add me on LinkedIn, which is on my profile.
Message is in your LinkedIn inbox. I built the automation tool, I now call "Dark Chocolate", it converts COCO annotations to Darknet annotation format.
@joehoeller coco.data points to the train.txt and test.txt list of images on lines 2 and 3.
These files have lists of image paths as they would be from the yolov3 directory:
If in doubt, you can run python3 train.py in debug mode, and put a breakpoint on this line to see what values the img_files are. If there are no images there, or if there are no labels in the corresponding labels folder (by replacing /images/ with /labels/ in the image paths) you will get this error message.
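The check described above can be sketched outside the debugger. This is a rough approximation, not the repository's loader: label_path_for and check_dataset are hypothetical helpers, and the sketch assumes POSIX-style paths where each label path is the image path with /images/ swapped for /labels/ and a .txt extension. The demo builds a throwaway dataset in a temporary directory so the script runs on its own.

```python
import os
import tempfile


def label_path_for(image_path):
    """Derive the label path: swap /images/ for /labels/ and use a .txt extension."""
    swapped = image_path.replace('/images/', '/labels/')
    return os.path.splitext(swapped)[0] + '.txt'


def check_dataset(list_file):
    """Return (images_found, labels_found, missing_label_paths) for an image list file."""
    with open(list_file) as f:
        image_paths = [line.strip() for line in f if line.strip()]
    images_found = sum(os.path.isfile(p) for p in image_paths)
    missing = [label_path_for(p) for p in image_paths
               if not os.path.isfile(label_path_for(p))]
    return images_found, len(image_paths) - len(missing), missing


# Throwaway demo dataset: two images, only one of which has a label file.
root = tempfile.mkdtemp()
os.makedirs(root + '/images')
os.makedirs(root + '/labels')
for name in ('a', 'b'):
    open(f'{root}/images/{name}.jpg', 'w').close()
open(f'{root}/labels/a.txt', 'w').close()

list_file = f'{root}/train.txt'
with open(list_file, 'w') as f:
    f.write(f'{root}/images/a.jpg\n{root}/images/b.jpg\n')

n_img, n_lbl, missing = check_dataset(list_file)
print(f'images found: {n_img}, labels found: {n_lbl}')
print('missing labels:', missing)
```

Pointing check_dataset at your own train list should show quickly whether zero labels are found, which is the condition behind the "No labels found" assertion.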
These are my own images and my own annotations (in Darknet format).
The images you show are just the paths to the images. Not the labels.
I will be uploading a reader for this if you have a custom dataset. All I can say is that it's better if you use the full path of the images, so the computer knows where to grab the images/labels.
@joehoeller the same structure is used for custom data as for coco. The labels need to be in a separate folder next to the images folder. The labels folder needs to be found simply by replacing /images/ with /labels/ in the image folder path, like this custom "dataset1" (ds1). Each label filename is identical to its image filename, except that the label extension is *.txt. This example trains on the first 8 images of the dataset, and tests on the last 2.
The paths all need to be relative to your yolov3 folder (or absolute paths, though these break easier if you send the code to a different environment).
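As a rough illustration of the train/test split described above (first 8 images for training, last 2 for testing), the following sketch splits a list of image paths. The helper name split_image_list and the image file names are hypothetical; only the ../data/ds1 location comes from the example in this thread.

```python
def split_image_list(image_paths, n_test):
    """Return (train, test): the last n_test paths go to test, the rest to train."""
    if not 0 < n_test < len(image_paths):
        raise ValueError('n_test must be between 1 and len(image_paths) - 1')
    return image_paths[:-n_test], image_paths[-n_test:]


# Hypothetical image names, relative to the yolov3 folder.
paths = [f'../data/ds1/images/img{i}.jpg' for i in range(10)]
train, test = split_image_list(paths, n_test=2)

# Each list would then be written one path per line to the train and test list files.
print('\n'.join(train))
print('---')
print('\n'.join(test))
```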
Then run:
cd yolov3
python3 train.py --data ../data/ds1/out.data
BTW @FranciscoReveriano @joehoeller this is legacy structure from darknet, so the same exact data can also be used to train darknet.
This repo now outperforms darknet by a wide margin I believe, but nevertheless darknet has a strong following (i.e. pjreddie/darknet has 15k stars, AlexeyAB/darknet has 6k stars), so I'm not sure if we should keep following the darknet convention, or perhaps start from a clean-slate mentality about what would be easiest for the most people to train their own custom data with a minimum of hassle.
In principle this repo is here to create the most accurate, fastest object detector in the world. In practice though, people seem to care more about quick results and ease of use, and don't care as much about being the best or the fastest.
I think we need to continue to follow DarkNet. I guess people still follow it because it provides a nice benchmark with a lot of literature. Although I don't think Machine Learning or Object Detection should be 'people'-proof. At some point people should be expected to go through the learning curve. Seems like a lot of people just want quick fixes.
Although it might not be a bad idea to make a version of Facebook's Detectron 2 that could be sold. That would be the best way to start from a clean state in my opinion.
I agree. One of my biggest rants is how 98% of Git repos in the CV space are awful and basic. People need to learn the concepts as well as the math. (I took the Udacity CV course and recommend it because it dives deep into Torch and math.) However, for small one-offs, things like Darknet are perfect.
For me, the problem is when people ask you to interpret, figure out, or tell them how to make their results much better. This is GitHub, not ResearchGate. I was looking for a Udacity course to take this break. I might do that CV course. Most of my experience is with Tensorflow and Keras. Trying to move to Torch like the rest of us.
COCO JSON to Darknet/YOLOv3 annotation conversion tool, see readme for instructions and how to validate: https://github.com/joehoeller/Dark-Chocolate
@glenn-jocher you show paths for images but not labels - i have been doing all of this already, and just like others in this thread it continues to fail.
@joehoeller the label paths are inferred automatically by replacing /images/ with /labels/ in the image paths. You only need to specify image paths.
The labelfile definition happens here.
This tutorial, https://docs.ultralytics.com/yolov5/tutorials/train_custom_data , says:
I HAVE TRIED ALL OF THE SUGGESTIONS ABOVE AND STILL GET:
assert nf > 0, 'No labels found. Recommend correcting image and label paths.'
This script will generate file paths to images:
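import os

given_dir = 'PATH_TO_CUSTOM_IMAGES'  # directory containing your images
# Write one image path per line; the with-block closes the file when done
with open('FILE_NAME.txt', 'w') as f:
    for name in sorted(os.listdir(given_dir)):
        f.write(os.path.join(given_dir, name) + '\n')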
I got it going, now have CUDA memory error, but that's a "me" problem. Not a "you" problem. I will write a very clear and concise tutorial for medium when I am done.
Yes, I think the default training settings should probably use a smaller batch size. The current settings should work fine for a 1080Ti or 2080Ti and up (11GB) cuda memory, but smaller graphics cards may run out.
The current default is --batch-size 32 --accumulate 2 to get to an effective batch size of 64. I think I should reduce this to --batch-size 16 --accumulate 4 so that the largest number of people can run smoothly without CUDA out-of-memory issues. The performance hit (from batch-normalizing fewer images) is not very large.
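The arithmetic behind the two settings can be written out as a small sketch. The function name is hypothetical; it only multiplies the two flag values mentioned above and makes no claim about how train.py implements accumulation.

```python
def effective_batch_size(batch_size, accumulate):
    """Number of images contributing to each optimizer step under gradient accumulation."""
    return batch_size * accumulate


# The old and proposed defaults give the same effective batch size.
for batch_size, accumulate in [(32, 2), (16, 4)]:
    total = effective_batch_size(batch_size, accumulate)
    print(f'--batch-size {batch_size} --accumulate {accumulate} -> effective {total}')
```

Only batch_size images pass through the network at once, so halving it should roughly halve the per-step activation memory while the effective batch size stays at 64.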
Ok, this should do it: https://github.com/ultralytics/yolov3/commit/93a70d958a1138b082f7b5c29c550b7d383f56f3
If you git pull you can get all the latest updates.
Hey, I get a new error when I run the train script: