Closed. timisplump closed this issue 7 years ago.
Sorry for asking, but how did you produce the files you substituted for "INSERT_PATH_HERE"? I mean, how did you produce the train.record and eval.record files needed for the paths above?
@EmmanouelP I have a custom labeled dataset that was not in TFRecord form. So, I wrote a script to collect the labels from my dataset and output them as a TFRecord, which is essentially a file containing a list of TFExamples.
If you go here, you can see TensorFlow's sample script that does the same thing with another dataset that was downloaded online: https://github.com/tensorflow/models/blob/master/object_detection/create_pascal_tf_record.py Line 122 is the most "important" part, as that is where you specify the TFExample attributes.
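For reference, the core of such a script is assembling, per image, the attribute lists that go into each TFExample. A minimal dependency-free sketch (the feature keys match the ones the PASCAL sample script uses; the function name and the `tf.train.Feature` wrapping it omits are my own simplification):

```python
def make_example_dict(filename, width, height, boxes, labels):
    """Build the per-image attribute lists that a TFExample holds.

    boxes: list of (xmin, ymin, xmax, ymax) in pixel coordinates.
    labels: list of integer class ids parallel to boxes.
    In the real script each value is wrapped in a tf.train.Feature;
    that wrapping is omitted here to keep the sketch dependency-free.
    """
    return {
        'image/filename': filename,
        'image/width': width,
        'image/height': height,
        # The Object Detection API expects box coordinates normalized to [0, 1].
        'image/object/bbox/xmin': [x0 / width for x0, _, _, _ in boxes],
        'image/object/bbox/ymin': [y0 / height for _, y0, _, _ in boxes],
        'image/object/bbox/xmax': [x1 / width for _, _, x1, _ in boxes],
        'image/object/bbox/ymax': [y1 / height for _, _, _, y1 in boxes],
        'image/object/class/label': list(labels),
    }

# One 960x504 image with a single car box, in pixel coordinates:
example = make_example_dict('car.jpg', 960, 504, [(96, 50, 480, 252)], [1])
print(example['image/object/bbox/xmin'])  # [0.1]
```

In the real script, each of these values becomes a `tf.train.Feature` inside a `tf.train.Example`, which is then serialized into the TFRecord file.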
@timisplump So basically you had your own raw dataset (images plus your annotation files), and you used one of the provided scripts (modified in some way) to produce the TFRecord files? Thanks in advance for all the help. I'm just trying to do the same kind of custom training so we can compare/share results, and maybe even solve your problem too :).
Hi @timisplump - can you provide your labelmap too please? Sometimes that is at fault.
@jch1 At that time, my label map was as follows:
item {
  id: 0
  name: 'car'
}
I'm still curious why that didn't work.
Strangely enough, I reverted a few of the specs that I had changed from the PETs file (learning rate, l2_regularizer weights) back to what they were, trained on a dataset of size 5000 (somewhat close to the PETs dataset size), and the training seemed to work correctly. Additionally, I changed my label map to the following (while also changing the labels in my dataset, of course):
item {
  id: 0
  name: 'none_of_the_above'
}
item {
  id: 1
  name: 'car'
}
After the above changes, the specs were the same as the PETs example except for batch_size (12, due to memory issues), num_classes (1 in my case), and image_resizer (height=504, width=960, because that's the size of my images). This allowed training to work for some reason.
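For reference, those three overrides correspond to a pipeline-config fragment roughly like the following (only the changed fields are shown; everything else is assumed identical to the PETs sample config):

```proto
model {
  ssd {
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 504
        width: 960
      }
    }
  }
}
train_config: {
  batch_size: 12
}
```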
I doubt the none_of_the_above class is what caused the problems, but if it isn't that, do you think the size of the dataset caused the issue? The reason I haven't closed the issue is that I'm hoping to train on a dataset of size ~100k, but I'm afraid that may not work (I will report back as soon as I try it).
Do you have any insight on what caused the original problem?
@EmmanouelP Yeah, that's exactly what I did. I wrote my own script to retrieve the labels and then stored them in the TFRecord file the same way that other script does (create_pet_tf_record.py, I believe it's called). If you look at my comment above, I've discovered that this wasn't the problem, so you can safely do the same. Best of luck!
@timisplump We currently ignore any class that has label index 0 (this is not very well documented, and we are in the process of adding better documentation). In your original label map, this would have caused your model to throw out all cars.
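In other words, ids in the label map should start at 1, since index 0 is reserved for the implicit background class; a single-class map only needs:

```proto
item {
  id: 1
  name: 'car'
}
```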
@jch1 thanks a bunch for the reply. I bet that's the problem.
Please document that soon so that others don't have to suffer through the pain I did! :)
Yup, this is already in the works and my apologies that you had to go through this. Thanks for sticking it out! I'm closing this issue, but feel free to re-open if you have more to discuss.
Hi Folks,
I am facing an issue while trying to run train.py on a Windows 10 system. Below is the error message I am getting.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "train.py", line 49, in <module>
    from object_detection import trainer
  File "D:\New\PythonCode\models-master\models-master\research\object_detection\trainer.py", line 27, in <module>
    from object_detection.builders import preprocessor_builder
  File "D:\New\PythonCode\models-master\models-master\research\object_detection\builders\preprocessor_builder.py", line 21, in <module>
    from object_detection.protos import preprocessor_pb2
ImportError: cannot import name 'preprocessor_pb2'
PYTHONPATH is set to D:\New\PythonCode\models-master\models-master\research; D:\New\PythonCode\models-master\models-master\research\slim
The command I am using to train my model is: python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
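For what it's worth, the `*_pb2` modules are generated files: they only exist after the `.proto` files under `object_detection/protos` have been compiled with `protoc object_detection/protos/*.proto --python_out=.` (run from the `research` directory). A small sketch to check whether the generated module can be found at all (the function name is my own; nothing about the install is assumed):

```python
import importlib.util


def proto_module_available(name="object_detection.protos.preprocessor_pb2"):
    """Return True if the named generated proto module can be located on sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # The parent package itself is not importable
        # (e.g. PYTHONPATH not set, or protos never compiled).
        return False


print(proto_module_available())
```

If this prints False, either PYTHONPATH does not include the `research` directory or the protoc compilation step was skipped.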
I have been struggling with this issue for the last couple of days; any help/guidance to resolve it would be highly appreciated.
Thanks, Rana
System information
Describe the problem
I am unable to train any of the pre-trained models on my own dataset. For testing purposes, I constructed a training dataset with only one image, so the model should simply learn to memorize that image's objects. The same image is also used for the "test" set. Also, to make things simpler, I'm using only one class (cars) for detection.
I trained on this image with the SSD MobileNet and Inception networks (and then tried again with Faster R-CNN, with the same results). Each model converged, or at least the loss went to 0; see below for training logs. However, when I ran eval.py on the latest saved model checkpoint, it returned a mAP of 0.0 every single time. I froze the models using the export_inference_graph.py script and output their detections using the iPython notebook: there are 25+ boxes, none of which are near any of the 9 cars in the image.
I modified trainer.py so that it saves my model's checkpoint every minute of training; this way I don't have to wait until the saver decides to save a checkpoint. This was the only modification I made to trainer.py or any of the training scripts.
To construct my dataset, I used a custom script that took our annotations/labels and output them as TFRecords, the same way the examples do it. In my script, to be sure nothing weird was going on, I printed out each TFExample right before writing it to file. Below is the TFExample with the bytes_list omitted due to its size.
I've been debugging this issue for days. Strangely, I am able to train successfully on the PETS dataset, and the model appears to learn something when training on it. I'm really confused about what I did wrong and what is making the model's loss go to 0 when it clearly isn't learning anything. Thanks for any help!
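A quick sanity check one could run on each example before writing it (the helper name is hypothetical; box coordinates are assumed already normalized to [0, 1], and class ids are expected to be 1-based because the Object Detection API ignores label index 0):

```python
def validate_example(xmins, ymins, xmaxs, ymaxs, labels):
    """Raise ValueError if a would-be TFExample looks malformed.

    Coordinates are expected normalized to [0, 1]; class ids must be
    >= 1 because the Object Detection API ignores label index 0.
    """
    n = len(labels)
    if not (len(xmins) == len(ymins) == len(xmaxs) == len(ymaxs) == n):
        raise ValueError("box coordinate lists and labels differ in length")
    for x0, y0, x1, y1 in zip(xmins, ymins, xmaxs, ymaxs):
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            raise ValueError(f"bad box: {(x0, y0, x1, y1)}")
    for label in labels:
        if label < 1:
            raise ValueError(f"label {label} would be silently ignored (ids start at 1)")


# A well-formed single-box example passes silently:
validate_example([0.1], [0.1], [0.5], [0.5], [1])
```

A check like this would have flagged the `id: 0` label map immediately, since every box in the dataset carried a label the trainer silently discards.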
Source code / logs
Train logs
SSD_mobilenet config: