sonia-auv-private opened this issue 6 years ago
Yes, of course. Just use the scripts train.py and eval.py provided by Tensorflow's Object Detection API, like you would with any other model.
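For reference, in the TF1-era Object Detection API those two scripts are invoked roughly like this. All paths below are placeholders, and in later releases of tensorflow/models the scripts moved to object_detection/legacy/:

```shell
# Run from the tensorflow/models/research directory.
# Training (writes checkpoints into --train_dir):
python object_detection/train.py \
    --logtostderr \
    --pipeline_config_path=path/to/ssd_mobilenet_v1_coco.config \
    --train_dir=path/to/train_dir

# Evaluation (reads the checkpoints written above):
python object_detection/eval.py \
    --logtostderr \
    --pipeline_config_path=path/to/ssd_mobilenet_v1_coco.config \
    --checkpoint_dir=path/to/train_dir \
    --eval_dir=path/to/eval_dir
```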
In stuff/ssd_mobilenet_checkpoints
you will find the same checkpoint files I used; they are the original ones provided by Tensorflow.
Thank you
Hi,
Just to clarify: I must train with Tensorflow's Object Detection API only on 600x600px or 300x300px images in order for it to work with the config file, then place my trained ckpt file under stuff/ssd_mobilenet_checkpoints
and run your scripts as usual. Is this correct?
Thanks so much.
Hey @uzbhutta,
I suggest you first take a closer look at Tensorflow's original Object Detection API. Try to understand how training and inference work and which scripts to use. After that, take a look at my code and what it does.
To give you a short overview: it does not matter what size the images you train on have, because with TF's API they are always resized to a fixed size that you set in the config file. This size is normally 300x300 for SSD. You can of course train a network on 600x600 if you like, but then you won't be able to use a pretrained model as a starting point, as the weights are bound to the input dimensions you train on.
So during training you get several checkpoints, at an interval that you also set in the config.
And finally, when you want to use my API to do inference, you need to export one of those checkpoint files to a frozen model in the .pb format.
This frozen model can then be included in my API and addressed correctly in my config.yml.
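The export step above uses the Object Detection API's export script; a sketch with placeholder paths and a hypothetical checkpoint number:

```shell
# Run from the tensorflow/models/research directory.
# model.ckpt-200000 is a placeholder for whichever checkpoint you pick.
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/ssd_mobilenet_v1_coco.config \
    --trained_checkpoint_prefix path/to/train_dir/model.ckpt-200000 \
    --output_directory path/to/export_dir
# export_dir/frozen_inference_graph.pb is the file to reference in config.yml
```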
And another thing: make sure to use my checkpoint files as the starting point, because my speed hack (the split model + multithreading) only works if your model has the exact same layer names as mine.
I hope I could clarify some things for you.
Cheers Gustav
@GustavZ where is your checkpoint file? I trained on my own labeled data with tensorflow's object detection api using your config file located in models/ssd_mobilenet_v11_coco/. After training, I replaced the frozen graph in models/ssd_mobilenet_v11_coco/.
When running inference, I get this error:
ValueError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'
I wonder why my frozen graph has the node 'Preprocessor/map/TensorArray_2' but your frozen graph does not.
@David-Lee-1990 Which version of the model zoo did you take (which date is appended at the end)? Tensorflow seems to have changed some layer names in versions newer than the one I used (2017_11_17).
My checkpoint file is inside the model dir of ssd_mobilenet: https://github.com/GustavZ/realtime_object_detection/tree/master/models/ssd_mobilenet_v11_coco
With this checkpoint it should work; at least it did for my retrainings.
I hope I could help you!
@GustavZ I retrained on my data using the config file and the model.ckpt files in your ssd_mobilenet model dir. But after that, I still encounter the same problem (Node 'Preprocessor/map/TensorArray_2'). I wonder whether this is caused by a version difference in tensorflow? My tensorflow version is 1.8.
Yes, pretty sure. There are so many changes between versions that lead to strange behavior and errors. I also keep switching versions whenever I face errors.
Try tf 1.4; that's where I started this project.
tf 1.4 is no longer usable for training with tensorflow's object detection api because of the error: AttributeError: module 'tensorflow.contrib.data' has no attribute 'parallel_interleave'.
I tried tf 1.5 to retrain the model, but the resulting graph still has the node 'Preprocessor/map/TensorArray_2'. This is driving me crazy!
Hi @GustavZ,
First of all thanks for your work. It's really great.
However I have the same problem :/
Traceback (most recent call last):
File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 489, in import_graph_def
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "...\realtime_object_detection-2.0\run_objectdetection.py", line 178, in <module>
config.NUM_CLASSES,config.SPLIT_MODEL, config.SSD_SHAPE).prepare_od_model()
File "...\realtime_object_detection-2.0\rod\model.py", line 157, in prepare_od_model
self.load_frozenmodel()
File "...\realtime_object_detection-2.0\rod\model.py", line 129, in load_frozenmodel
tf.import_graph_def(remove, name='')
File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 493, in import_graph_def
raise ValueError(str(e))
ValueError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'
I trained my model with tf 1.8, replaced your configuration and model with mine, and tried a run. The same issue occurs when I try a run with your release 1.0.
For information:
Ok, my bad, I turned off SPLIT_MODEL and it works now.
Don't use v2.0. Use master. I will update that next week.
@AnthonyLabaere Hi, after turning off SPLIT_MODEL, your model works now? Is the ValueError: Node 'Preprocessor/map/TensorArray_2' gone?
Again: the split_model speed hack will ONLY work with ssd_mobilenet_v1 models that are exported from the exact same checkpoint that I used and published in /models. Tensorflow, and also the SSDMetaArch inside models/object_detection, keeps changing.
I have no insight into this as I am not working with SSD anymore. If you want to apply the speed hack to other models, you need to investigate on your own. Sorry.
But if you find a solution you are very welcome to contribute / file a PR.
Gustav
@GustavZ ok, thanks!
@David-Lee-1990 I just succeeded in making it work with my model on my computer (on Windows) and on my Raspberry (with some updates). And yes, the issue with 'Preprocessor/map/TensorArray_2' is gone, because that part (with SPLIT_MODEL true) concerns the GPU.
@GustavZ If I find a "real" solution I will make a PR, but for now I haven't found anything :/ Ok, I will use master in the future.
@AnthonyLabaere Is your model trained with tensorflow's object detection api? What do you mean by saying "'Preprocessor/map/TensorArray_2' is gone because this part concerns the GPU"?
I checked the frozen graph generated by tensorflow and found that after the node 'TensorArray_2' the graph goes directly to the batch-NMS nodes, without feature extraction.
@David-Lee-1990 Yes, it is trained with tensorflow's object detection api. Well, concerning 'Preprocessor/map/TensorArray_2', I spoke too fast. I don't know why the problem is gone, sorry.
How do you see that? With Tensorboard?
Hi,
The split model hack is only available for ssd_mobilenet_v1 with 300x300 input. 'Preprocessor/map/TensorArray_2' appears when training on 600x600 images.
Set your ssd_mobilenet_v1_coco.config to 300x300:
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
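A quick way to double-check a pipeline config before a long retraining run is to read the resizer dimensions out of it programmatically. The helper below is a hypothetical sketch using only the standard library (the function name and regex are my own, not part of any API):

```python
import re

def resizer_shape(config_text):
    """Return (height, width) from a fixed_shape_resizer block, or None."""
    m = re.search(
        r"fixed_shape_resizer\s*\{\s*height:\s*(\d+)\s*width:\s*(\d+)",
        config_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

config = """
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
"""

print(resizer_shape(config))  # (300, 300)
```

If this prints anything other than (300, 300), the exported graph will not match the split-model assumptions described above.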
@naisy Hi, have you tried this 300x300 config? In fact, my config has been set to 300x300 all along, but the error still occurs.
Hi @David-Lee-1990,
I checked the config just now. The config in the master branch was changed. Please use the r1.5 branch for ssd_mobilenet_v1.
--- r1.5 2018-06-18 01:43:31.752331891 +0000
+++ master 2018-06-18 01:43:18.056376250 +0000
@@ -108,12 +108,10 @@
loss {
classification_loss {
weighted_sigmoid {
- anchorwise_output: true
}
}
localization_loss {
weighted_smooth_l1 {
- anchorwise_output: true
}
}
hard_example_miner {
@@ -193,5 +191,4 @@
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
- num_epochs: 1
}
My own training is here: https://github.com/naisy/train_ssd_mobilenet
@naisy Thank you for your tips. Problem solved!
Hi,
I was wondering if there is any method that would let us retrain this model using Pascal VOC annotation files and images?