kinhong / OpenLabeler

OpenLabeler is an open source desktop application for annotating objects for AI applications
Apache License 2.0
Creating Custom model using OpenLabeler Training Support on Windows 10 #12

Closed alexperezfdz closed 2 years ago

alexperezfdz commented 4 years ago

We are using OpenLabeler Training Support to create a custom object detection model based on a pretrained model.

OpenLabeler, Docker, and the containers have been set up on Windows 10 following the instructions. The containers have been verified through the Docker Dashboard. We use a sample config file from the ssd_mobilenet_v2 pretrained model.

When training starts, an alert appears indicating "train has started": image

Nothing happens after closing the alert, and the log shows "Unable to train":

2020-01-31 17:43:05 INFO com.easymobo.openlabeler.OpenLabeler: OpenLabeler 1.2.1
2020-01-31 17:43:08 INFO com.easymobo.openlabeler.tensorflow.ObjectDetector: TensorFlow: 1.15.0
2020-01-31 17:45:13 INFO com.easymobo.openlabeler.tensorflow.TFTrainer: Created C:\temp\zzzcoco\data\label_map.pbtxt
2020-01-31 17:45:13 INFO com.easymobo.openlabeler.tensorflow.TFTrainer: Creating training data in C:\temp\zzzcoco\data...
2020-01-31 17:45:13 INFO com.easymobo.openlabeler.tensorflow.TFRecordCreator: Created train/eval records in C:\temp\zzzcoco\data
2020-01-31 17:45:13 INFO com.easymobo.openlabeler.tensorflow.TFTrainer: Created training data in C:\temp\zzzcoco\data
2020-01-31 17:49:26 INFO com.easymobo.openlabeler.tensorflow.TFTrainer: Created C:\temp\zzzcoco\data\label_map.pbtxt
2020-01-31 17:49:26 INFO com.easymobo.openlabeler.tensorflow.TFTrainer: Created C:\temp\zzzcoco\models\ssd_mobilenet_v2\model.config
2020-01-31 17:49:27 SEVERE com.easymobo.openlabeler.tensorflow.TFTrainer: Unable to train
javax.ws.rs.ProcessingException
    at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:287)
    at org.glassfish.jersey.client. ....

The Docker train preferences: image

Could you please tell us what is wrong?

kinhong commented 4 years ago

@alexperezfdz - please paste the entire stack trace of the 2020-01-31 17:49:27 SEVERE log event, or upload the log file for me to look into it further.
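For a long log file, the SEVERE event and its stack trace can be pulled out with a short script. The following is only a minimal sketch: it assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS LEVEL` prefix, as in the excerpt above, and that stack-trace lines follow the entry without such a prefix. The function name `severe_entries` is made up for illustration and is not part of OpenLabeler.

```python
import re

# Assumed entry prefix, based on the log excerpt: "2020-01-31 17:49:27 SEVERE ..."
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (\w+) ")


def severe_entries(log_text):
    """Return each SEVERE entry together with its continuation lines."""
    entries = []
    current = None  # lines of the SEVERE entry being collected, if any
    for line in log_text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: flush the previous SEVERE entry, if any.
            if current is not None:
                entries.append("\n".join(current))
            current = [line] if match.group(1) == "SEVERE" else None
        elif current is not None:
            # No timestamp prefix: treat as part of the stack trace.
            current.append(line)
    if current is not None:
        entries.append("\n".join(current))
    return entries
```

Calling `severe_entries(open(path, encoding="utf-8").read())` on the log file, where `path` is wherever the log is stored, returns a list of strings that can be pasted into the issue.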

alexperezfdz commented 4 years ago

log.zip

Hello @kinhong. Enclosed are the entire application event log, the created model.config file, and some images showing the model folder structure, the OpenLabeler training preferences, and the Docker Dashboard with the GPU container running.

thanks for your support, alex

kinhong commented 4 years ago

@alexperezfdz - it looks like docker-java (the library OpenLabeler uses to communicate with docker service) is not able to create Unix sockets on Windows. I think we can get around this:

  1. Upgrade to OpenLabeler v1.2.2 - to get around a Bind#parse bug in docker-java
  2. Open Docker > Settings > General and check "Expose daemon on tcp://localhost:2375 without TLS"

image

  3. Create a <home>\.docker-java.properties file. Add a DOCKER_HOST=tcp://localhost:2375 entry. See reference

  4. Restart Docker Desktop/service in Windows

  5. Launch OpenLabeler and start training
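Once the TCP option is enabled and Docker has been restarted, it can help to confirm that something is actually listening on localhost:2375 before launching OpenLabeler. The sketch below only opens and closes a plain TCP connection; it does not speak the Docker API, and the function name `docker_tcp_reachable` is made up for illustration.

```python
import socket


def docker_tcp_reachable(host="localhost", port=2375, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("Docker TCP endpoint reachable:", docker_tcp_reachable())
```

If this reports False, the "Expose daemon" setting has probably not taken effect yet, so the DOCKER_HOST entry would point at a closed port.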

alexperezfdz commented 4 years ago

@kinhong our error persists after performing the modifications. Please find the log enclosed: log0_custom_ssd.log. However, we have some questions:

thanks in advance for your support.

alex

kinhong commented 4 years ago

@alexperezfdz You are getting the same error, which means that, at least, the settings in .docker-java.properties have not been picked up by the docker-java library. This file should reside in your user home directory (on Windows 10, it is usually c:\Users\<username>), the same place as the .openlabeler directory.
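A quick way to check the file location is to look for .docker-java.properties in the user home directory and read back the DOCKER_HOST value. This is a simplified sketch: it handles only plain `key=value` lines and comment lines, not the full Java properties syntax, and `read_docker_host` is a made-up helper name, not something docker-java provides.

```python
from pathlib import Path


def read_docker_host(home=None):
    """Return DOCKER_HOST from <home>/.docker-java.properties, or None."""
    home = Path(home) if home is not None else Path.home()
    props = home / ".docker-java.properties"
    if not props.is_file():
        return None  # file is missing or in a different directory
    for raw in props.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DOCKER_HOST":
            return value.strip()
    return None


if __name__ == "__main__":
    print("DOCKER_HOST:", read_docker_host())
```

A result of None suggests that the file is not where the check looks for it or that it has no DOCKER_HOST entry; a value such as tcp://localhost:2375 shows that the entry itself is in place.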

The DOCKER_HOST port should be 2375 (sorry for the typo earlier).

There is no need to run the container before Start Training. OpenLabeler will run the container using the docker-java library.

alexperezfdz commented 4 years ago

@kinhong works fine! Thanks for your support and your excellent job!

Is it possible to run TensorBoard to show the current training evaluation?