swook / GazeML

Gaze Estimation using Deep Learning, a Tensorflow-based framework.
MIT License

Training problems #18

Closed MinjingLin closed 5 years ago

MinjingLin commented 5 years ago

Hi, I used UnityEyes to generate images. Then I ran elg_train.py and got this error:

10/04 06:34 INFO 0079261> heatmaps_mse = 0.00100194, radius_mse = 1.17517e-07
10/04 06:34 INFO 0079270> heatmaps_mse = 0.00119301, radius_mse = 8.82096e-08
10/04 06:34 INFO 0079280> heatmaps_mse = 0.00114937, radius_mse = 1.55061e-07
10/04 06:34 INFO 0079289> heatmaps_mse = 0.00109943, radius_mse = 1.84821e-07
Exception in thread preprocess_UnityEyes_27:
Traceback (most recent call last):
  File "/home/wang/anaconda3/envs/tensorflow-gpu/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/wang/anaconda3/envs/tensorflow-gpu/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/wang/Toshiba/lmj/2019term/papers/GazeML/GazeML-win/src/core/data_source.py", line 245, in preprocess_job
    preprocessed_entry_dict = self.preprocess_entry(raw_entry)
  File "/media/wang/Toshiba/lmj/2019term/papers/GazeML/GazeML-win/src/datasources/unityeyes.py", line 237, in preprocess_entry
    thickness=int(6*line_rand_nums[j + 4]), lineType=cv.LINE_AA)
cv2.error: OpenCV(3.4.3) /io/opencv/modules/imgproc/src/drawing.cpp:1811: error: (-215:Assertion failed) 0 < thickness && thickness <= MAX_THICKNESS in function 'line'

MinjingLin commented 5 years ago

When I debugged this code, I found that np.random.rand() can generate values for line_rand_nums that are small enough (below 1/6) for int(6*line_rand_nums[j + 4]) to truncate to 0, so the line thickness becomes 0. I changed it to thickness=math.ceil(6*line_rand_nums[j + 4]). Hope this helps.
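For anyone who wants to check this without running the full training pipeline, here is a minimal sketch of the thickness computation in isolation. It is not the GazeML code: the helper names are hypothetical, Python's random.random() stands in for np.random.rand() (both draw uniformly from [0, 1)), and the clamped variant is just one alternative assumed for illustration.

```python
import math
import random


def thickness_truncated(r):
    """Expression from the traceback: int() truncates toward zero."""
    return int(6 * r)


def thickness_ceil(r):
    """Fix proposed in this thread: round up instead of truncating."""
    return math.ceil(6 * r)


def thickness_clamped(r):
    """Hypothetical alternative: keep truncation, enforce a minimum of 1."""
    return max(1, int(6 * r))


# Fixed seed only so the sketch is repeatable; the seed value is arbitrary.
rng = random.Random(0)
samples = [rng.random() for _ in range(100000)]

# Count how often each variant yields a thickness that cv.line would reject.
n_zero_truncated = sum(thickness_truncated(r) == 0 for r in samples)
n_zero_ceil = sum(thickness_ceil(r) == 0 for r in samples)
n_zero_clamped = sum(thickness_clamped(r) == 0 for r in samples)

print("zero thickness with int():      ", n_zero_truncated)
print("zero thickness with math.ceil():", n_zero_ceil)
print("zero thickness with max(1, ...):", n_zero_clamped)
```

Any draw below 1/6 gives a thickness of 0 with int(), which is why the assertion fires fairly often over a long training run. math.ceil() only returns 0 when the draw is exactly 0.0, which is possible in principle but extremely unlikely. Note also that the two fixes produce different ranges: math.ceil() gives 1 to 6, while the clamped version gives 1 to 5.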

swook commented 5 years ago

I think this should resolve the issue. Please follow up in this thread on whether it helped or, even better, submit a pull request. Otherwise, I will make a patch.

Thanks for finding this issue! It must be a relatively new check in the OpenCV library.

MinjingLin commented 5 years ago

About the training time: how long does it take to train on 1 million images?

swook commented 5 years ago

I remember training our model for the ETRA paper for 3-4 days.

MinjingLin commented 5 years ago

Did you use multiple GPUs to train the model? It took me about 14 days to train the ELG model on 1 million UnityEyes images using a single Titan X. I read through the training code carefully, and it seems that you only use a single GPU for training.

Eichizen17 commented 4 years ago

Do you use the same network to train your model?