yuantailing / ctw-baseline

Baseline methods for [CTW dataset](https://ctwdataset.github.io/)
MIT License

Unable to run eval.py successfully #17

Closed w804479595 closed 6 years ago

w804479595 commented 6 years ago

Hi, I followed your steps in tutorial 3-detection, but I got an error when I tried to run eval.py. Here is the error log:

mask_scale: Using default '1.000000'
Loading weights from products/backup/yolo-chinese_final.weights...conv 5030 1 x 1 / 1 38 x 38 x1024 -> 38 x 38 x5030
30 detection
mask_scale: Using default '1.000000'
Loading weights from products/backup/yolo-chinese_final.weights...Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
4
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
4
8
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/deeplearn/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/deeplearn/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/gxwang2/ctw/ctw-baseline/detection/pythonapi/common_tools.py", line 77, in parallel_work
    func(args_list[i], tid=tid)
  File "eval.py", line 70, in eval_yolo
    assert 0 == p.returncode
AssertionError

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/deeplearn/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/deeplearn/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/gxwang2/ctw/ctw-baseline/detection/pythonapi/common_tools.py", line 77, in parallel_work
    func(args_list[i], tid=tid)
  File "eval.py", line 70, in eval_yolo
    assert 0 == p.returncode
AssertionError

I changed num_thread and TEST_NUM_GPU, but that didn't help. Could you give me some help? Thank you.

yuantailing commented 6 years ago

Test this command:

darknet/darknet detector valid products/chinese.0.data products/yolo-chinese-test.cfg products/backup/yolo-chinese_final.weights -out chinese.0

and check whether the return code is 0.
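The same check can be scripted. The sketch below builds the command from the paths used in this thread and returns darknet's exit code. `build_valid_cmd` and `run_valid` are hypothetical helpers, not part of ctw-baseline, and the sketch assumes it is run from the same directory as eval.py (`detection/` in the tracebacks in this thread).

```python
import os
import subprocess


def build_valid_cmd(split,
                    darknet="darknet/darknet",
                    cfg="products/yolo-chinese-test.cfg",
                    weights="products/backup/yolo-chinese_final.weights",
                    out_name="chinese"):
    """Return the `detector valid` argument list for test split `split` (0, 1, ...)."""
    # Paths mirror the command quoted in this thread; adjust if yours differ.
    return [darknet, "detector", "valid",
            "products/chinese.%d.data" % split, cfg, weights,
            "-out", "%s.%d" % (out_name, split)]


def run_valid(split, gpu="0"):
    """Run darknet for one split on one GPU and return its exit code (0 means success)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(build_valid_cmd(split), env=env).returncode
```

For example, `run_valid(0)` should return 0 when split 0 is processed successfully. From a shell, running `echo $?` immediately after the command prints the same exit code.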

w804479595 commented 6 years ago

I tried your command and the return code is 0. Then I checked settings.py and found the cause: 'DARKNET_RESULTS_OUT' had been changed to 'products/results/chinese.txt'. After setting 'DARKNET_RESULTS_OUT = chinese' and running eval.py again, the return code is 0 now. Thanks a lot.

yuantailing commented 6 years ago

Note: the 'settings.DARKNET_RESULTS_OUT' parameter is passed directly to 'outfile' at line 409 of https://github.com/pjreddie/darknet/blob/f6d861736038da22c9eb0739dca84003c5a5e275/examples/detector.c#L409, which is why it cannot be a file path. Similarly, 'settings.DARKNET_RESULTS_DIR' is passed to 'prefix'.
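In upstream darknet, `prefix` comes from the `results` key of the .data file and is joined with `outfile` to form the output file name. The sketch below assumes a `"%s/%s"` plus extension pattern (the exact format string in the modified darknet's `chinese` branch is an assumption) to show why a path in DARKNET_RESULTS_OUT ends up nested under the results folder. `darknet_result_path` and `check_results_out` are hypothetical helpers, not part of settings.py.

```python
import os


def darknet_result_path(prefix, outfile, ext=".txt"):
    """Join prefix and outfile the way darknet is *assumed* to: "%s/%s" + extension."""
    return "%s/%s%s" % (prefix, outfile, ext)


def check_results_out(value):
    """Reject DARKNET_RESULTS_OUT values that contain a directory separator."""
    if "/" in value or os.sep in value:
        raise ValueError(
            "DARKNET_RESULTS_OUT must be a bare name such as 'chinese', got %r" % value)
    return value
```

With the setting reported above, `darknet_result_path("products/results", "products/results/chinese.txt")` gives `products/results/products/results/chinese.txt.txt`. That nested folder does not exist, so the output file presumably cannot be opened and darknet exits with a non-zero code.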

peppapeppapeppapeppa commented 5 years ago

Hello, I ran into a problem when running python3 eval.py. Do you know how to fix it?

make -j16
make: Nothing to be done for 'all'.
CUDA_VISIBLE_DEVICES=0 darknet/darknet detector valid products/chinese.1.data products/yolo-chinese-test.cfg products/backup/yolo-chinese_final.weights -out chinese.1
CUDA_VISIBLE_DEVICES=0 darknet/darknet detector valid products/chinese.0.data products/yolo-chinese-test.cfg products/backup/yolo-chinese_final.weights -out chinese.0
layer filters size input output
0 conv 32 3 x 3 / 1 1216 x1216 x 3 -> 1216 x1216 x 32
1 max 2 x 2 / 2 1216 x1216 x 32 -> 608 x 608 x 32
2 conv 64 3 x 3 / 1 608 x 608 x 32 -> 608 x 608 x 64
3 max 2 x 2 / 2 608 x 608 x 64 -> 304 x 304 x 64
4 conv 128 3 x 3 / 1 304 x 304 x 64 -> 304 x 304 x 128
5 conv 64 1 x 1 / 1 304 x 304 x 128 -> 304 x 304 x 64
6 conv 128 3 x 3 / 1 304 x 304 x 64 -> 304 x 304 x 128
7 max 2 x 2 / 2 304 x 304 x 128 -> 152 x 152 x 128
8 conv 256 3 x 3 / 1 152 x 152 x 128 -> 152 x 152 x 256
9 conv 128 1 x 1 / 1 152 x 152 x 256 -> 152 x 152 x 128
10 conv 256 3 x 3 / 1 152 x 152 x 128 -> 152 x 152 x 256
11 max 2 x 2 / 2 152 x 152 x 256 -> 76 x 76 x 256
12 conv 512 3 x 3 / 1 76 x 76 x 256 -> 76 x 76 x 512
13 conv 256 1 x 1 / 1 76 x 76 x 512 -> 76 x 76 x 256
14 conv 512 3 x 3 / 1 76 x 76 x 256 -> 76 x 76 x 512
15 conv 256 1 x 1 / 1 76 x 76 x 512 -> 76 x 76 x 256
16 conv 512 3 x 3 / 1 76 x 76 x 256 -> 76 x 76 x 512
17 max 2 x 2 / 2 76 x 76 x 512 -> 38 x 38 x 512
18 conv 1024 3 x 3 / 1 38 x 38 x 512 -> 38 x 38 x1024
19 conv 512 1 x 1 / 1 38 x 38 x1024 -> 38 x 38 x 512
20 conv 1024 3 x 3 / 1 38 x 38 x 512 -> 38 x 38 x1024
21 conv 512 1 x 1 / 1 38 x 38 x1024 -> 38 x 38 x 512
22 conv 1024 3 x 3 / 1 38 x 38 x 512 -> 38 x 38 x1024
23 conv 1024 3 x 3 / 1 38 x 38 x1024 -> 38 x 38 x1024
24 conv 1024 3 x 3 / 1 38 x 38 x1024 -> 38 x 38 x1024
25 route 16
26 reorg / 2 76 x 76 x 512 -> 38 x 38 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 38 x 38 x3072 -> 38 x 38 x1024
29 conv 5030 1 x 1 / 1 38 x 38 x1024 -> 38 x 38 x5030
30 detection
mask_scale: Using default '1.000000'
Loading weights from products/backup/yolo-chinese_final.weights...Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/smart/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/home/smart/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/home/smart/zhuxuan/CTW/ctw-baseline-master/detection/pythonapi/common_tools.py", line 77, in parallel_work
    func(args_list[i], tid=tid)
  File "eval.py", line 65, in eval_yolo
    assert 0 == p.returncode
AssertionError

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/smart/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/home/smart/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/home/smart/zhuxuan/CTW/ctw-baseline-master/detection/pythonapi/common_tools.py", line 77, in parallel_work
    func(args_list[i], tid=tid)
  File "eval.py", line 65, in eval_yolo
    assert 0 == p.returncode
AssertionError

peppapeppapeppapeppa commented 5 years ago

Did you manage to solve this problem, and if so, how? Looking forward to your reply.

peppapeppapeppapeppa commented 5 years ago

When I test the command darknet/darknet detector valid products/chinese.0.data products/yolo-chinese-test.cfg products/backup/yolo-chinese_final.weights -out chinese.0, it prints:

mask_scale: Using default '1.000000'
Loading weights from products/backup/yolo-chinese_final.weights...Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Segmentation fault

Looking forward to your reply, thanks.
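eval.py asserts `0 == p.returncode`, so a crash like this segmentation fault presumably surfaces there as the AssertionError reported earlier in this thread. On POSIX, Python's subprocess reports a child killed by signal N as return code -N, while a shell wrapping the command usually reports exit status 128 + N (139 for SIGSEGV on Linux). The hypothetical helper below decodes either form; it is not part of eval.py, but something like it could make the failure reason visible instead of a bare assertion.

```python
import signal


def _signal_name(number):
    """Return the symbolic name for a signal number, or the number as text if unknown."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return str(number)


def describe_returncode(rc):
    """Explain a subprocess return code using POSIX conventions."""
    if rc == 0:
        return "success"
    if rc < 0:
        # subprocess reports a child killed by signal N as -N.
        return "killed by signal " + _signal_name(-rc)
    if rc > 128:
        # A shell reports a child killed by signal N as exit status 128 + N.
        return "exit status %d (probably signal %s)" % (rc, _signal_name(rc - 128))
    return "exit status %d" % rc
```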

yuantailing commented 5 years ago

First, check whether the darknet you are using is my modified version: https://github.com/yuantailing/darknet/tree/801409731a46b072b8ee11f6f2acfa29cb16f165

This is the specific commit referenced by the Git submodule; not just any version of darknet will work.
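One way to confirm which darknet commit is checked out is `git submodule status` from the repository root, or `git -C darknet rev-parse HEAD` from the directory that contains `darknet/`. The sketch below wraps the latter; the `darknet` folder location and the helper names are assumptions. Note that GitHub ZIP downloads do not include submodule contents, and `git submodule update --init` checks out the referenced commit in a cloned repository.

```python
import subprocess

# Commit referenced by the Git submodule, taken from the maintainer's link.
EXPECTED_DARKNET_COMMIT = "801409731a46b072b8ee11f6f2acfa29cb16f165"


def darknet_commit(darknet_dir="darknet"):
    """Return the HEAD commit git reports for darknet_dir, or None if git cannot run there."""
    try:
        result = subprocess.run(
            ["git", "-C", darknet_dir, "rev-parse", "HEAD"],
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or darknet_dir is missing / not inside any git checkout.
        return None
    # An uninitialized submodule folder inside a clone reports the parent
    # repository's HEAD instead, which will not match the expected commit.
    return result.stdout.strip()


def is_expected_darknet(commit):
    return commit == EXPECTED_DARKNET_COMMIT
```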

peppapeppapeppapeppa commented 5 years ago

Hi, I checked and it is indeed your modified version, but the error still occurs. I don't know where the problem is.

peppapeppapeppapeppa commented 5 years ago

Also, the problem only occurs when running python3 eval.py for testing; training ran without errors. Looking forward to your reply.

yuantailing commented 5 years ago

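If calling darknet/darknet as a single process from the command line also fails, please confirm that more than 4 GB of GPU memory is free. Since training ran without errors, it should be enough. I haven't run into this problem myself; you would need to use gdb or add breakpoints to find where the program crashes.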
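Both checks can be scripted. `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits` prints one `index, free MiB` line per GPU, and `gdb -batch -ex run -ex bt --args <command>` runs a program and prints a backtrace where it stops. The helper names and the 4096 MiB threshold (standing in for "more than 4 GB") are assumptions of this sketch.

```python
import subprocess

MIN_FREE_MIB = 4096  # "more than 4 GB free", expressed in MiB


def parse_free_memory(csv_text):
    """Parse nvidia-smi CSV lines "index, free_MiB" into {gpu_index: free_MiB}."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, mib = (field.strip() for field in line.split(","))
        free[int(index)] = int(mib)
    return free


def query_free_memory():
    """Ask nvidia-smi how much memory is currently free on each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_free_memory(result.stdout)


def gpus_with_enough_memory(free, min_mib=MIN_FREE_MIB):
    """Return the sorted GPU indices with more than min_mib MiB free."""
    return sorted(i for i, mib in free.items() if mib > min_mib)


def gdb_backtrace_cmd(cmd):
    """Wrap an argument list so gdb runs it and prints a backtrace when it stops."""
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args"] + list(cmd)
```

Passing the darknet command's argument list through `gdb_backtrace_cmd` and running the result with `subprocess.run` shows the stack at the point of a segmentation fault.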

yuantailing commented 5 years ago

Before running darknet/darknet detector valid products/chinese.0.data products/yolo-chinese-test.cfg products/backup/yolo-chinese_final.weights -out chinese.0:

Make sure the folder products/results exists.

Make sure the images listed in products/test.0.txt exist.

Make sure the following files exist and check their contents:

products/chinese.0.data
classes = 1001
eval = chinese
names = products/chinese.names
results = products/results
valid = products/test.0.txt

products/chinese.names
0
1
2
...

products/yolo-chinese-test.cfg
[net]
batch=1
subdivisions=1
height=1216
width=1216
channels=3
...
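These checks can be automated. The sketch below reads a darknet .data file (plain `key = value` lines, as in products/chinese.0.data) and reports missing keys, values that differ from the ones listed above, a missing results folder or names file, an empty or missing image list, and listed images that do not exist. It assumes relative paths resolve against the current directory, so run it from the same place as the darknet command; the function names are hypothetical.

```python
import os

# Values listed for products/chinese.0.data in this thread.
EXPECTED = {"classes": "1001", "eval": "chinese"}


def parse_data_file(text):
    """Parse darknet .data 'key = value' lines into a dict, skipping blanks and comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def check_split(data_path, expected=EXPECTED):
    """Return a list of problems for one .data file; an empty list means all checks passed."""
    with open(data_path) as f:
        opts = parse_data_file(f.read())
    problems = []
    for key in ("classes", "eval", "names", "results", "valid"):
        if key not in opts:
            problems.append("missing key %r" % key)
    for key, value in expected.items():
        if key in opts and opts[key] != value:
            problems.append("%s = %s, expected %s" % (key, opts[key], value))
    if "results" in opts and not os.path.isdir(opts["results"]):
        problems.append("results folder %s does not exist" % opts["results"])
    if "names" in opts and not os.path.isfile(opts["names"]):
        problems.append("names file %s does not exist" % opts["names"])
    valid = opts.get("valid")
    if valid is not None:
        if not os.path.isfile(valid) or os.path.getsize(valid) == 0:
            problems.append("image list %s is missing or empty" % valid)
        else:
            with open(valid) as f:
                for image in (line.strip() for line in f):
                    # os.path.exists follows symlinks, so a broken symlink counts as missing.
                    if image and not os.path.exists(image):
                        problems.append("image %s does not exist" % image)
    return problems
```

For instance, `check_split("products/chinese.0.data")` would report a 0-byte products/test.0.txt as "missing or empty".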

peppapeppapeppapeppa commented 5 years ago

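Thank you for your patient answers. I carefully checked everything you listed and found that the file products/test.0.txt is 0 bytes, with no content at all. The other folders and files look fine. Also, no test-set images were generated in the data/images/test folder; the earlier commands only generated the training-set images in data/images/trainval. I don't know what went wrong.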

yuantailing commented 5 years ago

Run prepare_test_data.py again.

peppapeppapeppapeppa commented 5 years ago

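After running prepare_test_data.py, nothing happened. I think the problem lies in cd ../prepare && python3 symlink_images.py, because after running that command only the folder data/images/trainval contains images, while data/images/test has none. I haven't changed any code, and it still doesn't work after many tries.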
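To narrow this down, it may help to compare what actually ended up in the two image folders. The sketch below counts the entries in a folder and how many of them are broken symbolic links (links whose target is missing). The helper name is hypothetical, and the data/images/... paths come from the comment above and are assumed to be relative to the repository root.

```python
import os


def summarize_image_dir(path):
    """Count entries and broken symlinks in an image folder."""
    if not os.path.isdir(path):
        return {"exists": False, "entries": 0, "broken_links": 0}
    entries = os.listdir(path)
    # A symlink whose target is missing: islink() is True but exists() is False.
    broken = [name for name in entries
              if os.path.islink(os.path.join(path, name))
              and not os.path.exists(os.path.join(path, name))]
    return {"exists": True, "entries": len(entries), "broken_links": len(broken)}
```

For example, compare `summarize_image_dir(os.path.join("data", "images", "test"))` with the result for `trainval`: a missing folder, zero entries, and many broken links point to different stages of the preparation step.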