open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

When I want to train a fcenet, I met a problem #271

Closed ShiMinghao0208 closed 3 years ago

ShiMinghao0208 commented 3 years ago

[Screenshot: QQ图片20210609200559] Training did start, but after several epochs an error reported that the data was not on the GPU. How can I solve it?

cuhk-hbsun commented 3 years ago

@ShiMinghao0208 Can you try again and see if the same problem arises?

ShiMinghao0208 commented 3 years ago

> @ShiMinghao0208 Can you try again and see if the same problem arises?

Yes, I ran the code several times and this problem came up every time.

Zyq-scut commented 3 years ago

@ShiMinghao0208 I created an environment following the installation guide and trained FCENet. It works well and doesn't raise the error you mentioned. Can you check whether the same problem arises when training other models? If so, I think it is caused by the environment.

innerlee commented 3 years ago

Maybe it is caused by some bad data? Try using binary search to locate it.

whynot08 commented 3 years ago

@ShiMinghao0208 Have you solved the problem yet? I met the same situation

ShiMinghao0208 commented 3 years ago

> @ShiMinghao0208 Have you solved the problem yet? I met the same situation

The guide has an error: the mmcv is the CPU version. When I installed the GPU version of mmcv, the error disappeared.
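One way to check for this situation is to ask the installed packages what they were built with. The following is a minimal sketch, not part of MMOCR: `probe_environment` is a hypothetical helper, and it assumes that full builds of mmcv 1.x expose `mmcv.ops.get_compiling_cuda_version()`. Any package that cannot be imported is reported as `None` or `"unavailable"` instead of raising.

```python
import importlib


def probe_environment():
    """Collect version facts relevant to a CPU-vs-GPU mismatch.

    Hypothetical helper for illustration. Missing packages map to None.
    """
    info = {
        "torch": None,           # PyTorch version string
        "torch_cuda": None,      # CUDA version PyTorch was built with (None for CPU builds)
        "cuda_available": None,  # whether PyTorch can see a GPU right now
        "mmcv": None,            # mmcv version string
        "mmcv_cuda_ops": None,   # CUDA version the mmcv ops were compiled with
    }

    try:
        torch = importlib.import_module("torch")
        info["torch"] = str(torch.__version__)
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except Exception:
        # torch is missing or broken; leave the fields as None
        pass

    try:
        mmcv = importlib.import_module("mmcv")
        info["mmcv"] = str(mmcv.__version__)
    except Exception:
        return info

    try:
        # Assumption: only full (compiled) mmcv builds ship mmcv.ops.
        ops = importlib.import_module("mmcv.ops")
        info["mmcv_cuda_ops"] = ops.get_compiling_cuda_version()
    except Exception:
        info["mmcv_cuda_ops"] = "unavailable"

    return info


if __name__ == "__main__":
    for key, value in probe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is true but `mmcv_cuda_ops` does not show a CUDA version, the mmcv build is a likely suspect.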

whynot08 commented 3 years ago

> > @ShiMinghao0208 Have you solved the problem yet? I met the same situation
>
> The guide has an error: the mmcv is the CPU version. When I installed the GPU version of mmcv, the error disappeared.

I will give it a try, thank you very much.

innerlee commented 3 years ago

@ShiMinghao0208 Could you please tell us where the incorrect guidance is? We will fix it ASAP.

ShiMinghao0208 commented 3 years ago

> @ShiMinghao0208 Could you please tell us where the incorrect guidance is? We will fix it ASAP.

I don't know the location. I asked someone else for help, and he said that was the cause.

innerlee commented 3 years ago

Thanks for the feedback!

deckardcain1 commented 3 years ago

I am having this exact same problem. I installed the MMCV that matches my Torch/CUDA. What's even more peculiar is that I only get this error when training on certain datasets. For others, it appears to begin training properly.

innerlee commented 3 years ago

@mattlee2 It would be great if the error could be reproduced.

It might be that some images in the dataset trigger corner cases in the code, so try using binary search to locate the bad data.
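The binary search suggested here could look roughly like the sketch below. It is illustrative only: `find_bad_sample` and the `fails` callback are hypothetical names, and `fails(subset)` stands in for whatever check reproduces the error on a subset of the dataset (for example, a short training run on just those samples). It assumes the failure is caused by individual samples rather than by an interaction between them.

```python
def find_bad_sample(samples, fails):
    """Return the index of one sample that makes `fails` return True, or None.

    `fails(subset)` must return True when the error reproduces on `subset`.
    Needs roughly log2(len(samples)) calls to `fails`.
    """
    if not samples or not fails(samples):
        return None  # nothing to find: the full set does not fail

    lo, hi = 0, len(samples)  # invariant: samples[lo:hi] contains a bad sample
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(samples[lo:mid]):
            hi = mid  # a bad sample is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return lo


if __name__ == "__main__":
    # Toy stand-in for a dataset: image 37 is the one that breaks training.
    images = [f"img_{i}.jpg" for i in range(100)]
    bad = {"img_37.jpg"}
    index = find_bad_sample(images, lambda subset: any(s in bad for s in subset))
    print(images[index])
```

If the full dataset fails but no single half does, the assumption above does not hold and the cause is probably not one bad image.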

FFoCC commented 3 years ago

I had the same problem. When I use fcenet + toy_dataset (or icdar2015_dataset), I have this problem. When I use other networks, such as drrg + toy_dataset, I don't.

iywo commented 3 years ago

Exactly the same problem. I have tried training fcenet+icdar2015 and fcenet+ctw1500, and both raise the same error. Other networks work fine for me.

innerlee commented 3 years ago

@Zyq-scut Would you like to take a look?

Zyq-scut commented 3 years ago

> @Zyq-scut Would you like to take a look?

OK

Zyq-scut commented 3 years ago

@iywo Can you please provide your environment info? I can't reproduce this error in my environment.

iywo commented 3 years ago

I just followed the installation guide: https://mmocr.readthedocs.io/en/latest/install.html [Screenshots: 00, 01, 02]

iywo commented 3 years ago

GPU: NVIDIA-SMI 418.113 Driver Version: 418.113 CUDA Version: 10.1

iywo commented 3 years ago

The first run of fcenet+icdar2015

[Screenshots: t1, t2, t3, t4, t5, t6]

TongkunGuan commented 3 years ago

I also encountered the same error.

gaotongxiao commented 3 years ago

I tested it with pytorch==1.5.0 and got the same error. However, pytorch >= 1.6.0 works. Will update the documentation soon.
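Since the failure tracks the PyTorch version, a training script could guard against it up front. Below is a minimal sketch that assumes the 1.6.0 threshold reported in this comment; `parse_version` and `torch_is_new_enough` are hypothetical helper names, not MMOCR functions. Versions are compared as integer tuples because plain string comparison would rank "1.10.0" below "1.6.0".

```python
import re


def parse_version(version):
    """Turn a string such as '1.5.0+cu101' into (1, 5, 0).

    Local and pre-release suffixes are ignored; a missing patch number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_is_new_enough(version, minimum=(1, 6, 0)):
    """Return True if `version` is at least `minimum` (1.6.0 is taken from this thread)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        if torch_is_new_enough(str(torch.__version__)):
            print(f"torch {torch.__version__} meets the 1.6.0 minimum")
        else:
            print(f"torch {torch.__version__} is older than 1.6.0; consider upgrading")
```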

TongkunGuan commented 3 years ago

Thanks!

------------------ Original message ------------------ From: "open-mmlab/mmocr"; Sent: Tuesday, June 29, 2021, 2:22 PM; Subject: Re: [open-mmlab/mmocr] When I want to train a fcenet, I met a problem (#271)

> I tested it with pytorch==1.5.0 and got the same error. However, pytorch >= 1.6.0 works. Will update the documentation soon.
