megvii-model / YOLOF

MIT License

'No module named 'config'' #15

Closed qijindao closed 3 years ago

qijindao commented 3 years ago

Hi! Sorry to bother you with a question. When I try to run train_net.py, I cannot get past 'from config import config', which fails with 'No module named 'config''. I tried 'pip install config', but the error persists. I have searched for solutions, but none of them work. Can you help me?

chensnathan commented 3 years ago

Hi, you can use pods_train --num-gpus 8 instead of running train_net.py directly.

BTW, could you provide more details about how you install YOLOF and how you train with YOLOF?

qijindao commented 3 years ago

Thank you for your reply! I appreciate it. I found pods_train, but it is not a .py file, so I don't know how to use it. My environment is torch 1.6 and Python 3.8. When I try to run train_net.py, I keep having to install modules one by one as the error messages ask for them. I also ran into a problem with cvpods; I just used 'python setup.py develop' as the instructions say.

qijindao commented 3 years ago

Sorry, I didn't express myself clearly. What I meant to ask is: do you mean I needn't worry about train_net.py even though it raises errors, and that all I need to do is run 'pods_train --num-gpus 1'?

chensnathan commented 3 years ago

pods_train is a shell script. You can run it directly with pods_train --num-gpus 8 from the experiment directory (e.g., YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x).

BTW, you can find the pods_train file in YOLOF/tools/.

tangjiuqi097 commented 3 years ago

@qijindao Hi, you can try:

cd YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x
python YOLOF/tools/train_net.py --num-gpus 8

or

cd YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x
pods_train --num-gpus 8

qijindao commented 3 years ago

Thank you for your reply.

qijindao commented 3 years ago

Thank you for your reply. Have you trained the code successfully? I may have some questions.

qijindao commented 3 years ago

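I ran out of GPU memory. In my experience the fix is to change the batch size, but I have not been able to find any code related to batch size in this folder. Maybe I missed it?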

chensnathan commented 3 years ago

Could you provide more details about how you train with YOLOF?

qijindao commented 3 years ago

Following the directory layout, I put the COCO 2017 dataset in the datasets folder. I then start training with the commands cd YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x and python YOLOF/tools/train_net.py --num-gpus 1.

chensnathan commented 3 years ago

YOLOF_res50_C5 needs about 5.2 to 5.3 GB of GPU memory to train. If your GPU has less than that, you should reduce IMS_PER_DEVICE in the config.py file.

qijindao commented 3 years ago

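OK, thank you very much. My computer has only one GPU. After I changed the devices setting in the config to 1, the program ran. But after running for a while, a new error appeared: AssertionError: Box regression deltas become infinite or NaN!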

tangjiuqi097 commented 3 years ago

@qijindao Can you provide your log file? It is at YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x/log/log.txt. BTW, I think it is because you modified the batch size but did not modify the learning rate or the warmup iterations.

qijindao commented 3 years ago

log.txt

tangjiuqi097 commented 3 years ago

Hi, cvpods can automatically adjust the learning rate and iterations if you use a different number of GPUs. However, the default setting is 8 images per GPU. If you use 1 image per GPU, you need to decrease the base learning rate by a factor of 8 and increase the iterations (as well as the warmup iterations) by a factor of 8. You should also replace BatchNorm with GroupNorm.
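
The factor-of-8 adjustment described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not cvpods code: the `Schedule` class and `rescale_schedule` function are hypothetical names, and the numbers are made up for the example rather than taken from the YOLOF config.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    """Hypothetical container for the three values that need rescaling."""
    base_lr: float
    max_iter: int
    warmup_iters: int


def rescale_schedule(ref: Schedule, ref_ims_per_device: int,
                     new_ims_per_device: int) -> Schedule:
    """Divide the learning rate and multiply the iteration counts by the
    factor by which the per-GPU batch size shrinks."""
    if new_ims_per_device < 1 or ref_ims_per_device % new_ims_per_device != 0:
        raise ValueError("new_ims_per_device must evenly divide ref_ims_per_device")
    factor = ref_ims_per_device // new_ims_per_device
    return Schedule(
        base_lr=ref.base_lr / factor,
        max_iter=ref.max_iter * factor,
        warmup_iters=ref.warmup_iters * factor,
    )


# Illustrative numbers only, NOT the values from the YOLOF config.py.
reference = Schedule(base_lr=0.08, max_iter=10000, warmup_iters=500)
scaled = rescale_schedule(reference, ref_ims_per_device=8, new_ims_per_device=1)
print(scaled)
```

The rescaled values would then be written into config.py by hand. Swapping BatchNorm for GroupNorm is a separate model change that this sketch does not cover.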

qijindao commented 3 years ago

OK, thank you for your detailed reply. I roughly understand your instructions, but I am still unsure about some of the settings. First, in the command 'pods_train --num-gpus 8', is the '8' the ID of a GPU in the machine, or the number of GPUs in the machine? Second, does IMS_PER_DEVICE=8 mean 8 images per GPU? Third, do the values of IMS_PER_BATCH and IMS_PER_DEVICE have to be proportional? After many experiments, it seems the ratio has to be 8 for training to run, and I don't know why.

tangjiuqi097 commented 3 years ago

@qijindao

  1. In 'pods_train --num-gpus 8', "8" means that it uses a total of 8 GPUs.
  2. Yes, IMS_PER_DEVICE=8 means 8 images per GPU.
  3. IMS_PER_BATCH = IMS_PER_DEVICE * num-gpus
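
The relation in item 3 can be checked with a minimal Python sketch. The helper names `ims_per_batch` and `is_consistent` are hypothetical and are not part of cvpods; they only restate the formula above.

```python
def ims_per_batch(ims_per_device: int, num_gpus: int) -> int:
    """Total images per iteration: per-GPU images times the number of GPUs."""
    if ims_per_device < 1 or num_gpus < 1:
        raise ValueError("ims_per_device and num_gpus must be positive")
    return ims_per_device * num_gpus


def is_consistent(ims_per_batch_value: int, ims_per_device: int,
                  num_gpus: int) -> bool:
    """True when the three settings satisfy
    IMS_PER_BATCH = IMS_PER_DEVICE * num-gpus."""
    return ims_per_batch_value == ims_per_device * num_gpus


# 8 GPUs with 8 images each, and a single GPU with 8 images.
print(ims_per_batch(8, 8))
print(ims_per_batch(8, 1))
print(is_consistent(64, 8, 1))
```

The last call combines a batch size meant for 8 GPUs with a single GPU, so the three values no longer satisfy the formula.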

qijindao commented 3 years ago

Thank you very much! I get it!