zhang-tao-whu / e2ec

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Thank you for your work. Can you provide your pretrained model for training? I cannot train and get an error. #3

Closed. cuiwenting87 closed this issue 2 years ago.

cuiwenting87 commented 2 years ago

(screenshot of the training error)

I want to train on my own dataset. I followed the README, but it raised an error. Could you help me solve this problem?

zhang-tao-whu commented 2 years ago

Hi, thanks for your interest. We did not save the optimizer state to model_coco.pth, because it would have made the file much larger. If you have replaced the storage path and other information of the coco dataset with your own dataset's, you can either train from scratch or load the weights trained on the coco dataset and fine-tune the model on your dataset:

  1. Train the model from scratch: python train_net.py coco --bs ${batch_size}
  2. Load model_coco.pth and fine-tune on your dataset: python train_net.py coco --bs ${batch_size} --checkpoint ${path_to_checkpoint} --type finetune. If the categories of your dataset differ from the coco dataset's, you should change line 40 of train_net.py from begin_epoch = load_network(network, model_dir=args.checkpoint) to begin_epoch = load_network(network, model_dir=args.checkpoint, strict=False); see the sketch after this list. If you encounter any other problems, please feel free to keep asking questions. @cuiwenting87
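
For reference, here is a minimal sketch of what strict=False changes when loading a checkpoint. The internals shown here are an assumption for illustration; the actual load_network in this repository may differ.

import torch

def load_network_sketch(network, model_dir, strict=True):
    # Hypothetical loader illustrating the effect of strict=False.
    checkpoint = torch.load(model_dir, map_location='cpu')
    state_dict = checkpoint.get('net', checkpoint)
    if not strict:
        # Drop weights whose names or shapes don't match the current model,
        # e.g. a ct_hm head sized for COCO's 80 classes when your dataset
        # has a different category count, so loading won't raise an error.
        model_state = network.state_dict()
        state_dict = {k: v for k, v in state_dict.items()
                      if k in model_state and v.shape == model_state[k].shape}
    network.load_state_dict(state_dict, strict=strict)
    return checkpoint.get('epoch', 0)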
cuiwenting87 commented 2 years ago

Thank you for your answer. I ran the method you described, and the result shows that the AP value is zero. After 300 iterations it is still zero, or it reports an error after the fourth iteration, as shown below:

Could you help me solve this problem?


zhang-tao-whu commented 2 years ago

I don't know what caused this. Can you provide some details of the error? It would be nice to have some screenshots like the one above.

cuiwenting87 commented 2 years ago

Thank you for your answer; here is everything I have done. I replaced the storage path and other information of the coco dataset with my dataset's in info.py. To avoid errors, I kept the names the same as yours.

First, I trained from scratch, but the AP is always 0.

Then I loaded the trained weights from the model_coco.pth that you provided, but it raised an error.

That's all I have done.


zhang-tao-whu commented 2 years ago

When you load the trained weights from model_coco.pth, what goes wrong? Can you give some error information? (If --type finetune is added, it shouldn't report an error.) And can you provide some training logs to help me analyze why the AP is 0 when training from scratch?

cuiwenting87 commented 2 years ago

This is the error I get when I load the trained weights from model_coco.pth.


zhang-tao-whu commented 2 years ago

I can't see anything below “This is the error I get when I load the trained weights from model_coco.pth.” @cuiwenting87

zhang-tao-whu commented 2 years ago

You can visit https://github.com/zhang-tao-whu/e2ec/issues/3 and upload some screenshots.

cuiwenting87 commented 2 years ago

Sorry, the upload may not have been successful just now.


cuiwenting87 commented 2 years ago

When I don't load the trained weights from model_coco.pth, it reports the following: (screenshot of the training log)

zhang-tao-whu commented 2 years ago

The ct_loss is not working properly, but the other losses look fine. E2EC's ct_loss is calculated in exactly the same way as CenterNet's. I have some suggestions for you to try:

  1. Increase the batch size, preferably to over 8.
  2. Check your config file. For the COCO dataset, images are resized to (512, 512) for training and kept at their original size for testing. If the size of your images differs significantly from (512, 512), please change ${scale} and ${input_h, input_w}. Also make sure that your dataset's categories are consistent with ${ct_hm}. A config sketch follows this list.
  3. Don't enable dml at the start; please keep ${start_epoch} at 10.
  4. Continue iterating to see whether ct_loss can be reduced to below 10.
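
To make point 2 concrete, here is a minimal config sketch in the style of this repository's config files. The attribute names follow the user config shown later in this thread; the values are placeholders for your dataset, and where exactly start_epoch lives in the config is an assumption.

from .base import data, model, train
import numpy as np

# Training resolution; (512, 512) is the COCO default mentioned above.
# Change these if your images differ significantly in size.
data.scale = np.array([512, 512])
data.input_w, data.input_h = (512, 512)

# One heatmap channel per category (80 for COCO); this must match
# the number of categories in your dataset.
model.heads['ct_hm'] = 80

# Keep dml disabled for the first epochs, per suggestion 3
# (the attribute's exact location is assumed).
train.start_epoch = 10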
cuiwenting87 commented 2 years ago

Thank you for your suggestions. I tried them; after 300 iterations the ct_loss dropped to 50~60, but it is not stable, and the AP is still 0.


zhang-tao-whu commented 2 years ago

@cuiwenting87 Hi, I might know what went wrong. You may have used matrix coordinates instead of image coordinates when creating your dataset: the x of a matrix coordinate is y in image coordinates. You can use COCO.showAnns() from pycocotools to check the annotations. It would also help if you could show me an image from your dataset and the corresponding annotations, so I can analyse the problem better.
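
A minimal sketch of such a check with pycocotools follows; the annotation and image paths are placeholders. If the drawn masks appear transposed relative to the objects, x and y were most likely swapped.

import matplotlib.pyplot as plt
from pycocotools.coco import COCO

coco = COCO('data/annotations/train.json')  # placeholder path
img_info = coco.loadImgs(coco.getImgIds()[0])[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_info['id']))

# Overlay the annotations; they should line up with the objects.
plt.imshow(plt.imread('data/images/' + img_info['file_name']))
coco.showAnns(anns)
plt.show()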

cuiwenting87 commented 2 years ago

Sorry, the pictures I trained on are too large, so I can't upload them. Could you suggest another way for me to send them? I built the training data by converting the JSON files annotated with labelme into the coco dataset format.


zhang-tao-whu commented 2 years ago

My email is zhang_tao@whu.edu.cn. You can upload the data to BaiDuYun and send me the link. Or you can send your contact details to my email and transfer the data online.

zhang-tao-whu commented 2 years ago

Incorrect training resolution and test resolution settings caused this problem, which has now been fixed.

SEUZTh commented 2 years ago

Hello, I met the same problem: the AP is always zero when I train on my dataset.

python train_net.py coco_finetune --bs 12 --type finetune --checkpoint data/model/model_coco.pth
(screenshot of the training log)

This is my config/_finetune.py:

from .base import commen, data, model, train, test
import numpy as np

data.mean = np.array([0.44726229, 0.43802511, 0.27905645],
                     dtype=np.float32).reshape(1, 1, 3)
# Note: overrides must be set as attributes of the imported objects
# (data.std, data.scale, data.input_w/input_h); bare module-level
# assignments would silently have no effect.
data.std = np.array([0.22784984, 0.21254292, 0.16168552],
                    dtype=np.float32).reshape(1, 1, 3)

data.scale = np.array([640, 480])
data.input_w, data.input_h = (640, 480)

model.heads['ct_hm'] = 1

train.optimizer = {'name': 'sgd', 'lr': 1e-4, 'weight_decay': 1e-4,
                   'milestones': [150, ], 'gamma': 0.1}
train.batch_size = 12
train.epoch = 160
train.dataset = 'coco_train'

test.dataset = 'coco_val'

class config(object):
    commen = commen
    data = data
    model = model
    train = train
    test = test
zhang-tao-whu commented 2 years ago

Perhaps you should use the following command:

python train_net.py _finetune --bs 12 --type finetune --checkpoint data/model/model_coco.pth
SEUZTh commented 2 years ago


Sorry, it was my negligence; my config file's name is indeed coco_finetune.py.

Is it possible that the AP is 0 due to too few epochs?

zhang-tao-whu commented 2 years ago

No, there must be a mistake somewhere. The AP is 0 because the detection branch does not detect any instances. I don't know what caused this. If you can, please give me some information about your dataset, such as the image resolution, and it would be good if you could show a demo image.

SEUZTh commented 2 years ago


Thanks for your reply. There are two image resolutions (640x480 and 1280x960) in my dataset.


There are many instances in each image; the number ranges from ten to several hundred.

{'115.jpg': 62, '142.jpg': 14, '154.jpg': 54, '103.jpg': 100, '178.jpg': 16, '206.jpg': 14, '98.jpg': 138, '77.jpg': 179, '139.jpg': 17, '61.jpg': 246, '181.jpg': 18, '36.jpg': 159, '119.jpg': 52, '41.jpg': 122, '230.jpg': 13, '16.jpg': 60, '57.jpg': 157, '226.jpg': 3, '174.jpg': 8, '94.jpg': 95, '123.jpg': 71, '82.jpg': 89, '6.jpg': 129, '135.jpg': 34, '162.jpg': 44, '163.jpg': 17, '7.jpg': 230, '83.jpg': 83, '95.jpg': 78, '122.jpg': 76, '175.jpg': 13, '56.jpg': 143, '159.jpg': 62, '17.jpg': 149, '231.jpg': 19, '40.jpg': 167, '180.jpg': 16, '37.jpg': 148, '138.jpg': 53, '211.jpg': 13, '207.jpg': 14, '196.jpg': 10, '179.jpg': 5, '21.jpg': 228, '102.jpg': 103, '155.jpg': 58, '143.jpg': 27, '114.jpg': 79, '47.jpg': 175, '10.jpg': 171, '109.jpg': 79, '51.jpg': 163, '220.jpg': 40, '172.jpg': 13, '125.jpg': 57, '0.jpg': 138, '133.jpg': 52, '164.jpg': 9, '113.jpg': 64, '152.jpg': 46, '105.jpg': 99, '191.jpg': 9, '26.jpg': 392, '200.jpg': 15, '129.jpg': 48, '71.jpg': 185, '216.jpg': 16, '88.jpg': 75, '67.jpg': 153, '168.jpg': 11, '30.jpg': 170, '169.jpg': 27, '31.jpg': 154, '89.jpg': 102, '66.jpg': 94, '70.jpg': 184, '27.jpg': 263, '104.jpg': 39, '153.jpg': 89, '145.jpg': 21, '112.jpg': 49, '1.jpg': 404, '85.jpg': 102, '93.jpg': 66, '124.jpg': 49, '173.jpg': 24, '108.jpg': 130, '50.jpg': 298, '149.jpg': 19, '11.jpg': 109, '46.jpg': 193, '166.jpg': 6, '189.jpg': 23, '218.jpg': 20, '69.jpg': 164, '131.jpg': 40, '2.jpg': 135, '127.jpg': 28, '28.jpg': 219, '170.jpg': 12, '222.jpg': 11, '53.jpg': 130, '45.jpg': 154, '32.jpg': 218, '185.jpg': 11, '65.jpg': 132, '214.jpg': 14, '73.jpg': 227, '202.jpg': 24, '24.jpg': 189, '150.jpg': 74, '146.jpg': 18, '49.jpg': 326, '111.jpg': 106, '48.jpg': 171, '110.jpg': 56, '147.jpg': 24, '151.jpg': 66, '106.jpg': 72, '25.jpg': 45, '192.jpg': 10, '72.jpg': 230, '64.jpg': 144, '33.jpg': 267, '184.jpg': 8, '44.jpg': 204, '13.jpg': 91, '52.jpg': 184, '223.jpg': 8, '29.jpg': 186, '171.jpg': 16, '126.jpg': 55, '91.jpg': 93, '68.jpg': 163, '130.jpg': 32, '87.jpg': 76, '3.jpg': 323, '219.jpg': 4, '167.jpg': 13, '34.jpg': 96, '183.jpg': 7, '63.jpg': 214, '8.jpg': 303, '212.jpg': 16, '75.jpg': 89, '204.jpg': 27, '59.jpg': 107, '101.jpg': 95, '228.jpg': 16, '156.jpg': 61, '117.jpg': 71, '38.jpg': 263, '160.jpg': 54, '137.jpg': 77, '4.jpg': 238, '208.jpg': 13, '79.jpg': 114, '121.jpg': 77, '96.jpg': 116, '176.jpg': 10, '199.jpg': 30, '55.jpg': 136, '232.jpg': 27, '43.jpg': 121, '15.jpg': 63, '225.jpg': 20, '177.jpg': 10, '198.jpg': 8, '120.jpg': 35, '136.jpg': 47, '5.jpg': 162, '39.jpg': 198, '161.jpg': 42, '19.jpg': 179, '141.jpg': 15, '229.jpg': 21, '58.jpg': 97, '100.jpg': 59, '23.jpg': 243, '194.jpg': 7, '205.jpg': 20, '213.jpg': 17, '62.jpg': 251, '9.jpg': 183, '182.jpg': 15}

(sample image: 0.jpg)