hitachi-rd-cv / qpic

Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"
Apache License 2.0

How many hours does the QPIC model need to train on HICO-DET and V-COCO, respectively? #20

Open truetone2022 opened 2 years ago

tamtamz commented 2 years ago

With a batch size of 2 on 8 V100 GPUs, training takes about 38 hours for HICO-DET and 8 hours for V-COCO.

truetone2022 commented 2 years ago

Thanks for your reply! I have trained your QPIC model on HICO-DET and V-COCO with all settings the same as in your released GitHub code, but I ran into two problems.

First, GPU utilization hovers around 0% most of the time, and the program seems to be stuck reading data. Is this normal?

Second, after more than 10 hours of training, only about 10 epochs had finished, so 150 epochs on HICO-DET would take more than 150 hours. Is this normal? The same thing happens on V-COCO.

Thanks for your helpful reply! Best wishes!
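
For comparison, the numbers quoted in this thread can be converted to per-epoch times. This is only a rough sketch: it assumes the 38-hour figure refers to the same 150-epoch schedule mentioned above, and it rounds "10+ hours, 10+ epochs" down to exactly 10 and 10. The function name is made up for illustration.

```python
def hours_per_epoch(total_hours, epochs):
    """Average wall-clock hours spent on one epoch."""
    return total_hours / epochs


EPOCHS = 150  # epoch count quoted in this thread for HICO-DET

# Maintainer's figure (8 V100 GPUs); assumed to cover the same 150 epochs.
reference = hours_per_epoch(38.0, EPOCHS)
# Reporter's figure, rounded down from "10+ hours" and "10+ epochs".
observed = hours_per_epoch(10.0, 10)

projected_total = observed * EPOCHS  # hours for a full run at the observed pace
slowdown = observed / reference      # how many times slower than the reference

print(f"reference: {reference * 60:.1f} min/epoch")
print(f"observed:  {observed * 60:.1f} min/epoch")
print(f"projected: {projected_total:.0f} h, about {slowdown:.1f}x the reference")
```

Under those assumptions the observed pace is roughly four times slower than the reference, which would be consistent with a bottleneck outside the GPU, although different hardware alone could also explain part of the gap.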


dragen1860 commented 2 years ago

Maybe you should check your disk I/O performance.
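
One rough way to check this is to time raw file reads from the dataset directory, independent of the training code. The sketch below uses only the Python standard library; `measure_read_throughput` is a hypothetical helper, not part of the QPIC repository, and the directory in the usage comment is a placeholder.

```python
import time
from pathlib import Path


def measure_read_throughput(paths, limit=200):
    """Read up to `limit` files fully and return (files_per_sec, mb_per_sec)."""
    paths = list(paths)[:limit]
    total_bytes = 0
    start = time.perf_counter()
    for p in paths:
        total_bytes += len(Path(p).read_bytes())
    # Guard against a zero interval on very fast reads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(paths) / elapsed, total_bytes / elapsed / 1e6


# Hypothetical usage (replace the placeholder with your own image directory):
# files = Path("<your image directory>").rglob("*.jpg")
# files_per_sec, mb_per_sec = measure_read_throughput(files)
# print(f"{files_per_sec:.0f} files/s, {mb_per_sec:.1f} MB/s")
```

Note that the operating system caches recently read files, so a second run over the same files will usually look much faster than the first; the first run is closer to what a training job sees at the start.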

truetone2022 commented 2 years ago

My disk I/O performance should be all right, because I can train the similar HoiTransformer code normally. So I'm not sure where the problem is.

DavidHuji commented 2 years ago

How many workers are you using? Try adding "--num_workers 4" to see whether it fixes both the slow training and the low GPU utilization (usually they are the same problem).
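
To confirm that data loading is the bottleneck before changing anything, you can split each iteration into time spent waiting for the next batch and time spent in the training step. The following is a minimal, framework-independent sketch; `profile_loop` and `data_wait_fraction` are hypothetical helpers, and `batches` and `step` stand in for your own data loader and training step.

```python
import time


def profile_loop(batches, step, max_batches=50):
    """Return (seconds waiting for data, seconds in step) over up to max_batches."""
    data_time = 0.0
    step_time = 0.0
    it = iter(batches)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks while the loader prepares a batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        # For GPU work, the step must synchronize the device before returning
        # (for example with torch.cuda.synchronize()); otherwise asynchronous
        # kernels make the step look faster than it really is.
        step(batch)
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    return data_time, step_time


def data_wait_fraction(data_time, step_time):
    """Fraction of measured time spent waiting for data."""
    total = data_time + step_time
    return data_time / total if total > 0 else 0.0
```

If most of the measured time is spent waiting for data, the GPU is being starved, and more loader workers are a plausible remedy. If the step dominates instead, the slowdown lies elsewhere.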