nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

Hi, this is great work. In your paper you report that training on Flickr30k on one V100 takes more than 2000 hours, which is a large number. How long does the model take to train on the 3D grounding datasets? #4

Closed: 618QRC closed this issue 1 year ago

nickgkan commented 1 year ago

It depends on your hardware. With our hyperparameters on an NVIDIA A100, training takes approximately one day for SR3D and up to two days for NR3D and ScanRefer. If you train on a V100, you can expect similar or slightly longer times.