Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"
Hi, this is great work. In your paper you report that training on Flickr30k takes more than 2000 hours on one V100, which is a large number. How long does the model take to train on the 3D grounding datasets? #4
Depends on your hardware. With our hyperparameters on an NVIDIA A100 it takes approximately one day for SR3D, up to two days for NR3D and ScanRefer. If you train on a V100 you can expect similar times or slightly more.