yenchenlin opened this issue 4 years ago
As I commented in that thread, I could only get up to 8.5 it/s (without his caching strategy); I didn't test further. He mentioned caching gave him +4 it/s, so maybe that is the trick? In my implementation, though, data loading is not a bottleneck at all (fetching a batch takes about 2e-4 s), so I didn't try it.
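For reference, the caching idea mentioned above can be sketched as pre-computing every training ray once and then drawing batches by random indexing into a resident tensor; the shapes and names here are illustrative placeholders, not code from either repository:

```python
import torch

# Hypothetical ray cache: precompute origins, directions, and target
# colors for every pixel of every training image once, up front.
# Shapes are illustrative: N images of H x W pixels.
N, H, W = 10, 40, 40
rays_o = torch.rand(N * H * W, 3)   # ray origins
rays_d = torch.rand(N * H * W, 3)   # ray directions
rgbs   = torch.rand(N * H * W, 3)   # ground-truth colors

cache = torch.cat([rays_o, rays_d, rgbs], dim=1)  # (N*H*W, 9)

def fetch_batch(cache, batch_size=1024):
    # Random ray indices; indexing an in-memory tensor like this is
    # why fetching a batch can take on the order of 1e-4 s.
    idx = torch.randint(0, cache.shape[0], (batch_size,))
    batch = cache[idx]
    return batch[:, :3], batch[:, 3:6], batch[:, 6:]

o, d, c = fetch_batch(cache)
```

With the cache built once, each training iteration pays only for a random index and a gather, rather than decoding images or regenerating rays.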
Hi @yenchenlin
Sorry, I missed this issue. I haven't been able to follow up on that thread, as I haven't had the chance to run additional experiments since. As for the setup, it was PyTorch 1.4, CUDA 10.1, Python 3.6. I was running on a GPU cluster, specifically on a node with a V100 GPU.
As for the config, I took care to replicate the exact lego config file (a network of 8 layers, 128 fine samples per ray, and the like). For the speed comparison, I use the time taken until the optimizer parameter update (not the tqdm-reported loop times, which include tensorboard logging, etc.).
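The timing protocol described above (measure only up to the parameter update, excluding logging) can be sketched as follows; the toy MLP and hyperparameters are placeholders standing in for the actual NeRF model:

```python
import time
import torch

# Toy stand-in for a NeRF MLP: 8 hidden layers of width 128,
# echoing the lego config mentioned above.
layers, in_dim = [], 3
for _ in range(8):
    layers += [torch.nn.Linear(in_dim, 128), torch.nn.ReLU()]
    in_dim = 128
layers.append(torch.nn.Linear(128, 4))
model = torch.nn.Sequential(*layers)
opt = torch.optim.Adam(model.parameters(), lr=5e-4)

def timed_step(batch, target):
    # Time only forward + backward + optimizer update, excluding
    # tqdm / tensorboard logging. Synchronize around the timed region
    # so asynchronous CUDA kernels don't skew the measurement.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    loss = torch.nn.functional.mse_loss(model(batch), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - t0

dt = timed_step(torch.rand(1024, 3), torch.rand(1024, 4))
print(f"{1.0 / dt:.1f} it/s")
```

Averaging `1/dt` over many steps gives an it/s figure comparable to the ones quoted in this thread, without the logging overhead included in the tqdm readout.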
Is there a specific config file and command to reproduce that? Thanks a lot!
May I know the exact config (hyperparameters) used to get the average speed of 14.2 it/s (on an RTX 2080 Ti, PyTorch 1.4.0, CUDA 9.2) reported here?
I couldn't get it by simply following the modifications in #6. (cc @kwea123, did you test it further?)
Thanks in advance!