hhhddddddd opened 5 months ago
Hi, thanks for using our code first! Sorry for the late reply.
For the dynamic dataset, the released default config trains for 800k iterations (defined in r4dv.yaml with the epochs parameter). It typically only requires 400k iterations (epochs=800) to converge. Another thing to note: we test the training speed without evaluation (runner_cfg.eval_ep=800) and report training metrics only every 100 iterations (runner_cfg.log_interval=100) to reflect the real training time.
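Putting the parameters above together, a shortened run could be launched with command-line overrides like the following. This is only a sketch: it assumes EasyVolcap accepts a top-level epochs override with the same dotted-key syntax the runner_cfg.* examples above use, which is not confirmed here.

```shell
# Sketch, not a verified command: 'epochs=800' assumes a top-level override
# corresponding to the ~400k iterations said to be enough for convergence.
# The runner_cfg.* keys are taken directly from the reply above:
# evaluate only at the end and log every 100 iterations, so the measured
# wall-clock time reflects real training speed.
evc-train -c configs/exps/4k4d/4k4d_0013_01_r4.yaml \
    epochs=800 \
    runner_cfg.eval_ep=800 \
    runner_cfg.log_interval=100
```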
The same story goes for the static scene. It only takes 2-3k iterations to converge.
The iteration speed looks fine (60-70ms), though. I'm not sure what causes the two experiments to show up; the VRAM usage seems OK.
Another way to speed up training is to use our latest CUDA-backend implementation; you can enable it via this option: https://github.com/zju3dv/4K4D/blob/712eccb0e0eeef744c19eb221cfb424a2915b474/easyvolcap/models/samplers/r4dv_sampler.py#L43C18-L43C27
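For anyone following along, the linked line holds the actual option name, which is not reproduced here. Assuming it is a sampler parameter that can be set with the same dotted-key override syntax as the other commands in this thread, enabling it might look like the placeholder below.

```shell
# Sketch only: both 'sampler_cfg' and '<cuda_backend_option>' are
# placeholders, not confirmed names. Substitute the real parameter
# found at line 43 of easyvolcap/models/samplers/r4dv_sampler.py.
evc-train -c configs/exps/4k4d/4k4d_0013_01_r4.yaml \
    sampler_cfg.<cuda_backend_option>=True
```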
As for the training PSNR, the 0013_01 scene is the hardest of the four DNA-Rendering scenes, so its training PSNR is slightly lower.
Hello, I have a strange problem with train time.
I executed
evc-train -c configs/exps/4k4d/4k4d_0013_01_r4.yaml,configs/specs/static.yaml,configs/specs/tiny.yaml exp_name=4k4d_0013_01_r4_static
on an NVIDIA GeForce RTX 4090, but it takes about 40 minutes to train a single frame. It's even worse when I execute
evc-train -c configs/exps/4k4d/4k4d_0013_01_r4.yaml
. It takes about 4 days to train all frames (NVIDIA GeForce RTX 4090). Moreover, I also observed a strange phenomenon during training: when I ran a 4K4D training experiment on the 4090, the gpustat command showed two experiments running. (The same is true on the 4090.) In addition, the PSNR of the 4k4d_0013_01_r4_static training results also failed to reach about 30.
Can you give me any advice? Thank you so much for all your help!