Slow training? - GitHub Issues

OiOchai commented 10 months ago

Hi,

Thanks for the amazing work.

I have an issue with very slow training. I am using a 3080. When training starts, the coarse stage reaches ~10 it/s, but afterwards it drops to 1 it/s, which is 10x slower, and it took 10 hours for a 30-frame video. Any clue why this happens?

guanjunwu commented 10 months ago

Hi, are you using my latest code with PipelineParams.debug=False set? Also, what is the resolution of your training images? A higher resolution and a larger number of Gaussians will also slow down rendering and training.

OiOchai commented 10 months ago

Thanks for the quick response. I am using an old commit, 5a80f11. Let me pull the latest code and see what happens. My training images are 1080p. I have 30 cameras and very long video sequences. What do you think is the maximum number of frames per camera for 4D Gaussians?

guanjunwu commented 10 months ago

Thanks, I think this old commit can also work. What I want to note is that setting PipelineParams.debug=False is important (leaving debug enabled will cause CPU overload). Have you checked the memory usage? On Neu3D's dataset (referred to as 'dynerf' in my paper), 1352*1014 with 300 frames per video also works. Maybe you can also check the number of points in the initialized point cloud? Too many points will also cause slow training or even an OOM error.

OiOchai commented 10 months ago

I think the PipelineParams options are False by default, no? For the point cloud initialization, I was using only 2000 points. Let me dig more. Thanks!

OiOchai commented 10 months ago

I also have another question. I would appreciate it if you could answer.

I currently have a dataset with a structure similar to the Neu3D data, except that it does not use NDC and we have ground-truth calibration for it. So I follow how you read the Neu3D data, but instead of using readdynerfInfo I modified readCamerasFromTransforms to append m x n CameraInfo entries, where m is the number of cameras per frame and n is the number of frames.

However, the console always prints 'Killed' during the data loading stage, so I have to reduce the number of frames to make it work. I cannot even load 30 cameras at 1080p for 100 frames. Have you had this issue? Is it normal?
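
For a sense of scale, here is a back-of-the-envelope estimate of the RAM needed to hold every image in memory at once. It is only a sketch under stated assumptions: 1080p is taken to mean 1920x1080 with 3 color channels, and all 30 x 100 images are assumed to be decoded eagerly (the thread does not confirm how the modified loader stores them). The helper name `image_set_bytes` is made up for illustration. Under these assumptions the raw pixels come to about 18.7 GB as 8-bit values and about 74.6 GB as float32, which could plausibly exhaust system memory. On Linux, a bare 'Killed' message is commonly a sign that the kernel's out-of-memory killer terminated the process.

```python
def image_set_bytes(width, height, channels, n_cameras, n_frames, bytes_per_value):
    """Raw pixel storage for n_cameras * n_frames decoded images (hypothetical helper)."""
    values_per_image = width * height * channels
    return values_per_image * n_cameras * n_frames * bytes_per_value


# Assumed setup: 1920x1080 RGB, 30 cameras, 100 frames per camera.
uint8_bytes = image_set_bytes(1920, 1080, 3, 30, 100, bytes_per_value=1)
float32_bytes = image_set_bytes(1920, 1080, 3, 30, 100, bytes_per_value=4)

GB = 10 ** 9  # decimal gigabytes
print(f"uint8:   {uint8_bytes / GB:.1f} GB")
print(f"float32: {float32_bytes / GB:.1f} GB")
```

Comparing these figures with the output of a memory monitor during loading is one way to check whether eager decoding is the likely cause.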

guanjunwu commented 9 months ago

Hi, I don't think it's normal. Actually, in my original code, I set up a dataloader that loads the training images dynamically. So if your data format is similar to Neu3D, training should start quickly. You can also use colmap.sh to generate the point cloud in my latest version of the code. By the way, you can check your computer's memory (for example with watch -n 0.1 free -g) to monitor the data loading process.
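
The idea of loading images dynamically can be sketched as follows. This is a minimal illustration, not the repository's actual dataloader: `FrameRecord`, `build_records`, `LazyFrameDataset`, and the `camXX/images/NNNN.png` path layout are all hypothetical names chosen for the example. The point is that the m x n list holds only lightweight metadata (path, camera index, normalized time), and an image is decoded only when its entry is requested.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class FrameRecord:
    """Metadata for one (camera, frame) pair; no pixel data is stored here."""
    image_path: str
    camera_id: int
    time: float  # frame index normalized to [0, 1]


def build_records(n_cameras: int, n_frames: int) -> List[FrameRecord]:
    """Enumerate all camera/frame pairs without touching any image file."""
    records = []
    for cam in range(n_cameras):
        for frame in range(n_frames):
            path = f"cam{cam:02d}/images/{frame:04d}.png"  # hypothetical layout
            time = frame / max(n_frames - 1, 1)
            records.append(FrameRecord(path, cam, time))
    return records


class LazyFrameDataset:
    """Map-style dataset that decodes an image only when it is indexed."""

    def __init__(self, records: List[FrameRecord], load_image: Callable[[str], Any]):
        self.records = list(records)
        self.load_image = load_image  # e.g. a function that opens and decodes a file

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Tuple[FrameRecord, Any]:
        record = self.records[index]
        return record, self.load_image(record.image_path)


# Usage with a stand-in loader that just records which paths were requested.
requested = []
dataset = LazyFrameDataset(
    build_records(n_cameras=30, n_frames=100),
    load_image=lambda path: requested.append(path) or f"<pixels of {path}>",
)
record, image = dataset[5]
```

Because the class defines `__len__` and `__getitem__`, it should be usable as a map-style dataset with `torch.utils.data.DataLoader`, with a real image decoder passed in place of the stand-in loader. With this design, peak memory depends on the batch size and number of workers rather than on the total number of frames.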

hustvl / 4DGaussians

Slow training? #66