weify627 / 4D-Rotor-Gaussians


How can I train 4D-Rotor-GS on the Neu3D dataset if memory is not enough? #14

Open guanjunwu opened 2 weeks ago

guanjunwu commented 2 weeks ago

Dear authors, thanks for your great work!

My machine only has 128 GB of memory, and I want to train your 4D-Rotor-GS on the Neu3D dataset, but the memory runs out and the script always gets killed.

train_dynerf.sh: line 2: 3197021 Killed                  ns-train splatfacto-big --data $data_root/N3V/cook_spinach --pipeline.model.path $data_root/N3V/cook_spinach

Do you have any solution, or a more memory-efficient dataloader?

Best, Guanjun

weify627 commented 1 week ago

Dear Guanjun,

Thank you for your question! The easiest solution would be to use a machine with more memory (e.g., 200 GB should definitely be enough). If that is not possible, you could try to (1) store the images as uint8 in memory (by default they are stored as float32), and (2) convert them back to float32 before running each batch.

More specifically, you might (1) specify the image_type as "uint8" here, i.e., change it to data = self.get_data(image_idx, image_type="uint8"), and (2) convert the batch variable back to "float32" here, i.e., add one more line after L290: batch = batch.float() / 255.0
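For illustration only, here is a minimal sketch of those two changes, assuming a nerfstudio-style dataset whose get_data accepts an image_type argument and a per-image cache dict with an "image" key; the helper names below are hypothetical, not the repo's actual code.

```python
# Minimal sketch of the uint8-caching idea (hypothetical helpers, not this repo's code).
import torch


def cache_image_uint8(dataset, image_idx: int) -> dict:
    # Keep the cached copy as uint8 (1 byte per channel) instead of float32
    # (4 bytes per channel), roughly a 4x reduction of the in-RAM cache.
    return dataset.get_data(image_idx, image_type="uint8")


def prepare_batch(cached: dict, device: torch.device) -> dict:
    # Convert back to float32 in [0, 1] only for the image currently in use.
    batch = dict(cached)
    batch["image"] = batch["image"].to(device).float() / 255.0
    return batch
```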

I hope this gives you some useful hints on modifying the dataloader to use less memory. I haven't verified this code myself yet, but let me know if you run into any further issues. Thanks again for your question!

Best, Fangyin

guanjunwu commented 6 days ago

Dear Fangyin, thanks for your help; I can now train the model with this modification. But I cannot render images: it seems the script loads the train images twice during rendering. How can I solve this? It looks like we could skip loading the training images when rendering the test images.

(4drotorgs) ➜  4D-Rotor-Gaussians git:(main) ✗ ns-render dataset --load_config outputs/cook_spinach/splatfacto/2024-09-12_151829/config.yml --output-path outputs/cook_spinach/splatfacto/2024-09-12_151829/ --split test

[20:35:31] Caching / undistorting train images                                            full_images_datamanager.py:211
[20:40:10] Caching / undistorting eval images                                             full_images_datamanager.py:224
Loading latest checkpoint from load_dir
✅ Done loading checkpoint from outputs/cook_spinach/splatfacto/2024-09-12_151829/nerfstudio_models/step-000019999.ckpt
[20:41:49] Caching / undistorting train images                                            full_images_datamanager.py:211
[1]    1389880 killed     ns-render dataset --load_config  --output-path  --split test

Please forgive me, as I'm not familiar with nerfstudio. I would appreciate it if you could help me.
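One possible workaround, sketched below under assumptions: skip caching the train split when only the test split will be rendered. The names _load_images, test_mode, and config.cache_images mirror recent nerfstudio FullImageDatamanager code but are not verified against this repo; adapt them to whatever full_images_datamanager.py actually defines here.

```python
# Hypothetical guard around the train-image caching step (assumed names).
def cache_needed_splits(datamanager, test_mode: str) -> None:
    device = datamanager.config.cache_images
    if test_mode == "test":
        # Rendering the test split only: skip the expensive train-image cache.
        datamanager.cached_train = []
    else:
        # Training / validation still needs the train images in memory.
        datamanager.cached_train = datamanager._load_images(
            "train", cache_images_device=device
        )
    datamanager.cached_eval = datamanager._load_images(
        "eval", cache_images_device=device
    )
```

If such a guard were applied around the caching call reported at full_images_datamanager.py:211, ns-render might avoid holding the train images in memory while rendering the test split.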

guanjunwu commented 3 days ago

(screenshot attached)

It seems hard to replicate the results on my machine, which only has 128 GB of memory... the CPU load average is overloaded.

Dafamubarok20 commented 19 hours ago

Is there any reason why you cannot load the data from disk as training progresses?
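A minimal sketch of that on-the-fly idea: a generic PyTorch Dataset that decodes each frame from disk only when requested, so RAM holds just the current batch. This is not the datamanager this repo uses, and the "*.png" directory layout is an assumption.

```python
# Minimal sketch of loading frames from disk on demand (generic PyTorch Dataset).
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class LazyImageDataset(Dataset):
    def __init__(self, image_dir: str):
        # Assumed layout: one PNG per frame in a flat directory.
        self.paths = sorted(Path(image_dir).glob("*.png"))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Decode from disk per access; only the current batch stays in memory.
        img = np.array(Image.open(self.paths[idx]).convert("RGB"), dtype=np.uint8)
        return torch.from_numpy(img).float() / 255.0


# DataLoader workers overlap decoding with training so disk reads are largely hidden
# (path below is only an example):
# loader = DataLoader(LazyImageDataset("N3V/cook_spinach/images"), batch_size=1, num_workers=4)
```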