Open YuhsiHu opened 3 months ago
If your machine restarts, that points to a hardware problem or some other external issue on your end. Nothing in COLMAP or Nerfstudio (or any other user-space program, for that matter) can cause a machine restart under normal conditions, short of Nvidia driver bugs or completely running out of RAM or disk space, and those seem unlikely here. Given the triggering conditions, are you sure your power supply is up to the task and your thermals are fine? These workloads put your system under a lot of strain, particularly the GPU, and the 4090 is an extremely power-hungry beast. Most power supplies are designed to handle short spikes above their specification, but sustained power draw above their rating will trigger failsafes that cut power or restart the machine. Other hardware or thermal faults trigger the same kind of failsafe once things become too extreme.
No regular software should ever be able to crash so badly that your system restarts unless hardware or kernel driver issues are involved, and the latter seems unlikely here.
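One way to check the power and thermal theory is to log GPU readings to disk while the job runs, so the last samples before the restart survive. The following is only a rough sketch, assuming `nvidia-smi` is on the PATH; the function names, the log file name, and the choice of query fields are hypothetical, not anything COLMAP or Nerfstudio provides.

```python
import os
import subprocess
import time

# Query fields passed to nvidia-smi; which fields to log is an assumption.
FIELDS = ["power.draw", "temperature.gpu", "utilization.gpu", "memory.used"]


def parse_sample(line):
    """Parse one CSV line of nvidia-smi output into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    sample = {}
    for name, value in zip(FIELDS, values):
        try:
            sample[name] = float(value)
        except ValueError:
            sample[name] = None  # e.g. "[N/A]" when the GPU does not report it
    return sample


def read_gpu(index=0):
    """Query one GPU; with nounits the values are W, deg C, percent, MiB."""
    cmd = [
        "nvidia-smi",
        f"--id={index}",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_sample(out.strip().splitlines()[0])


def log_gpu(path="gpu_log.csv", interval_s=1.0):
    """Append one timestamped row per interval, syncing each row to disk."""
    with open(path, "a") as f:
        while True:
            sample = read_gpu()
            row = [f"{time.time():.0f}"] + [str(sample[name]) for name in FIELDS]
            f.write(",".join(row) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the row to disk before a possible power cut
            time.sleep(interval_s)
```

Calling `log_gpu()` in a second terminal before starting the reconstruction should leave a `gpu_log.csv` whose final rows show what the GPU was doing right before the restart. If power draw and temperature are low in those rows, the GPU is a less likely culprit.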
Thank you for your reply. This new machine has been used in recent months to process MVS, NeRF, and Gaussian Splatting programs, and this is the first time I have encountered this problem.
The program can run normally in the following cases:
Based on the above and on what I observed while the program was running (only a small part of the CPU and GPU was in use), I suspect that an error occurs when writing files after BA and that some resource is exhausted almost instantly, causing the restart.
I will try to process this dataset on other machines and update the progress.
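To test the guess that some resource is exhausted at the end of BA, RAM and swap can be logged the same way. This is a minimal sketch that assumes a Linux machine with `/proc/meminfo`; the function names and the log file name are made up for illustration.

```python
import os
import time

KB_PER_GIB = 1024 ** 2  # /proc/meminfo reports values in kB


def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Reduce the parsed fields to available RAM and free swap, in GiB."""
    return {
        "ram_available_gib": info["MemAvailable"] / KB_PER_GIB,
        "swap_free_gib": info["SwapFree"] / KB_PER_GIB,
    }


def log_memory(path="mem_log.csv", interval_s=1.0):
    """Append one timestamped row per interval, syncing each row to disk."""
    with open(path, "a") as f:
        while True:
            with open("/proc/meminfo") as meminfo:
                summary = memory_summary(parse_meminfo(meminfo.read()))
            f.write(
                f"{time.time():.0f},"
                f"{summary['ram_available_gib']:.2f},"
                f"{summary['swap_free_gib']:.2f}\n"
            )
            f.flush()
            os.fsync(f.fileno())  # keep the last rows even if the machine restarts
            time.sleep(interval_s)
```

If the last rows of `mem_log.csv` show available RAM and free swap both near zero, memory exhaustion becomes a plausible cause. If they stay comfortably high, the power or thermal explanation looks more likely.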
Describe the bug
When using nerfstudio to process image data, the computer restarts during the bundle adjustment (BA) stage. It does not happen with a small number of images, but it does happen with a large number (around 400 images, 4000×6000).
Hardware
GPU: 4090, 24G
Mem: 126G
Swap: 2G
To Reproduce
Steps to reproduce the behavior:
Expected behavior
After BA completes, we get the sparse reconstruction result.
Additional context