logging training steps not working

GateraGael commented 6 months ago

Logging the training results to a file doesn't appear to be working. I specified the --logging.local-writer.enable and --logging.relative-log-dir flags, but no log file was created.

To Reproduce

ns-train nerfacto --data ./ --viewer.websocket-port 7007 --output-dir ./train_outputs/ --logging.local-writer.enable True --logging.relative-log-dir ./train_outputs/viewer_log_filename.txt

Expected behavior

A viewer_log_filename.txt file in the ./train_outputs/ folder, with every training iteration step logged to it.

Additional context

I am running the training in a Docker container from the image dromni/nerfstudio:0.3.4. I've also tried the image dromni/nerfstudio:0.3.2, with the same result.

kerrj commented 6 months ago

Have you tried a more recent version of nerfstudio? 0.3.4 is pretty old, so it's hard to debug.

GateraGael commented 6 months ago

The following error message shows up when running newer images with tags starting with 1.0.*.

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/d71dbab3539dec238abf12c6ddb20745909843a5caaac0df46bea81ee81f95cf/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

It seems there is a solution to this for nvidia-container-toolkit on WSL2 (see the issue "Issues with installing the Docker container in new version 1.0.1"). I am using Docker Desktop myself, so I am not sure yet whether that solution works for me. I don't understand why these newer images were pushed to Docker Hub while these issues persist.

paolovic commented 5 months ago

Same here.

ns-train splatfacto --load-checkpoint outputs/room2/splatfacto/2024-04-22_141534/nerfstudio_models/step-000029999.ckpt --data processed_data/room2/ --max-num-iterations 10000 --logging.local-writer.enable True --logging.relative-log-dir logs

No logs are created, and I am using nerfstudio 1.0.2.
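
Until the flags behave as expected, one possible workaround is to capture the terminal output of the training run in a file. This is only a minimal sketch, assuming a POSIX shell: run_and_log is a hypothetical helper (not part of nerfstudio), and the log file name train_log.txt is illustrative.

```shell
# Hypothetical helper, not part of nerfstudio: run a command and mirror
# everything it prints (stdout and stderr) into a log file.
run_and_log() {
    log_file="$1"
    shift
    # Create the log directory if it does not exist yet.
    mkdir -p "$(dirname "$log_file")"
    # 2>&1 merges stderr into stdout; tee writes to the terminal and the file.
    "$@" 2>&1 | tee "$log_file"
}

# Example with the command from this report (commented out because it
# needs nerfstudio and a dataset to run):
# run_and_log ./train_outputs/train_log.txt \
#     ns-train nerfacto --data ./ --output-dir ./train_outputs/
```

Because the command runs through a pipe, the exit status seen by the shell is that of tee, not of the training command. The captured file may also contain terminal control characters if the program draws progress bars or tables.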

nerfstudio-project / nerfstudio

logging training steps not working #3083