Open vvdb101 opened 1 year ago
This is an ongoing issue with Colab, unfortunately. I think it might have something to do with our rich console output system. Perhaps Colab limits the rate of console output. In my experience, if you leave it training for a while the cell output eventually shows up, though this is also a bit spotty.
Running on a Linux command line should work fine; I think this is just an issue within Colab.
Ok, thanks for the context!
I am aware (e.g. through training nerfacto on the provided "poster" example dataset) that it takes a while before the training epochs are printed to the console, but when I try to debug my faulty code, the cell actually terminates after e.g. 8 seconds without any further output. From that I would think that it's not just a bandwidth issue, but not sure.
Anyways, thanks for the feedback.
Helpful observation: If you have Colab Pro, you can execute the commands in Colab's console and get more detailed error messages.
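If the Colab terminal is not available, one possible workaround (a sketch, not something confirmed in this thread) is to launch the command from Python and capture stdout and stderr yourself, so that a traceback is not lost in the cell output. The helper name `run_and_capture` and the default timeout are made up for illustration:

```python
import subprocess
from typing import List, Tuple


def run_and_capture(cmd: List[str], timeout: float = 600.0) -> Tuple[int, str, str]:
    """Run `cmd` and return (exit code, stdout, stderr) as text.

    Hypothetical helper: output is collected in memory instead of being
    streamed, so it only becomes visible once the process has exited.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `code, out, err = run_and_capture(["ns-train", "nerfacto", "--data", "path/to/your/data"])` followed by `print(err)` should show the traceback if the process crashes early (the data path is a placeholder). A run that does not crash will block until the timeout and then raise `subprocess.TimeoutExpired`, so this is only meant for the "dies after a few seconds" case.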
Hi guys. I'm having the same issue on Colab. The training stopped after the code block had run for a few seconds. I ran the same ns-train command in the Colab terminal and found this error:
...
Sending ping to the viewer Bridge Server...
Successfully connected.
Sending ping to the viewer Bridge Server...
Successfully connected.
[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval.
Disabled tensorboard/wandb event writers
Printing profiling stats, from longest to shortest duration in seconds
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/site-packages/scripts/train.py", line 247, in entrypoint
    main(
  File "/usr/local/lib/python3.8/site-packages/scripts/train.py", line 233, in main
    launch(
  File "/usr/local/lib/python3.8/site-packages/scripts/train.py", line 172, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/usr/local/lib/python3.8/site-packages/scripts/train.py", line 87, in train_loop
    trainer.train()
  File "/usr/local/lib/python3.8/site-packages/nerfstudio/engine/trainer.py", line 203, in train
    self._init_viewer_state()
  File "/usr/local/lib/python3.8/site-packages/nerfstudio/utils/decorators.py", line 58, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/nerfstudio/engine/trainer.py", line 287, in _init_viewer_state
    self.viewer_state.init_scene(
  File "/usr/local/lib/python3.8/site-packages/nerfstudio/utils/decorators.py", line 82, in wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/nerfstudio/viewer/server/viewer_utils.py", line 357, in init_scene
    self.vis["renderingState/export_path"].write(timestamp_match[-1])
IndexError: list index out of range
I believe this line of code is causing the issue: https://github.com/nerfstudio-project/nerfstudio/blob/9dc7fc2e44e8bebbe09984d815dd9e6501a6ee63/nerfstudio/viewer/server/viewer_utils.py#LL357C39-L357C39
And it can be fixed simply by checking that timestamp_match is non-empty first:
+ if timestamp_match:
      self.vis["renderingState/export_path"].write(timestamp_match[-1])
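To illustrate why the guard helps, here is a small standalone sketch of the same pattern. The regex, the helper name `latest_timestamp`, and the example paths are assumptions made for illustration; they are not copied from nerfstudio's viewer_utils.py. The point is only that indexing `[-1]` on an empty match list raises IndexError, while checking the list first does not:

```python
import re
from typing import Optional

# Assumed timestamp format (e.g. "2023-03-01_120000"); the real pattern
# used by nerfstudio may differ.
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_\d{6}")


def latest_timestamp(path: str) -> Optional[str]:
    """Return the last timestamp-like substring in `path`, or None."""
    matches = TIMESTAMP_PATTERN.findall(path)
    if not matches:
        # Without this check, matches[-1] raises
        # "IndexError: list index out of range" on an empty list.
        return None
    return matches[-1]
```

With this, `latest_timestamp("outputs/poster/nerfacto/2023-03-01_120000")` gives the timestamp string, and a path without any timestamp gives `None` instead of crashing.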
This is an ongoing issue with Colab, unfortunately. I think it might have something to do with our rich console output system. Perhaps Colab limits the rate of console output. In my experience, if you leave it training for a while the cell output eventually shows up, though this is also a bit spotty.
Running on a Linux command line should work fine; I think this is just an issue within Colab.
This is exactly right - the issue is that Python 3 currently has a bug with the terminal window size that affects the output of the rich console, and the same problem occurs in other libraries as well. Here is an example: https://github.com/rsalmei/alive-progress/issues/157
The issue will be fixed when Colab upgrades to Python 3.11 in April 2024 - the change was not backported to Python 3.10 (which Colab currently runs) as it was deemed too minor, but it affects ns-train, which will not be able to train our NeRFs. More on this is described here: https://github.com/python/cpython/issues/86340
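If you want to see what your own runtime reports, a quick diagnostic sketch like the one below prints the Python version and the terminal size Python detects. It only uses the standard library; the helper name `describe_console` and the fallback size of 80x24 are arbitrary choices for illustration, and this does not fix the underlying bug:

```python
import shutil
import sys


def describe_console() -> dict:
    """Collect the Python version and the console size Python reports."""
    # The fallback is used when no real terminal size can be detected.
    size = shutil.get_terminal_size(fallback=(80, 24))
    isatty = getattr(sys.stdout, "isatty", None)
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "columns": size.columns,
        "lines": size.lines,
        "stdout_is_tty": bool(isatty()) if callable(isatty) else False,
    }


print(describe_console())
```

Comparing the output of a Colab cell with that of the Colab terminal or a local Linux shell may help confirm whether the two environments really differ in how the console is seen.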
How exactly does debugging in nerfstudio work, particularly in Google Colab?
Specifically, I currently want to test my dataparser for a new dataset, which I implemented following the nerfstudio Developer Guides, including the necessary changes to configs and such. When I start training via the ns-train command, training runs for only a few seconds before stopping without any meaningful error messages (see screenshot below). A similar outcome is given when using the test_train.py script. Also, I can't print anything to the console with print() as a last resort, at least not downstream from the training_loop call.
I believe this has something to do with the tyro CLI setup, which I am not awfully familiar with. I am sure I am missing something obvious, but can't quite figure out how to get meaningful debugging output in Google Colab. Unfortunately, since I don't have any other access to GPUs, I am relying on the colab setup. Also, I am not sure if running the training on e.g. Linux via the terminal would yield a different output.
Any pointers would be hugely appreciated!
Screenshot A (running ns-train)
Screenshot B (running scripts/test_train.py):
P.S. Not sure if this issue section is the right place to ask this. I also tried joining your Discord, but the link seems to be invalid.