zju3dv / EasyVolcap

[SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research

examples on windows without development install (and a shell question) #30

Open meeotch opened 2 months ago

meeotch commented 2 months ago

Sorry for the n00b questions... I'm trying to play around with EVC on Windows 10, using miniconda and git bash. I've successfully installed CUDA-enabled pytorch manually, and the non-development dependencies with pip install -e . After that, evc-gui seems to run.

However, when I get to the Examples section, and run evc -c configs/exps/l3mhet/l3mhet_actor1_4_subseq.yaml, it fails after loading the images, because tinycudann isn't installed:

2024-04-04 19:39:33.523469 easyvolcap.utils.console_utils  console_utils.py:391
                           -> inner: Runtime exception: No
                           module named 'tinycudann'
+--------------------- Traceback (most recent call last) ---------------------+
| Z:\zod\splat-ez\EasyVolcap\easyvolcap\utils\console_utils.py:388 in inner   |
|                                                                             |
| > 388                 return func(*args, **kwargs)         

As far as I can tell, tinycudann is part of the development dependencies, requiring CUDA Toolkit, which requires Visual Studio, etc. (I have read install.md)

Similarly, python scripts/fusion/volume_fusion.py -- -c configs/exps/l3mhet/l3mhet_actor1_4_subseq.yaml val_dataloader_cfg.dataset_cfg.ratio=0.15 fails to find open3d, which is another development dependency.

Is this correct? Are full development dependencies required to run the examples? (And if so, what is the purpose of the non-development install?)

Unrelated question: both of the above also fail at the end with: NoConsoleScreenBufferError: Found xterm, while expecting a Windows console. Maybe try to run this program using "winpty" or run it in cmd.exe instead. Or otherwise, in case of Cygwin, use the Python executable that is compiled for Cygwin.

I understand that this is a limitation of MINGW64 shells, like git bash. (Though evc-gui does run from git bash.) What shell are people successfully using under Windows? The stock cmd.exe is a nightmare - I much prefer a linux-like shell.
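For reference, here's a hypothetical helper (not part of EasyVolcap, just a sketch of the check prompt_toolkit is effectively making) that guesses whether Python is running under an MSYS-style shell:

```python
import os

def looks_like_mingw_shell(env=os.environ):
    # Heuristic: Git Bash / MSYS2 export MSYSTEM (e.g. "MINGW64"), and TERM is
    # usually xterm-like; this is the situation in which prompt_toolkit raises
    # NoConsoleScreenBufferError instead of finding a Windows console buffer.
    return (env.get("MSYSTEM", "").startswith(("MINGW", "MSYS"))
            or env.get("TERM", "").startswith("xterm"))

print(looks_like_mingw_shell())
```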

dendenxu commented 2 months ago

Hi @meeotch, thanks for using our code!

Your questions are excellent, and indeed these two extra requirements defeat the purpose of a non-development install. The purpose of a non-devel install is to let users run evc-gui, which opens a rendering GUI and supports all methods that depend only on pytorch. However, our provided examples that use instant-ngp and gaussian splatting as baselines/backbones do require CUDA compilation, as you've discovered. In the meantime, the provided ENeRFi examples shouldn't require extra dependencies; could you try running one of those and check whether there are any issues? We will update the examples to include more pytorch-only baselines (ENeRFi etc.) and make the extra dependencies of instant-ngp and gaussian splatting clearer.

Are full development dependencies required to run the examples?

At least the answer to this question is no. Aside from the minimal requirements, the other dependencies are model-specific. For example, running instant-ngp-based methods requires tinycudann, and running gaussian-splatting-based methods requires compiling their (or our modified) CUDA rasterizer, while running NeRF+t or ENeRF shouldn't require any extra dependencies.
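As a sketch of that idea, here is a quick probe you can run to see which optional backends your environment can support (module names taken from this thread; the grouping is illustrative and not EasyVolcap's actual import logic):

```python
import importlib.util

# Illustrative mapping of features to the optional modules they need
OPTIONAL_DEPS = {
    "instant-ngp backbones": ["tinycudann"],
    "gaussian splatting": ["diff_gaussian_rasterization", "simple_knn"],
    "fusion / mesh extraction": ["open3d", "mcubes"],
}

for feature, modules in OPTIONAL_DEPS.items():
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{feature}: {status}")
```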

What shell are people successfully using under Windows? The stock cmd.exe is a nightmare - I much prefer a linux-like shell.

The provided examples are all tested on PowerShell on Windows. I couldn't agree more that linux shells are much easier to use, but I'm unsure whether you can start a native window from MINGW64 shells like git bash. Could you try the ENeRFi example and check whether there's any warning on screen about cuda-gl interop?

dendenxu commented 2 months ago

On another note, we also support WSL on Windows. Its performance might be slower than running natively, but the shell experience sure gets much better : ]

meeotch commented 2 months ago

Thanks for the info. (I was wondering about performance under WSL, in fact. So knowing that it's slower saves me from having to test it.)

I got both the zju3dv and actor1_4_subseq examples working. (Though I did run out of VRAM during the 3DGS training step. I'm running on a 1080Ti that only has 11GB.) For the benefit of others, here are the dependencies I had to install, beyond the non-development ones:

pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/facebookresearch/pytorch3d
pip install lpips
pip install open3d
pip install plyfile
pip install pymcubes
pip install git+https://gitlab.inria.fr/bkerbl/simple-knn
pip install git+https://github.com/dendenxu/diff-gaussian-rasterization
pip install pytorch_msssim

Also, I had to add import mcubes to the voxel_reconstruction() function in fusion_utils.py, as it was failing to find it at line 38.

Some additional random questions:

Thanks for the help!

dendenxu commented 2 months ago

Thank you for your insightful observation. I'll update the documentation & example to reflect these hidden requirements.

As for the other questions:

dendenxu commented 2 months ago

As for the VRAM issue, you could first try disabling evaluation with lpips by passing runner_cfg.evaluator_cfg.compute_metrics='PSNR,SSIM' to the training command.

If the problem persists (i.e., it occurs during training itself), we can always lower the rendering resolution with dataloader_cfg.dataset_cfg.ratio=xxx.

A more advanced solution would be to tune the initial number of points by controlling the volume_fusion part.
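Putting the first two knobs together, a command along these lines should work (the config path here is the one from this thread; substitute the experiment you are actually training):

```shell
# Disable LPIPS evaluation and halve the rendering resolution to save VRAM
evc -c configs/exps/l3mhet/l3mhet_actor1_4_subseq.yaml \
    runner_cfg.evaluator_cfg.compute_metrics='PSNR,SSIM' \
    dataloader_cfg.dataset_cfg.ratio=0.5
```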

meeotch commented 2 months ago

Thanks for the continued support. Here are a few other tweaks I've discovered, to get the examples working under Windows Git Bash:

If I get free time at some point, I'll clean up my notes and post a comprehensive guide for Windows users here.

Some more noob questions... Sorry for packing so many topics in one issue, but it's probably cleaner than starting a new issue for each one:

  1. I noticed that the output of 3DGS-T is a pytorch model (.pt file), rather than a (sequence of?) .ply splat files. Is this a necessary requirement of how the model works? Or is it possible to export .ply splats for further manipulation in other software?
  2. The docs say "We might perform some clean-up of the point clouds and store them in the surfs folder." Viewing the surfs pointclouds vs. the original vhulls pointclouds in evc-gui, I'm seeing that the former are less dense, and also don't seem to have surface color, but instead a rainbow-grid pattern. (Could be default coloring, or some actual data, I'm not sure.) Is there any more information about the pointcloud clean-up operations, and what the expected input to the 3DGS training should look like for best results? (Running 3DGS on the uncleaned vhulls produces results that seem to have more artifacts than the example video.)
meeotch commented 2 months ago

Minor update: I'm trying to run the zju3dv configuration on a single frame extracted from the actor1_4 data. (So, static frame, 18 cameras.) The colmap step seemed to produce reasonable results. But the l3mhet step seems to run 5 epochs, then goes nuts - been waiting ~2 hours for the second 5 epochs, and it's still running...

I admit that I don't fully understand the difference between the zju3dv exp config and the actor1_4 exp config. Is there an exp config yaml for actor1_4 that is known to work on a single (static) frame of input from that dataset?