The program stopped at this line: // PlyLoader.cpp void PlyLoader::processBlock(uint32_t *data, int x, int y, int z, int w, int h, int d) { ... memset(_counts, 0, elemCount*sizeof(uint16_t)); // stopped here ... }
and popped up this message: Unhandled exception at 0x000007FEE20BC9E7 (msvcr120.dll) in VoxelBuilder.exe: 0xC0000005: Access violation writing location 0x0000000055E11000.
Okay, I just pushed a fix. Can you confirm that it works?
Just FYI, I also pushed a range of fixes for compilation errors and warnings under MSVC, that might cause a merge conflict on your local copy.
It works well with a lutMemory and dataMemory size of (1024 * 1024 * 1024) after I updated to the latest code. But it won't work with a larger lutMemory and dataMemory size (2048 * 1024 * 1024); the program seems to enter an infinite loop.
This works fine on my machine.
Note that the expression 2048*1024*1024 causes an integer overflow, since integer literals in C/C++ are usually 32 bit (I believe MSVC also gives a warning). You need to make sure the expression does not overflow before it is stored in a size_t.
Something like size_t(2048)*1024*1024 works for me.
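For illustration, a minimal snippet showing the difference (the constant names here are made up):

```cpp
#include <cstddef>

// All three literals are 32-bit ints, so the multiplication happens in int
// and overflows (2^31 does not fit in a signed 32-bit int) before the result
// is widened to size_t. MSVC reports an integral constant overflow here.
static const size_t tooBig  = 2048 * 1024 * 1024;

// Casting the first operand makes the whole product 64-bit arithmetic.
static const size_t correct = size_t(2048) * 1024 * 1024;
```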
I will try again later and report the results. Thank you so much!
Hi, Benedikt,
You are right. There is indeed an integral constant overflow caused by the expression 2048*1024*1024. Sorry that I did not notice this. It works fine on my machine now.
I also noticed that it takes longer to generate voxels at the same resolution when using a larger memory size. Take the xyzrgb_dragon model as an example: to generate a 1024-resolution voxel volume, it takes 45s with a 2048 * 1024 * 1024 memory size, while it takes 28s with 1024 * 1024 * 1024. It seems that for low or medium resolutions, a smaller memory size is better. Is that true?
Good point. I think this is mainly because it memsets some redundant things. I just pushed a fix, hopefully that should speed things up.
Also, just to note: the data memory is used to cache cubical voxel blocks with power-of-two side lengths. A side effect of that is that more data memory only improves performance at every 8-fold increase. So, for example, to cache a block of size 256^3, the program uses 288MB of data memory. To cache the next largest block of 512^3, it needs 8 times that, i.e. 2304MB. If you supply 2048MB, this is a little bit short, and the program falls back to 256^3.
With the newest patch, the program should hopefully not be slower if you supply 2048MB than if you supply 288MB, but to make it faster you need to bump the 2048 to 2304MB, so it can fit a larger block in memory. For the next speed increase it would need more than 18GB, which is probably outside of what any reasonable machine has :) In other words, increasing the data memory beyond 2304MB is not going to have any effect.
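To make the 8-fold jumps concrete, here is a small illustrative sketch (not the project's actual code; the helper name and the ~18 bytes per cached voxel are assumptions derived from the 288MB-for-256^3 figure above):

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: largest power-of-two block side whose cache fits in
// dataMemory, at an assumed cost of ~18 bytes per voxel (288MB for 256^3).
int largestCachedBlockSide(size_t dataMemory, size_t bytesPerVoxel = 18) {
    size_t side = 1;
    // Doubling the side length multiplies the memory cost by 8.
    while ((side*2)*(side*2)*(side*2)*bytesPerVoxel <= dataMemory)
        side *= 2;
    return int(side);
}

int main() {
    printf("%d\n", largestCachedBlockSide(size_t(2048)*1024*1024)); // 256: 2048MB is just short of a 512^3 block
    printf("%d\n", largestCachedBlockSide(size_t(2304)*1024*1024)); // 512: 2304MB fits exactly
}
```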
Hi, Benedikt,
Actually, we have an HP Z820 workstation with 128GB of system memory. We want to visualize an entire airplane model by ray casting a super-high-resolution (maybe 32768^3 or 65536^3) SVO. Currently I am using your project to handle this, but I am not sure whether it is capable of it. I guess there will be two issues: 1) building speed, which may take a couple of days; 2) the storage size may be very large, which may require more compact compression schemes. Currently I am testing SVO generation at 32768^3 resolution. Do you have any suggestions about this, or ideas to improve the project toward higher resolutions and performance?
Interesting!
At these large resolutions, I think there are two issues: 1) The triangle-to-voxel conversion code suffers from performance degradation when used at very high resolutions. This can be fixed by using a smarter acceleration structure. 2) Although the voxel conversion code can deal with 64-bit addresses, the final SVO itself can only address 4GB before it gives up. This is because child pointers are stored with relative addressing at most 15 bits wide, and addresses larger than that overflow to a global "far pointer" table with 32-bit precision. If an address exceeds 32 bits, the SVO cannot address it. I will have to run it on a few very large models and see what the octree size looks like. I might have to rewrite parts of it to combine relative and far pointers to extend the pointer size to 47 bits.
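For reference, here is a rough sketch of how such a relative/far-pointer scheme can be laid out; the exact bit layout and names in the project may differ.

```cpp
#include <cstdint>
#include <vector>

// A 32-bit descriptor reserving 15 bits for a relative child offset plus a
// "far" flag; offsets that don't fit in 15 bits spill into a global table of
// 32-bit far pointers.
struct Descriptor {
    uint32_t bits; // [ childPtr:15 | far:1 | validMask:8 | leafMask:8 ]

    uint32_t childPtr() const { return bits >> 17; }
    bool     isFar()    const { return (bits >> 16) & 1; }
};

// Index of the first child of the descriptor stored at `selfIndex`.
uint64_t firstChildIndex(Descriptor desc, uint64_t selfIndex,
                         const std::vector<uint32_t> &farPointers) {
    if (!desc.isFar())
        return selfIndex + desc.childPtr();           // relative offset, at most 15 bits
    return selfIndex + farPointers[desc.childPtr()];  // 32-bit offset from the far-pointer table
}
```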
I will have to think about this, but I'll probably rewrite the conversion code to resolve 1). 2) requires testing, but I have some high resolution models I can run it on. I'm pretty busy with other projects, but hopefully I have some time over the weekend to look at this.
This code base is pretty old now, but while I'm at it I could also convert it to a CMake project and incorporate C++11 features. Do you have a C++11-capable compiler for your project (MSVC 2013 works), and are you familiar with CMake?
I really appreciate that you may have time to try to solve the problems you raised.
Yes, I am using MSVC 2013 now and I am quite familiar with CMake. I also have some large models, e.g. the UNC PowerPlant model and the Lucy model, which are publicly available for download. I think I can participate in testing the new code.
Just to give a quick update, I've converted the project to C++11 and CMake and started doing benchmarking and performance optimization. The triangle to octree conversion is now more than 8x faster on my machine, and I might be able to improve it further.
I will do some more performance improvements and testing and hopefully push the code tomorrow. It's a larger rewrite and the new version might be less stable initially, so some testing is required.
Wow, well done! I will test the new code as soon as it is pushed.
BTW, will large resolutions (like 32768^3 or 65536^3) be supported in this update?
I managed to do more performance improvements today, and I think it will be ready to push tomorrow (still need to do some cleanup). The speedup is quite sizable, although I will do proper benchmarking first.
I was hoping to resolve the issue with octree sizes as well, but it's more complicated than I thought. Unfortunately this is a problem with top-down construction - the octree does not know that it needs to use larger pointers until it's too late to embed them into the data structure. One solution is to always extend the descriptors to 64 bits, although this would double the octree size and may affect raymarching performance. I have an idea using chunked memory allocators that don't build the contiguous memory block until the end, allowing data to be injected after construction has finished. It will take some more time to code, though. I'll keep you updated.
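A loose sketch of what such a chunked buffer could look like (class name, chunk size, and interface are all made up here; it only illustrates that entries stay addressable for late pointer fixups until the flat block is assembled at the end):

```cpp
#include <cstdint>
#include <memory>
#include <vector>

class ChunkedBuffer {
    static const size_t ChunkSize = 1 << 20; // entries per chunk (arbitrary)
    std::vector<std::unique_ptr<uint32_t[]>> _chunks;
    size_t _size = 0;

public:
    void push(uint32_t value) {
        if (_size % ChunkSize == 0)
            _chunks.emplace_back(new uint32_t[ChunkSize]);
        _chunks.back()[_size % ChunkSize] = value;
        _size++;
    }
    // Entries remain writable after construction, which is what makes
    // late pointer fixups possible.
    uint32_t &operator[](size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }

    // Only at the very end is everything copied into one contiguous block.
    std::vector<uint32_t> flatten() const {
        std::vector<uint32_t> result(_size);
        for (size_t i = 0; i < _size; ++i)
            result[i] = _chunks[i / ChunkSize][i % ChunkSize];
        return result;
    }
    size_t size() const { return _size; }
};
```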
Ok, it's pushed!
Notable performance changes:
For octree size:
For usability:
I also replaced the Makefile with a CMake project and converted the code to use C++11 features. I updated the Readme with build instructions.
Before, creating an 8192^3 voxel volume from 500k triangles took more than 40 minutes on my machine. Now it only takes 105 seconds (more than a 20x speedup). Also, creating a voxel volume with resolution 32768^3 from a mesh with 7 million triangles now only takes 25 minutes on my machine (the resulting octree is 8GB in size).
The nice thing about the new code is that the runtime only depends on the number of non-zero voxels, not the total number of voxels. In other words, the runtime increases quadratically rather than cubically, i.e. doubling the resolution only increases the conversion time by 4x. The octree size also scales quadratically, so on my machine a 64k voxel volume would take about 2 hours to produce and would take up 32GB of space (I don't have that much memory, so I can't test it).
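As a back-of-the-envelope check of that scaling, extrapolating the measured 32768^3 numbers above (25 minutes, 8GB) to 65536^3:

```cpp
#include <cstdio>

int main() {
    double baseMinutes = 25.0, baseGiB = 8.0;             // measured at 32768^3
    double factor = (65536.0/32768.0)*(65536.0/32768.0);  // doubling the resolution -> 4x
    printf("~%.0f minutes, ~%.0f GiB\n", baseMinutes*factor, baseGiB*factor); // ~100 minutes, ~32 GiB
}
```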
I've tested the new code with two models.
1) A mesh with 5 million triangles at resolution 32768^3: the resulting octree is 24GB (about 8 hours).
2) A mesh with 20 million triangles at resolution 65536^3: the resulting octree is 58GB (about 24 hours).
I lost the log file, so the build times above are approximate.
Next, I will test a model with 350 million triangles at resolution 65536^3 (it may take several days to build...). Also, the model consists of thousands of .obj files, so first I need to add some code to load these .obj files. I will report the results if I manage to do this.
Thanks for the awesome work!
I'm glad to hear that it works!
I just pushed another update that enables multi-threaded triangle conversion. For large triangle meshes (tested with 7 million tris) this made conversion ~50% faster.
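As a generic illustration of the idea (not the project's actual code): each worker handles an interleaved subset of the triangles. `voxelizeTriangle` is a hypothetical stand-in for the per-triangle conversion work, and real code would also need to synchronize writes to the shared voxel data.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

void convertTriangles(size_t triangleCount,
                      const std::function<void(size_t)> &voxelizeTriangle) {
    unsigned threadCount = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
        workers.emplace_back([=]() {
            // Interleaved indices give a rough load balance across workers.
            for (size_t i = t; i < triangleCount; i += threadCount)
                voxelizeTriangle(i);
        });
    for (std::thread &w : workers)
        w.join();
}
```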
I also implemented octree compression when saving/loading. It's using the LZ4 compression library, which can achieve extremely fast decoding/encoding rates (~2GB/s when decompressing) while yielding reasonable compression ratios. For a 32k octree, this reduced the file size from 8.1 GB to 2.7 GB for me, roughly ~3x reduction (this heavily depends on the model though). The in-memory size is unchanged, but the size on disk will be reduced. Notably, this actually reduced loading times for me, since the decompression takes less time than loading the uncompressed file to memory, at least for large octrees.
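For reference, a rough sketch of per-buffer LZ4 usage along these lines (the project may well use LZ4's streaming API instead, since the simple API below works on buffers of at most ~2GB, so a multi-gigabyte octree would be split into chunks):

```cpp
#include <lz4.h>
#include <stdexcept>
#include <vector>

std::vector<char> compressBlock(const char *src, int srcSize) {
    std::vector<char> dst(LZ4_compressBound(srcSize)); // worst-case output size
    int written = LZ4_compress_default(src, dst.data(), srcSize, int(dst.size()));
    if (written <= 0)
        throw std::runtime_error("LZ4 compression failed");
    dst.resize(written);
    return dst;
}

std::vector<char> decompressBlock(const char *src, int srcSize, int decompressedSize) {
    std::vector<char> dst(decompressedSize); // caller must know the original size
    if (LZ4_decompress_safe(src, dst.data(), srcSize, decompressedSize) < 0)
        throw std::runtime_error("LZ4 decompression failed");
    return dst;
}
```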
Is it possible to employ the block-based compression scheme as described in ESVO? It seems more compact for in-memory size.
I think it's possible, but there are two problems:
Currently I also don't have spare cycles for this project, so I'm going to put this on the backlog.
Unrelated to your question: this thread has drifted a bit from the original issue, which was resolved a while ago. I'm going to close this issue, but feel free to open new issues for any questions/problems you have.
Hi, Benedikt,
Just to report a strange problem I encountered when I used a larger lutMemory and dataMemory to generate voxels at different resolutions.
My computer configuration: 10GB memory, Win7 64-bit; the program is compiled with VS2013 in a 64-bit release configuration (with the latest code). Tested model: xyzrgb_dragon
I used a lutMemory and dataMemory size of double the default value: static const size_t lutMemory = 2 * 512 * 1024 * 1024; static const size_t dataMemory = 2 * 512 * 1024 * 1024;
The test results are as below:
resolution | result
256 | ok
512 | ok
1024 | crashed
2048 | crashed
The problem can be easily reproduced. Can you help us figure out what is going wrong? Thanks in advance!
Best, Junjie