sampotter / python-embree

Apache License 2.0

A way to multithread or multiprocess ray intersection. #4

Closed bsavery closed 3 years ago

bsavery commented 4 years ago

This is probably pushing the Python interpreter past what it was designed for, but here's the problem.

When tracing many rays (as in a path tracer), calling rtcIntersect one ray at a time is inefficient and single-threaded. Possible solutions I see are:

  1. Batch rays and call rtcIntersectM, but my understanding is that this is not multithreaded (https://software.intel.com/en-us/forums/embree-photo-realistic-ray-tracing-kernels/topic/520383)
  2. Fix the embree device to allow multiprocessing (Python's GIL doesn't allow multiple threads to execute Python code concurrently). Right now the embree device can't be pickled for multiprocessing.
  3. Wrap a call that performs multiple intersect calls inside Cython.
  4. Add a way to use asyncio. This would allow other threads to do shading or other work while the intersect is happening.
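Option 4 can be sketched with stock asyncio. The `intersect_batch` function below is a hypothetical stand-in for a batched Embree call (a real version would also need to release the GIL inside Cython for the overlap to actually pay off):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a batched intersect call; a real version
# would call into embree from Cython with the GIL released.
def intersect_batch(rays):
    return [r * 2 for r in rays]  # placeholder "hit" results

def shade(hits):
    return sum(hits)  # placeholder shading work

async def trace(batches):
    loop = asyncio.get_running_loop()
    results = []
    with ThreadPoolExecutor() as pool:
        for batch in batches:
            # While the executor thread is inside intersect_batch, the
            # event loop is free to run other coroutines (e.g. shading
            # work scheduled elsewhere).
            hits = await loop.run_in_executor(pool, intersect_batch, batch)
            results.append(shade(hits))
    return results

if __name__ == '__main__':
    print(asyncio.run(trace([[1, 2], [3, 4]])))
```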
sampotter commented 4 years ago

My goal for this wrapper is to keep it very simple and do as little as possible beyond what Embree does itself. The only "extra feature" I want this wrapper to provide is easy compatibility with numpy (for obvious reasons!).

That said, I think #2 is the only viable approach. The other options are things that users could experiment with themselves.

Adding pickling hopefully won't be too complicated. Cython describes its approach to automatically generating a __reduce__ method in its docs on extension types. The news is not so good. ;-) If a class has a __cinit__ method, it's necessary to write a __reduce__ method by hand.

So, we will need to go through and make sure that all of the extension classes can be pickled.

bsavery commented 4 years ago

Yeah, I played around with it a bit and couldn't come up with a great option. That's why I bring it up.

sampotter commented 3 years ago

Had a chance to look into this more.

There are two issues over on the Embree repo which discuss serialization of Embree's BVH:

https://github.com/embree/embree/issues/238 https://github.com/embree/embree/issues/137

It looks like it isn't possible to serialize Embree's BVH, and Embree's developers aren't planning to make this possible any time soon. On the other hand, they say that in most cases rebuilding the BVH from the geometry is fast enough that serializing just the geometry is the way to go.

Our use case is a little different in that, for parallelism, we'll potentially want to serialize and deserialize classes defined in embree.pyx a large number of times, so ideally this would be fast.

I'm going to try implementing __getstate__ and __setstate__ pairs for each class in embree.pyx and see how this goes. Will update here once I've done so.
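The "serialize the geometry, rebuild the BVH" idea could look something like the following. This is a minimal sketch, not the actual embree.pyx classes; `Scene` and `_build` are hypothetical names here:

```python
import pickle
import numpy as np

class Scene:
    # Hypothetical sketch, not the actual embree.pyx class.
    def __init__(self, vertices, faces):
        self.vertices = np.asarray(vertices, dtype=np.float32)
        self.faces = np.asarray(faces, dtype=np.uint32)
        self._build()

    def _build(self):
        # The real wrapper would create the RTCDevice/RTCScene here,
        # attach the geometry buffers, and commit (rebuilding the BVH).
        self.committed = True

    def __getstate__(self):
        # Serialize only the geometry; the BVH handle can't be pickled.
        return {'vertices': self.vertices, 'faces': self.faces}

    def __setstate__(self, state):
        self.vertices = state['vertices']
        self.faces = state['faces']
        self._build()  # rebuild the BVH on the receiving side

s = pickle.loads(pickle.dumps(Scene(np.zeros((3, 3)), [[0, 1, 2]])))
```

Whether this is fast enough depends on how often the objects get serialized, since every deserialization pays for a full BVH rebuild.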

sampotter commented 3 years ago

OK, @bsavery, I spent some more time digging into this. My conclusion is that directly parallelizing the extension classes provided by embree.pyx would be too much work and outside the scope of this library. To implement __reduce__ correctly for them, you would need to record the history of all of the different Embree API calls and store extra vertex and index data with these classes, so that each API call could be replayed in the correct order.

That said, it should be very easy to get the behavior you want. In my library python-flux, the class flux.shape.TrimeshShapeModel (in this file) wraps a triangle mesh built from a set of vertices and indices and does raytracing using python-embree. Because it stores these vertices and indices as member variables, it's straightforward to implement a __reduce__ method. I recommend creating a class similar to TrimeshShapeModel and implementing __reduce__ yourself.
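The shape of such a wrapper, reduced to its essentials, might look like this. `ShapeModel` is a hypothetical name in the spirit of TrimeshShapeModel, and the `_scene` attribute is a placeholder for the real Embree scene:

```python
import pickle
import numpy as np

class ShapeModel:
    """Hypothetical wrapper in the spirit of flux.shape.TrimeshShapeModel:
    it keeps the raw mesh around so it can be pickled, and (re)builds the
    Embree scene from that mesh."""

    def __init__(self, V, F):
        self.V = np.asarray(V, dtype=np.float32)   # vertices
        self.F = np.asarray(F, dtype=np.uint32)    # triangle indices
        self._scene = object()  # placeholder for the embree scene/BVH

    def __reduce__(self):
        # Pickle just the mesh; each worker process reconstructs the
        # object (and hence the BVH) by calling ShapeModel(V, F).
        return (ShapeModel, (self.V, self.F))

V = np.zeros((3, 3))
F = np.array([[0, 1, 2]])
clone = pickle.loads(pickle.dumps(ShapeModel(V, F)))
```

Because instances pickle cleanly, they can be passed to multiprocessing workers, each of which rebuilds its own BVH on unpickling.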

aluo-x commented 3 years ago

Thanks for the fantastic package and the example.

I think extreme caution is needed when using this package for scientific purposes. I observed a lot of numerical instability and significant variance in results between runs, even with the robust flag set.

This is not a flaw of the python-embree wrapper; I think it's inherent in embree itself. My use case is multi-hit along a single ray. Embree2 (via pyembree) was numerically correct for the first two hits; Embree3 (via python-embree) seems to be correct only up to the first hit. I've tried the tnear=tfar technique as well as offsetting the origin; both give incorrect results when the offset is vectorized (a vector instead of a constant). Perhaps the dtype is incorrect?
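One way the dtype can bite with the tnear=tfar technique: Embree's ray fields are 32-bit floats, so an epsilon computed in float64 (numpy's default) can underflow or round away when written into the ray. A hedged sketch of advancing tnear strictly past the previous hit, done entirely in float32 and working on vectors of hit distances (`advance_tnear` is a hypothetical helper, not part of python-embree):

```python
import numpy as np

def advance_tnear(t_hit, eps=1e-4):
    """Return a float32 tnear strictly greater than each previous hit
    distance. Embree rays store tnear/tfar as float32, so the epsilon is
    scaled with the hit distance and applied in float32; np.nextafter
    guarantees a strict advance even where the scaled epsilon would be
    below float32 resolution."""
    t_hit = np.asarray(t_hit, dtype=np.float32)
    eps = np.float32(eps)
    # Scale the offset with distance so it doesn't vanish for far hits.
    t = t_hit + eps * np.maximum(np.abs(t_hit), np.float32(1.0))
    # Fall back to the next representable float32 if the bump rounded away.
    return np.maximum(t, np.nextafter(t_hit, np.float32(np.inf)))
```

A constant float64 offset added to large float32 hit distances can round to nothing, which would make the next query re-report the same hit; that failure mode matches offsets working as scalars but breaking when vectorized with the wrong dtype.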

Edit: typo