napari / napari

napari: a fast, interactive, multi-dimensional image viewer for python
https://napari.org
BSD 3-Clause "New" or "Revised" License

get_status for surface (and probably other layers) should be monitored, debounced/throttled, and run on a separate thread #5755

Closed · jni closed this issue 2 weeks ago

jni commented 1 year ago

🧰 Task

In this discussion, @brisvag and @aganders3 figured out that the texture example had low framerate not because of VisPy and OpenGL, but because of layer.get_status. The simple workaround for now is to not have the layer selected when interacting with it, but ideally we should:

  1. Make sure that get_status runs in a separate thread so it doesn't interfere with the GUI responsiveness
  2. Monitor how long a get_status call takes, and
  3. Use debounce and/or throttle (probably debounce) to make sure it doesn't get called more often than that measured duration.

I prefer debounce, at least for long-running get_status, because if get_status takes a while to run, then throttling it will only show an outdated status message anyway, defeating the purpose of it and wasting precious CPU cycles.
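As a rough illustration of the difference (not napari's actual wiring), here is a sketch using superqt's qthrottled/qdebounced helpers, which napari already depends on for the status bar; the timeout values and the _update_status_bar function are placeholders:

```python
# Sketch only: superqt provides Qt-timer-based throttle/debounce decorators;
# the 50/200 ms values and _update_status_bar are placeholders, not napari code.
from superqt.utils import qdebounced, qthrottled


@qthrottled(timeout=50)
def throttled_status(event):
    # fires at most once every 50 ms while mouse-move events keep arriving
    _update_status_bar(event)


@qdebounced(timeout=200)
def debounced_status(event):
    # fires once, 200 ms after the last mouse-move event (i.e. when the cursor rests)
    _update_status_bar(event)
```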

Czaki commented 1 year ago

https://github.com/napari/napari/blob/ac8266032d3aa8e64cc2e33fb297269a8c1983ae/napari/_qt/qt_main_window.py#L175-L185

But it is already throttled.

jni commented 1 year ago

Yeah, but only to 50ms. If the call takes longer than that, you still overwhelm the GUI, which is the case for the surface layer.

Czaki commented 1 year ago

Maybe the surface code could be optimized? Or this time could be increased to 100ms?

I agree with @brisvag that requiring the mouse to stop in order to update the status is a bad decision.

aganders3 commented 1 year ago

Some quick profiling on my laptop shows that get_value for the model in the example takes around 170-200 ms. I was able to get it down to ~150 ms with some micro-optimizations, but I don't see an easy way to get a 3x improvement right now. It's not a major surprise where most of the time is spent:

geometry.py:17(project_points_onto_plane)
geometry.py:358(inside_triangles)
geometry.py:608(line_in_triangles_3d)
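For reference, a profile like the one above can be reproduced with something along these lines (assuming a viewer with the example surface loaded; the coordinates and view parameters are placeholder values):

```python
# Hypothetical profiling snippet; the coordinates and view parameters are placeholders.
import cProfile
import pstats

layer = viewer.layers[0]  # the example surface layer
with cProfile.Profile() as prof:
    layer.get_value(
        (2.41, 3.77, 4.21),
        view_direction=[-0.97, 0.26, 0.0],
        dims_displayed=[0, 1, 2],
        world=True,
    )
pstats.Stats(prof).sort_stats("cumulative").print_stats("geometry")
```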
brisvag commented 1 year ago
  1. Make sure that get_status runs in a separate thread so it doesn't interfere with the GUI responsiveness

  2. Monitor how long a get_status call takes, and

  3. Use debounce and/or throttle (probably debounce) to make sure it doesn't get called more often than that measured duration.

I prefer debounce, at least for long-running get_status, because if get_status takes a while to run, then throttling it will only show an outdated status message anyway, defeating the purpose of it and wasting precious CPU cycles.

I think 1. is the best (and probably easiest) way to handle this, unless we can simply optimize the code so well it becomes unnecessary. Still useful to do IMO.

Not sure how I feel about debounce vs throttle; I guess in both cases the displayed value is already outdated, so debounce at least isn't wasting resources. Either way (if we go with the threaded approach), we should probably signal somehow that the value is not currently up to date (similar to how we're working on the loading icon for layers and the new async).

Czaki commented 1 year ago

@aganders3 How did you test it? I got:

%timeit viewer.layers[0].get_status((1.6522035388587697, 0.9330241210928516, 3.502586273276682))
23.2 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Is my machine so strong, or did I test it incorrectly?

aganders3 commented 1 year ago

Try something like this; I think this is how it's called when mousing over the surface:

%timeit viewer.layers[0].get_value((2.4136927335421827, 3.774940485066061, 4.213935782687853), view_direction=[-0.96592583,  0.25881905, -0.], dims_displayed=[0, 1, 2], world=True)
203 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Czaki commented 1 year ago

Ok, I'm able to reproduce the problem. A quick review suggests that the code could be sped up with more matrix multiplication, but it requires a few hours of digging into the algorithm.

But it looks like it may require a different algorithm to find the intersected triangles. Ray tracing is widely used in games, so there should be faster algorithms available.

aganders3 commented 1 year ago

I'm not an expert here, but I believe one of the largest gains for ray-tracing complex scenes comes from putting the objects (triangles) into a tree structure (a bounding volume hierarchy (BVH) or k-d tree). I think this is the only way to get better-than-linear performance with respect to the number of triangles. I don't have a feel for how slow it might be to generate a BVH, but I may experiment with this. I suppose it bodes well that our scene generally isn't rapidly changing.
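To make the culling idea concrete, here is a minimal numpy sketch, not a real BVH: it precomputes per-triangle bounding boxes once and runs a vectorized ray/box slab test to discard most triangles before the exact ray/triangle test (a BVH would arrange the same boxes in a tree so the cull becomes better than linear):

```python
# Sketch of bounding-box culling; a real BVH would organize these boxes in a tree.
import numpy as np


def build_triangle_aabbs(vertices, faces):
    """Per-face axis-aligned bounding boxes, computed once per mesh update."""
    tris = vertices[faces]                      # (n_faces, 3, 3): the triangle corners
    return tris.min(axis=1), tris.max(axis=1)   # each (n_faces, 3)


def candidate_faces(ray_origin, ray_direction, box_lo, box_hi, eps=1e-12):
    """Boolean mask of faces whose bounding box the ray can hit (slab test)."""
    # avoid division by zero; a sketch-level simplification, not fully robust
    safe_dir = np.where(np.abs(ray_direction) < eps, eps, ray_direction)
    inv = 1.0 / safe_dir
    t0 = (box_lo - ray_origin) * inv
    t1 = (box_hi - ray_origin) * inv
    t_near = np.minimum(t0, t1).max(axis=1)
    t_far = np.maximum(t0, t1).min(axis=1)
    return (t_far >= t_near) & (t_far >= 0)
```

The faces that survive the cull would then go through the existing exact intersection test (e.g. line_in_triangles_3d) instead of the whole face array.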

Another method maybe worth considering is to render the values into another buffer, and use glReadPixels(). I think this would be a change in vispy.

Czaki commented 1 year ago

In general, we should implement one of these structures for layers like Points or Surface; it would speed up multiple operations.

But this is long term.

brisvag commented 1 year ago

Another method maybe worth considering is to render the values into another buffer, and use glReadPixels(). I think this would be a change in vispy.

This already exists in vispy ("picking mode"), and we could/should probably use it for all get_value tasks; however, I never wrapped my head around it and I'm not sure how easy it is to make it work in napari.

EDIT: actually, maybe picking mode is only capable of returning "which visual" is being rendered, rather than "which vertex".

aganders3 commented 1 year ago

I hacked together a little prototype of a face-picking mode in vispy. On my laptop (M1 Pro) I get ~5-12 ms to pick, highlight, and re-render the mesh. The way this is done now actually wouldn't require changes to vispy, but it seems promising and I think it can be improved (especially w.r.t. GPU memory usage) by a change upstream.

[Screenshot: highlighting faces based on mouse location]

[Screenshot: picking mode under the hood, faces coded by color]

brisvag commented 1 year ago

Awesome, this is what I was hoping was possible!! :heart_eyes: totally support upstreaming to vispy, happy to help with that if needed :)

Questions (each quoted in the reply below):

aganders3 commented 1 year ago

  • are you doing this continuously on hover, or sporadically on click?

Either works - I was registering with either on_mouse_move or on_mouse_press events in vispy. I don't think vispy has the same throttling options as napari, so there are some issues with on_mouse_move as it triggers repeatedly even if you're moving within the same face.

  • is it doable to get that info out from opengl, or is everything happening internally in the shaders?

I get the info out of opengl by rendering into a numpy array without displaying it. Then I use the mouse position to index into that array and get the color, which encodes the face index in the array of faces.

Right now the hack is that the picking mesh is a whole separate mesh with the face colors calculated on the CPU, so it duplicates the buffers sent to the GPU. This is what I meant by improving with some upstream code: I think the face colors could instead be calculated on the fly in the shader (saving memory and probably time), perhaps with gl_VertexID or gl_PrimitiveID or something? There's a rough sketch of the encode/decode step at the end of this comment.

  • can we do the same with points?

I have only given this a little thought, but I think something similar would work.

  • several small faces are highlighted in green in the first surface. Is that a color collision problem? Or am I misunderstanding the image?

Sorry I should have explained that image better. This is from a demo I made where the mesh is all purple to start, then if you hover (or click) a face it turns green (and stays green). The very small green triangles are from moving the mouse quickly over the mesh in the background where triangles are small.
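To make the color-coding step concrete, here is a minimal sketch of the encode/decode only (assumptions: an 8-bit RGB offscreen render with the background rendered black; the offscreen rendering and pixel readback themselves stay on the vispy side):

```python
# Hypothetical helpers for the "face index as color" picking trick.
import numpy as np


def face_ids_to_colors(n_faces):
    """Map face index i -> a unique uint8 RGB color (supports up to 2**24 faces)."""
    ids = np.arange(1, n_faces + 1, dtype=np.uint32)   # reserve 0 for the background
    return np.stack(
        [(ids >> 16) & 0xFF, (ids >> 8) & 0xFF, ids & 0xFF], axis=-1
    ).astype(np.uint8)


def color_to_face_id(rgb):
    """Decode the RGB value read back under the cursor; returns -1 for background."""
    r, g, b = (int(c) for c in rgb[:3])
    return ((r << 16) | (g << 8) | b) - 1
```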

akuten1298 commented 1 year ago

Hi, I was working on the thread approach stated above and could see a considerable difference in how the surface is rendered on the canvas during cursor movement, and in the overall user experience. The change attached below does not yet reduce the total computation time; it specifically decouples the viewer from the get_status computation so the two run concurrently. Please let me know your thoughts on this approach, and whether we should consider adaptive throttling or debouncing on top of it.

Would love to get some early feedback on approach, implementation or discussion here: https://github.com/akuten1298/napari/pull/4

Also attached in PR is the before vs after user experience recording.

Thank you :)

brisvag commented 1 year ago

Hi @akuten1298, thanks for picking this up, awesome showcase! I'm definitely in favour of doing threading for this.

I'd like to hope that it's as simple as https://github.com/akuten1298/napari/pull/4, but I have a creeping fear that the threading police (@andy-sweet and @Czaki especially :P) will have some worries about sharing of resources between threads, race conditions, and all that fun stuff ^^'. Thing is, get_status has to access layer data and a lot of other sensitive/changeable state, so we probably need some locks and similar tricks. But I'm sure @andy-sweet and @Czaki can be of more help here.
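For what it's worth, the "latest request wins" pattern could look roughly like the sketch below. It uses plain concurrent.futures rather than napari's own threading utilities; layer, the mouse arguments, and on_done are placeholders, and real code would still need the locking around layer state mentioned above, plus a hop back to the GUI thread for the result:

```python
# Rough "latest request wins" sketch; not napari's actual threading machinery.
from concurrent.futures import ThreadPoolExecutor
from itertools import count

_executor = ThreadPoolExecutor(max_workers=1)
_request_counter = count()
_latest_request = -1


def request_status(layer, position, view_direction, dims_displayed, on_done):
    """Queue a get_status call; results of superseded requests are dropped."""
    global _latest_request
    request_id = next(_request_counter)
    _latest_request = request_id

    def _work():
        status = layer.get_status(
            position,
            view_direction=view_direction,
            dims_displayed=dims_displayed,
            world=True,
        )
        if request_id == _latest_request:  # only report if no newer request arrived
            on_done(status)               # real code: deliver on the GUI thread

    _executor.submit(_work)
```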

aganders3 commented 1 year ago

Good points @brisvag - maybe we can benefit from some of @andy-sweet's async slicing work 🙂.

akuten1298 commented 1 year ago

Thank you @brisvag for the early comments. Will make sure to check on this and get back on this thread with further updates regarding the threading approach.

aganders3 commented 1 year ago

@brisvag totally support upstreaming to vispy, happy to help with that if needed :)

I have opened a PR for "face picking" at https://github.com/vispy/vispy/pull/2500 (nice PR number!) if you have time to take a look.

akuten1298 commented 1 year ago

Hi, I was working on porting Ashley's vispy face_filter_picking work over to napari and have created a draft PR for an initial review from the community. Once again, we were able to significantly bring down the status bar computation time, but we still have to decide which approach to proceed with. We have compared the runtimes of both approaches independently, but we may still need a common script to compare them quickly and easily.

BVH PR: https://github.com/akuten1298/napari/pull/7

Face Picking PR: https://github.com/akuten1298/napari/pull/10

Please take a look and suggest feedback when you find time, thank you :)

jni commented 1 year ago

@aganders3

at https://github.com/vispy/vispy/pull/2500 (nice PR number!)

it's nice, but I suspect it will take some time still for someone to beat my scikit-image/scikit-image#3000. 😃 (For those that don't know, Python 3 used to be called Python 3000.)

psobolewskiPhD commented 3 months ago

Gentle bump here -- any of those draft PRs able to get over the line?

psobolewskiPhD commented 1 month ago

@aganders3 Any thoughts on what to do with those draft PRs? They don't look far off from making a big impact.

aganders3 commented 1 month ago

Hm, I can take another look soon (on vacation this week). I think the performance of the GPU version is closer than the BVH, but the implementation is a bit awkward.

Even the simple multi-threaded version was a big UX improvement so maybe we should investigate that a little more?

jni commented 1 month ago

Even the simple multi-threaded version was a big UX improvement so maybe we should investigate that a little more?

yes, just like we have a thread dedicated to slicing we could have a thread dedicated to picker updates. I imagine that a lot of the slicing machinery (e.g. discarding stale requests) can be reused here.

brisvag commented 1 month ago

yes, just like we have a thread dedicated to slicing we could have a thread dedicated to picker updates. I imagine that a lot of the slicing machinery (e.g. discarding stale requests) can be reused here.

Yeah I think this would be great!