ocornut / imgui

Dear ImGui: Bloat-free Graphical User interface for C++ with minimal dependencies
MIT License

Rendering optimization: store rounded corners in texture to use 1 quad per corner #1962

Open franciscod opened 6 years ago

franciscod commented 6 years ago

There is a branch tracking the current state of this feature: https://github.com/ocornut/imgui/commits/features/tex_round_corners Most of the remaining work has been done by @ShironekoBen, with many cleanups by @ocornut

Original text follows:


I want to work on this! Rough roadmap / questions:

Feedback welcome :)

ocornut commented 6 years ago

Hello! Thanks for your interest, working on that would be nice!

4 corners or just 1 and pick UVs carefully for rotation?

I would guess just 1 is enough if possible, but AddCircle/AddCircleFilled might not be able to function with four fully symmetrical quarters? I'm not sure, this needs testing.

franciscod commented 6 years ago

Thanks for your input, Omar! Some more questions:

There is an api for registering custom rectangles in the atlas that you can write pixels into (see what we are doing with mouse cursors).

Yep, AddCustomRectRegular, right? I've used that one.

all integer sizes from 1 to an arbitrary MaxSize which is possible to override with a member in the atlas

Sounds good!

The anti-aliased corners should be programmatically rasterized (which is easy for circles).

Easy as in "add a 50% alpha right next to the circle" or "there's a nice way to do this"? Any pointers? Never did raster/software AA myself and might save some time :) I'd love if it looked nice on every backend (i.e. does the AA change between OpenGL/DX/Vulkan?)

I've thought of adding some kind of "render to texture" support but thinking on all the backends was a no-no. Also ImGui should stay backend agnostic.

the drawing of custom shapes are getting in the way of being able to draw clipped contents

Care to explain this a little more? I don't quite see the relation here.

it may be good to have an ImFontAtlasFlags to forcefully rasterize the circles with no anti-aliasing. We can figure that out later.

Great plan!

One possible thing to consider (but feel free to ignore if this is too problematic): in the future I would like to support vertical gradient within shapes.

Yeah, I've hacked some gradients but keeping the rounded rects solid, with the gradient just on the inside :)

In the case of baked corners it means the colors would need to be adjusted for the top or bottom two vertices. If we also gradient not starting at the edge of shapes it may mean that a corner quad may have to be split into two. My intuition is that we will put constraints on the gradient in order to keep the code simple and efficient (this is why I am hinting at "vertical gradient" in the first place, which is a pretty strong constraint).

Let's ignore this a bit :) Doesn't seem too hard though.

franciscod commented 6 years ago

the drawing of custom shapes are getting in the way of being able to draw clipped contents

Care to explain this a little more? I don't quite see the relation here.

2018-07-23-131551_456x98_scrot

oooohhh....

ocornut commented 6 years ago

Easy as in "add a 50% alpha right next to the circle" or "there's a nice way to do this"? Any pointers? Never did raster/software AA myself and might save some time :) I'd love if it looked nice on every backend (i.e. does the AA change between OpenGL/DX/Vulkan?) I've thought of adding some kind of "render to texture" support but thinking on all the backends was a no-no. Also ImGui should stay backend agnostic.

Yes, this should be done in imgui and be completely backend agnostic. I don't know much about rasterizing either. I believe you may look up distance-field related shader functions, and for each pixel simply output the result of a gradient going from opaque/white to transparent/black over the distance radius-0.5 to radius+0.5. That'll probably work well enough (just run that function over every pixel); otherwise you may perform manual sub-sampling. Perhaps there are other/better techniques, I haven't looked. We have the advantage that performance isn't important here, as it is a one-time rasterization over a very small set of pixels.
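The gradient described above can be sketched on the CPU as follows. This is a minimal illustration, not code from the branch: `Saturate`, `CircleCoverage` and `RasterizeQuarterCircle` are hypothetical names, and placing the circle center on the top-left corner of the image is an assumption made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; none of these names exist in Dear ImGui.
static float Saturate(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

// Coverage of a filled circle at distance `dist` from its center:
// 1 up to radius - 0.5, 0 beyond radius + 0.5, linear in between.
static float CircleCoverage(float dist, float radius)
{
    return Saturate(radius + 0.5f - dist);
}

// Rasterize one quarter of a filled circle into a size x size 8-bit alpha image.
// Assumption: the circle center sits on the top-left corner of the image and
// each pixel is sampled once, at its center.
static std::vector<unsigned char> RasterizeQuarterCircle(int size, float radius)
{
    assert(size > 0);
    std::vector<unsigned char> alpha(static_cast<std::size_t>(size) * size);
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
        {
            const float px = x + 0.5f, py = y + 0.5f;
            const float dist = std::sqrt(px * px + py * py);
            alpha[static_cast<std::size_t>(y) * size + x] =
                (unsigned char)(CircleCoverage(dist, radius) * 255.0f + 0.5f);
        }
    return alpha;
}
```

Since this runs once per atlas build over a handful of pixels, a single sample per pixel is likely enough; sub-sampling would only change the body of the inner loop.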

It's possible that our baked rounding AA will not exactly match the one output by the polygon path, but that's not really a problem. During debugging you may use a toggle (I tend to use keyboard modifiers directly such as io.KeyAlt) to compare both methods.

the drawing of custom shapes are getting in the way of being able to draw clipped contents Care to explain this a little more? I don't quite see the relation here. oooohhh....

That picture you posted is unrelated to the issue I'm talking about. It's a different problem, one I am happy to ignore because we are limiting rounding to small values.

What I'm talking about is:

We have various ::RenderXXX helper functions, for example: RenderArrow() or RenderCheckMark(). Because we don't have a generic CPU-side clipper that can run on arbitrary polygons, we can't render a clipped Arrow or Checkmark without pushing a render-side clipping rectangle (which is costly).

That's a constraint in terms of how we use those elements. Sometimes we may want to clip things for widgets to handle narrow space nicely. For example, imagine we want to display an Arrow inside a widget: if the widget is resized too small, we may want the Arrow to clip on a pixel-by-pixel basis. We currently can't do that without paying the cost of an extra draw call. Edit: Essentially they are never used in this manner at the moment because of this constraint, so small design/layout decisions are being made to work around the issue.

We currently have CPU-side clipping for text only (inside ImFont::RenderText()) because it is easy and fast to implement for axis-aligned quads.
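To illustrate why axis-aligned quads are cheap to clip on the CPU, here is a minimal sketch. It is not the code inside ImFont::RenderText(); the `Rect` and `TexturedQuad` types and the `ClipQuad` function are hypothetical names invented for this example.

```cpp
#include <cassert>

// Hypothetical types for illustration; they mirror the idea, not Dear ImGui's API.
struct Rect { float x0, y0, x1, y1; };       // min corner (x0,y0), max corner (x1,y1)
struct TexturedQuad { Rect pos; Rect uv; };  // screen rectangle and matching texture rectangle

// Clip an axis-aligned textured quad against `clip`. Returns false when the
// quad is entirely outside. Otherwise shrinks q.pos and moves q.uv by the same
// proportion, so the texels that remain visible stay where they were.
static bool ClipQuad(TexturedQuad& q, const Rect& clip)
{
    assert(q.pos.x1 > q.pos.x0 && q.pos.y1 > q.pos.y0);
    if (q.pos.x0 >= clip.x1 || q.pos.x1 <= clip.x0 || q.pos.y0 >= clip.y1 || q.pos.y1 <= clip.y0)
        return false;

    // Texture units per screen pixel, computed before any edge is moved.
    const float u_per_px = (q.uv.x1 - q.uv.x0) / (q.pos.x1 - q.pos.x0);
    const float v_per_px = (q.uv.y1 - q.uv.y0) / (q.pos.y1 - q.pos.y0);

    if (q.pos.x0 < clip.x0) { q.uv.x0 += (clip.x0 - q.pos.x0) * u_per_px; q.pos.x0 = clip.x0; }
    if (q.pos.y0 < clip.y0) { q.uv.y0 += (clip.y0 - q.pos.y0) * v_per_px; q.pos.y0 = clip.y0; }
    if (q.pos.x1 > clip.x1) { q.uv.x1 -= (q.pos.x1 - clip.x1) * u_per_px; q.pos.x1 = clip.x1; }
    if (q.pos.y1 > clip.y1) { q.uv.y1 -= (q.pos.y1 - clip.y1) * v_per_px; q.pos.y1 = clip.y1; }
    return true;
}
```

Only a few comparisons and multiplications are needed per quad, with no new vertices generated, which is what makes this practical compared to clipping arbitrary triangles.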

Solution 1 would be to implement a generic CPU-side clipper that can work on any kind of triangle. I don't know how easy/fast that would be, especially considering we use lots of very small/thin polygons, and it may need to be called manually as we probably can't afford to call it on 100% of our vertices.

Solution 2 would be to be able to easily render certain shapes in the font texture and render everything as axis-aligned quads (which would also be faster, similarly to how the rounded shapes we are dealing with in this topic will be faster to render as simple quads).

Those solutions are complementary and we may need a bit of both.

Basically I'm not saying you should solve this problem here, but that moving toward the possibility for us to add custom shapes into the atlas would be useful, and your task here is moving us a second step toward that (the first step was mouse cursors which are hard-coded bitmaps).

meshula commented 6 years ago

I'll just throw this in for consideration.

http://jcgt.org/published/0003/04/01/paper.pdf

The techniques in this paper work great for rounded corners and all kinds of things. I've used it very successfully in many projects to date. The shaders are GLES friendly, and the ones interesting for this thread consume very few cycles.

franciscod commented 6 years ago

Thanks for the pointer @meshula! I'll have a look.

--- offtopic warning --- Abusing the mention and going offtopic a bit: I've been wondering for a while how to measure CPU/GPU and cycles and other kinds of performance metrics. The symptom is "low FPS" and the question would be "what is slow"? Getting/logging the system time at various points on the frame might be misleading since some GLES calls might be batched, right? Any pointers to general techniques or even tools for profiling combined CPU/GPU performance?

franciscod commented 6 years ago

@ocornut I think I got the clipping issue now. So having a way to rasterize arbitrary polygons allows you, for example, to render a truncated arrow somewhat easily. Is that the idea?

ocornut commented 6 years ago

@ocornut I think I got the clipping issue now. So having a way to rasterize arbitrary polygons allows you, for example, to render a truncated arrow somewhat easily. Is that the idea?

Yes, they could be baked into the font, which also makes it easier to output them (this is a common technique some users already take advantage of, by loading icon fonts and merging them with their main font; see misc/fonts/README for details). Another approach, down the line (and out of scope here), is that maybe we could generate those custom shapes as TTF-like data to take advantage of the existing font rasterizers we use. Anyway, we strayed too far from the initial target there!

Performance Metrics

It's a wide topic with many possible tooling solutions. For this specific situation, my suggestion would be to measure the CPU-side cost between NewFrame and EndFrame by creating a setup that draws an arbitrarily large number of shapes (e.g. 100000 rounded rectangles and/or circles, laid out in a deterministic manner, both with and without a bunch of other imgui calls interleaved in between in order to affect the cache).

You may also make a coarse measurement of GPU cost by disabling vsync to run at maximum framerate, then measuring the difference in whole frame time between the two versions (old and new algorithm) as well as the difference in time spent between NewFrame and EndFrame (which would be the CPU-side cost). The GPU-side difference would then be roughly (new_dt - new_cpu_cost) - (old_dt - old_cpu_cost).

Under Windows you may use QueryPerformanceCounter(). C++11 or more modern C++ probably has something in chrono:: to take measurements.
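As a rough sketch of the measurement described above, the following uses std::chrono::steady_clock (C++11, so it belongs in test harness code rather than in imgui itself). `FrameSample`, `EstimateGpuDeltaMs` and `TimeMs` are hypothetical names for this example, and the numbers in the usage note are made up.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical measurement record; the field names are assumptions for this sketch.
struct FrameSample
{
    double frame_ms; // whole frame time, measured with vsync disabled
    double cpu_ms;   // time spent between NewFrame and EndFrame
};

// Rough GPU-side difference between two versions:
// (new_dt - new_cpu_cost) - (old_dt - old_cpu_cost).
static double EstimateGpuDeltaMs(const FrameSample& old_s, const FrameSample& new_s)
{
    // The CPU portion is part of the whole frame, so it cannot exceed it.
    assert(old_s.frame_ms >= old_s.cpu_ms && new_s.frame_ms >= new_s.cpu_ms);
    return (new_s.frame_ms - new_s.cpu_ms) - (old_s.frame_ms - old_s.cpu_ms);
}

// Time an arbitrary callable in milliseconds using a monotonic clock.
template <typename F>
static double TimeMs(F&& fn)
{
    const std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    fn();
    const std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For instance, with an old sample of 16 ms per frame including 4 ms of CPU work and a new sample of 10 ms including 1 ms of CPU work, the estimate is (10 - 1) - (16 - 4) = -3 ms, i.e. the new version would be about 3 ms cheaper on the GPU side. Averaging over many frames is advisable since a single frame is noisy.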

franciscod commented 6 years ago

it's on the atlas!

2018-07-23-175223_305x137_scrot (just to the right of the mouse cursors)

https://github.com/franciscod/imgui/tree/round_quad

franciscod commented 6 years ago

woot

Code still at this dirty dev branch: https://github.com/franciscod/imgui/tree/round_quad

(I've modified only the SDL2/OpenGL3 example)

franciscod commented 6 years ago

I've made AddRectFilled use the new quads. Looks really good on higher rounding values. The vertex count went a bit up though :)

Still no AA and no stroked corners (only solid fill).

ocornut commented 6 years ago

Nice!

The vertex count went a bit up though :)

Not sure I understand. The point of this task is to reduce the vertex count, why would it increase it?

A few remarks here, I realize it's not all pertinent to the core of the task, but:

franciscod commented 6 years ago

Not sure I understand. The point of this task to reduce the vertex count, why would it increase it?

I'm a bit puzzled too. Probably the funny text change is adding noise here. I spent some time drawing/triangulating polygons by hand and got this:

2018-07-25-212200_763x356_scrot

Currently, rounded rectangles use PathArcToFast which approximates roundness with 3 segments. A quad with 4 of these "fast arc" borders uses 14 triangles. The way I'm making rounded squares with quad textures also uses 14 triangles, so no triangle optimization here, just nicer edges to the eye.

On the other hand, circles with 4 quads should be 9 vertices / 8 triangles. This is way cheaper than using a reasonable num_segments for nice roundness.
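The counts above can be checked with a little arithmetic. This sketch assumes that each corner arc of k segments contributes k + 1 outline points and that a non-anti-aliased convex fill is triangulated as a fan (n points give n - 2 triangles); the function and constant names are hypothetical.

```cpp
#include <cassert>

// Triangles produced by filling a convex polygon of `points` vertices as a fan
// (assumption: no anti-aliasing fringe).
static int ConvexFillTriangles(int points)
{
    assert(points >= 3);
    return points - 2;
}

// Outline points of a rectangle whose four corners are each approximated by an
// arc of `segments_per_corner` segments (each arc has segments + 1 points).
static int RoundedRectOutlinePoints(int segments_per_corner)
{
    assert(segments_per_corner >= 1);
    return 4 * (segments_per_corner + 1);
}

// A circle built from a 2x2 arrangement of corner quads sharing their vertices.
static const int kCircleQuads = 4;
static const int kCircleVertices = 3 * 3;             // 3x3 grid of shared vertices
static const int kCircleTriangles = kCircleQuads * 2; // two triangles per quad
static const int kCircleIndices = kCircleTriangles * 3;
```

With 3 segments per corner this gives 16 outline points and 14 triangles for the rounded rectangle, and 9 vertices, 8 triangles and 24 indices for the quad-based circle.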


Thanks for your early code feedback! Some comments regarding that:

Prefer to immediately work with the imgui formatting style (variable names, spacing, braces format).

Will do. Do you have any presets for clang-format or similar tools?

Prefer to avoid extraneous modifications in the branch (your editor trimmed space at the end of lines, it makes the branch difference very noisy)

You can add -w to git diff and ?w=1 to github diffs to make them ignore whitespace changes. https://github.com/ocornut/imgui/compare/master...franciscod:round_quad?w=1

I actually prefer the cleaner whitespace. But on the PR creation I'll rebase and remove the extraneous whitespace cleanups if you prefer :)

Make sure you don't rely on C++11 features. Your compiler may have a flag to enforce C++03?

This is news to me! Probably it's stated somewhere that I skipped reading :) I'll make sure of sticking to C++03.

In AddRoundCornerQuad() you are already checking that the texture atlas is bound as the current texture, so you could call PrimQuadUV directly in each of the switch cases. Avoid calling PrimReserve() up to three times for an AddRectFilled() call when it can be done once.

Yep yep yep, this was me in "just make it work" mode. The code is far from polished.

franciscod commented 6 years ago

Just pushed some optimizations regarding your last comments.

It might be useful to reuse the overlapping vertices on AddRectFilled and AddCircleFilled, either by inlining AddRoundCornerQuad or making it "buffer-aware" (harder to reuse it, maybe provide a frontend?)

franciscod commented 6 years ago

I was trying to make circles with 9 vertices (4 quads, 24 indices), reusing the adjacent quad vertices, and noticed VtxWritePtr has pos and uv (and col) coupled. So I can't reuse a vertex with different UVs...

Is the lower bound for a circle (4 adjacent quads in 2x2) 16 vertices?

(ignoring the fact that we could always raster a full circle, that's 1 quad)

ocornut commented 6 years ago

I'm a bit puzzled too. Probably the funny text change is adding noise here. I spent some time drawing/triangulating polygons by hand and got this:

You are looking at non-anti-aliased shapes. Ones with anti-aliasing, which is the default, would consume many more vertices (+ CPU overhead) whereas your version won't.

Will do. Do you have any presets for clang-format or similar tools?

I don't at the moment, I'll look into it at some point. But those tools don't affect symbol naming afaik, yours don't follow the current coding convention.

I actually prefer the cleaner whitespace.

I sometimes commit with them trimmed out but over time some edits bring some back, Visual Studio doesn't trim them off by default.

It might be useful to reuse the overlapping vertices on AddRectFilled and AddCircleFilled, either by inlining AddRoundCornerQuad or making it "buffer-aware" (harder to reuse it, maybe provide a frontend?)

Basic shapes probably need the inlining and avoiding extra indirections, yes.

I think you could perhaps even share vertices for the rounded rectangle for the upper and lower horizontal shapes. May need some fiddling to get the UV right, but basically instead of using the regular white pixel (TexUvWhitePixel) to fill those rectangles they could fetch the pixels at the edge of the quarter-circle. Maybe this requires both flat edges of the quarter-circle to be extended with an extra row/line.

I was trying to make circles with 9 vertices (4 quads, 24 indices), reusing the adjacent quad vertices, and noticed VtxWritePtr has pos and uv (and col) coupled. So I can't reuse a vertex with different UVs...

I'm not sure I understand. Yes, the reused vertices would have same UV and Position. It should be possible to get a circle with 9 vertices. Aren't those overlapping vertices reusing the same UV and Position in the first place?

However that would create symmetrical circles which perhaps is not visually ideal for odd-sized circles.
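A minimal sketch of the 9-vertex layout discussed above, to show why shared vertices can keep a single UV. It assumes the quarter-circle texture has the circle center at uv (0,0) and its outer corner at uv (1,1) (real atlas coordinates would be a sub-rectangle), and `Vtx`, `CircleMesh` and `BuildCircleMesh` are hypothetical names, not the branch's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical vertex type for illustration (position + texture coordinate).
struct Vtx { float x, y, u, v; };

struct CircleMesh
{
    std::vector<Vtx> vtx;            // 9 vertices on a 3x3 grid
    std::vector<unsigned short> idx; // 4 quads * 2 triangles * 3 indices = 24
};

// Build a circle from four corner quads that share vertices. u and v depend
// only on whether a vertex lies on the center column/row, so the four mirrored
// quads agree on the UV of every vertex they share.
static CircleMesh BuildCircleMesh(float cx, float cy, float r)
{
    assert(r > 0.0f);
    CircleMesh m;
    for (int gy = 0; gy < 3; gy++)
        for (int gx = 0; gx < 3; gx++)
        {
            Vtx v;
            v.x = cx + (gx - 1) * r;
            v.y = cy + (gy - 1) * r;
            v.u = (gx == 1) ? 0.0f : 1.0f; // 0 on the center column, 1 on the outer columns
            v.v = (gy == 1) ? 0.0f : 1.0f; // 0 on the center row, 1 on the outer rows
            m.vtx.push_back(v);
        }
    for (int qy = 0; qy < 2; qy++)
        for (int qx = 0; qx < 2; qx++)
        {
            const int a = qy * 3 + qx; // top-left vertex of this quad in the grid
            const int quad[6] = { a, a + 1, a + 4, a, a + 4, a + 3 };
            for (int i = 0; i < 6; i++)
                m.idx.push_back((unsigned short)quad[i]);
        }
    return m;
}
```

Because every quadrant samples the same texels mirrored around the center, the result is symmetrical by construction, which is the property flagged above as possibly not ideal for odd-sized circles.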

franciscod commented 6 years ago

Basic shapes probably need the inlining and avoiding extra indirections, yes.

I just finished juggling UVs and managed the 9 vtx / 24 idx circle.

https://github.com/franciscod/imgui/commit/9a41070827dc97b3dca79feea5d3afeb169cd7b2

The rounded rectangle is next :)

to fill those rectangles they could fetch the pixels at the edge of the quarter-circle.

Yesss just had that epiphany!

those tools don't affect symbol naming afaik, yours don't follow the current coding convention.

I'll be super careful to fix this before making the PR, I promise :)

franciscod commented 6 years ago

Another epiphany: we just need the "arc triangle" of the quarter-circle texture. So we could even pack the stroke/filled or AA-nonAA on the same texture quad! And saves 1 vtx / 6 idx on each circle :) https://github.com/franciscod/imgui/commit/555db262e3876c5015351897111d815bf89d9c4f

ocornut commented 6 years ago

Makes sense, nice!

But for strokes perhaps the triangle would cut off the edges of the stroke at the end of each quarter? May be solvable by extending the edges of the triangle further, requiring a little more empty space in the texture.

For what you are doing you probably also need to test with nearest point sampling vs bilinear sampling, to make sure there are no undesirable artifacts due to the more funky polygons.

This will be a good CPU-side optimization! Mostly the cost of AA shapes was due to the loop generating normals etc. If we can replace most of those shapes with a lower-CPU-cost path, this is a great gain.

franciscod commented 6 years ago

Yep, needs more testing but I'm happy with the results so far.

I'm worried about not covering every sampling configuration on my tests, do you have any advice on this?

franciscod commented 6 years ago

(woops, hit close and comment on an unfinished message)

Tried my best to fix the code style, please let me know if something stands out and I'll fix it

franciscod commented 6 years ago

Filled rounded rectangles without wasting vtx/idx done. I'm wondering if the vertex/index savings is worth the code complexity here: https://github.com/franciscod/imgui/blob/bad1b54c89d966aa6063dd30f5ddb3e7eb186eb6/imgui_draw.cpp#L1083

franciscod commented 6 years ago

Antialias is here! You could say it saves a few vertices and indices :) (not pictured: CPU savings from the AA code)

antialias

Credits to "2.2 Filled Shape" on http://jcgt.org/published/0003/04/01/paper.pdf (thanks again @meshula!). Should we add a comment pointing to the source of the AA function before merging this?

franciscod commented 6 years ago

Stroked shapes!

hacky-stroke

But for strokes perhaps the triangle would cut off the edges of the stroke at the end of each quarter? May be solvable by extending the edges of the triangle further, requiring a little more empty space in the texture.

As you predicted there are some unconnected vertices but it's almost there :)

ocornut commented 6 years ago

If the filled circle used an additional 2 triangles to be filled exactly the same way as the rectangle, then you could store only the half quarter triangular shape in texture, and then pack both filled+stroke textures for size N into a (N+2,N+2) square, which would halve the texture requirement.

However probably best to first focus on the hole and exact radius, and circle of different sizes (maybe display several sizes in the test, including odd/even sizes).

Once we have this in place, for larger rounded rectangles we could switch to using more polygons than the 3 steps thing we use currently, so there’s no visual degradation for very large shapes.

ocornut commented 6 years ago

I added some comments to some of your commits.

Overall this is going to be super useful, big thanks for working on that!

Maybe the next step is to make a perf test (render 10k/100k shapes and measure/compare perfs) so at least you can measure the speed change vs the AA and non-AA paths, and then measure further optimizations if any. It would be nice if the new code is at least not slower than the non-AA path, and it definitely should be faster than the AA path.

Should we add a comment pointing to the source of the AA function before merging this?

It doesn't hurt to add references, but this looks like rather standard code that you would commonly find in e.g. 2D distance-function based shaders. Also, the shapes currently look a little bigger and different from the polygon-based technique, so we ought to narrow that gap, and that may mean tweaking the code further.

See also: https://www.shadertoy.com/view/4dfXDn

float circleDist(vec2 p, float radius)
{
    return length(p) - radius;
}

float fillMask(float dist)
{
    return clamp(-dist, 0.0, 1.0);
}

float innerBorderMask(float dist, float width)
{
    //dist += 1.0;
    float alpha1 = clamp(dist + width, 0.0, 1.0);
    float alpha2 = clamp(dist, 0.0, 1.0);
    return alpha1 - alpha2;
}

franciscod commented 6 years ago

Thanks for the guidance @ocornut.

I agree that "looks good" wins over "HAZ SICK TRIANGLE COUNT!!!" every time, and weird code is hard to read/maintain too.

I'll probably switch to the simpler antialias / border calculation (the other one looks fancier but has some magic I don't understand fully).
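For reference, the simpler mask functions quoted earlier in the thread translate to the CPU almost line for line, which is what a one-time texture bake would need. This is a hedged port for illustration, not the code that ended up in the branch; `Clamp01` and the capitalized function names are choices made for this sketch.

```cpp
#include <cassert>
#include <cmath>

static float Clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

// Signed distance to a circle centered on the origin:
// negative inside, zero on the edge, positive outside.
static float CircleDist(float px, float py, float radius)
{
    assert(radius >= 0.0f);
    return std::sqrt(px * px + py * py) - radius;
}

// Filled shape: fully opaque one pixel or more inside the edge,
// fading linearly to transparent exactly at the edge.
static float FillMask(float dist)
{
    return Clamp01(-dist);
}

// Stroke drawn inside the edge. For width >= 1 it is opaque for dist in
// [1 - width, 0] and ramps down to 0 at dist = -width and at dist = 1.
static float InnerBorderMask(float dist, float width)
{
    const float alpha1 = Clamp01(dist + width);
    const float alpha2 = Clamp01(dist);
    return alpha1 - alpha2;
}
```

Note that FillMask places its ramp entirely inside the edge, whereas a gradient over radius-0.5 to radius+0.5 is centered on it; a half-pixel offset of that kind is one plausible source of small size differences against the polygon path.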

ocornut commented 6 years ago

I agree that "looks good" wins over "HAZ SICK TRIANGLE COUNT!!!" every time, and weird code is hard to read/maintain too.

I didn't actually say that :) Those functions are critical to imgui performance, so if we can make them more optimal at the cost of using more tricky code it is often worth it.

The large advantage of moving to texture-based rounding is that we (should) largely reduce the CPU-side cost of generating vertices. The number of vertices doesn't matter so much (they won't affect the GPU workload very much), it is more the reduction of CPU overhead that we are aiming at.

In the case of my suggestion for adding inside-filling triangles for the circle, the idea was more to save on overall texture size (we can halve the texture cost), and it won't affect CPU cost much. It's not super important but it also means the central area of the circle can fetch the same few pixels, which is a little more GPU friendly.

franciscod commented 6 years ago

new-aa

New AA, triangle textures and yet another polygon layout (with no gaps!). Corners use 2 triangles (reflecting the texture layout). The circle is done (the center vertex was needed) but I'm having trouble with the rounded rect, particularly the stroked one. Surely after enough refactors I'll be happy with it :)

tricky

ocornut commented 6 years ago

Some feedback, as I gave a quick go to this today.

(0) Attached is a commit to match the coding style with imgui's. patch.zip

Also tweaked the test code and moved it into its own function so it's easier to move into different main.cpp files. I copied the test code below.

I added a checkbox to display 5000 of each shape, so 20000 shapes in total. (This needs #define ImDrawIdx unsigned int to be set in imconfig.h to work).

I did some super basic measurements by modifying the FramerateSecPerFrame[120] buffer to FramerateSecPerFrame[30] for a faster converging average over only 30 slow frames and noting frame times down.

To differentiate CPU from GPU cost I quickly hacked the ImDrawList rendering code to optionally skip draw commands with more than 10000 vertices, so I could see the rest of the UI, but the call wouldn't be submitted to the GPU (nb: this ignores the extra cost inherent to uploading the buffer to GPU memory).


(1) CPU wise the new code is much faster (I disabled rendering to compare CPU cost only), probably 5+ times faster which is already very welcome. (Also haven't investigated very much, but I suspect CPU-wise the second call to PrimReserve() should probably be removed). Yeah!

(2) With 20000 shapes (5000 filled circles, 5000 stroked circles, 5000 rounded rect, 5000 stroked rect). Before: ~1.2 million triangles / After: ~0.25 million triangles. Yeah!

(3) GPU wise, things are more tricky for Intel HD graphics...

Edit: The frame times below are total app frame times including CPU cost, GPU buffer upload etc., but I've determined above that the non-GPU costs are decently small.

Frame time on Intel HD Graphics 530, Windows 10

| Type | Before | After |
| --- | --- | --- |
| UI + All shapes | ~55 ms | ~62.0 ms |
| UI + Circle Filled (Rad 30) | ~12 ms | ~4.2 ms |
| UI + Circle Filled (Rad 60) | ~16 ms | ~14 ms |
| UI + Circle Stroke (Rad 30) | ~13.8 ms | ~4.2 ms |
| UI + Circle Stroke (Rad 60) | ~13.8 ms | ~14.7 ms |
| UI + Rect Filled (Dim 200) | ~40.8 ms | ~38 ms |
| UI + Rect Stroke (Dim 200) | ~7 ms | ~22 ms |

Frame time on Nvidia 1080 gtx, Windows 10

| Type | Before | After |
| --- | --- | --- |
| UI + All shapes | ~33.7 ms | ~8.274 ms |
| UI + Circle Filled (Rad 30) | ~12 ms | ~1.9 ms |
| UI + Circle Filled (Rad 60) | ~12 ms | ~2.2 ms |
| UI + Circle Stroke (Rad 30) | ~13.5 ms | ~1.9 ms |
| UI + Circle Stroke (Rad 60) | ~13.5 ms | ~2.1 ms |
| UI + Rect Filled (Dim 200) | ~6 ms | ~4.9 ms |
| UI + Rect Stroke (Dim 200) | ~6.6 ms | ~3.2 ms |

So, big win on Nvidia, occasional loss on Intel HD. Note the measurements are quite imprecise, and somehow the totals don't add up linearly on the Intel side.

Basically here's what I think you could try:

A) Most importantly: For stroked shapes, reduce the amount of invisible pixel surface covered, this should be the big win.

B) For the "middle" section of filled shapes, we could optionally try to interpolate between the same texel to create the filling, so the GPU doesn't have to fetch more than 1 texel. Perhaps this will help the GPU a little and perhaps (just pulling a dumb guess here!) some texture filtering units would have a fast path when multiple ends are the same? (doubtful as I suspect it's not a common thing in real games). Just putting this idea out there, but A) will be much more valuable and probably cover the regression case on Intel HD.

If you can improve (A) then the new version will be amazingly better on every front :)

Thanks a lot for this!


Test code

static void GetVtxIdxDelta(ImDrawList* draw_list, int* vtx, int *idx)
{
    static int vtx_n, idx_n;
    static int vtx_o, idx_o;
    vtx_n = draw_list->VtxBuffer.Size;
    idx_n = draw_list->IdxBuffer.Size;

    *vtx = vtx_n - vtx_o;
    *idx = idx_n - idx_o;

    vtx_o = vtx_n;
    idx_o = idx_n;
}

static void TestRoundShapes()
{
    ImGuiIO& io = ImGui::GetIO();

    if (!ImGui::Begin("Round Shapes"))
    {
        ImGui::End();
        return;
    }

    ImGui::TextUnformatted("Press Shift to toggle quads (hold to see them).");
    ImGui::TextUnformatted(io.KeyShift? "SHIFT ON  -- Rasterized quad circle! w00t! OPTIMIZATION!"
        : "SHIFT OFF -- Regular, boring circle with PathArcToFast.");

    static bool stress_test[4] = { false, false, false, false };
    if (sizeof(ImDrawIdx) > 2)
    {
        if (ImGui::Checkbox("Stress Test All", &stress_test[0])) { stress_test[1] = stress_test[2] = stress_test[3] = stress_test[0]; }
        ImGui::Checkbox("Stress Test Circle Filled", &stress_test[0]);
        ImGui::Checkbox("Stress Test Circle Stroke", &stress_test[1]);
        ImGui::Checkbox("Stress Test Rect Filled",   &stress_test[2]);
        ImGui::Checkbox("Stress Test Rect Stroked",  &stress_test[3]);
    }
    const int render_count[4] = { stress_test[0] ? 5000 : 1, stress_test[1] ? 5000 : 1, stress_test[2] ? 5000 : 1, stress_test[3] ? 5000 : 1 };

    static float r = io.Fonts->RoundCornersMaxSize * 0.5f;
    ImGui::SliderFloat("radius", &r, 0, (float)io.Fonts->RoundCornersMaxSize, "%.0f");

    ImGui::BeginGroup();

    static int segments = 20;
    ImGui::PushItemWidth(120);
    ImGui::SliderInt("segments", &segments, 3, 64);
    ImGui::PopItemWidth();

    int vtx = 0, idx = 0;
    ImDrawList* draw_list = ImGui::GetWindowDrawList();

    {
        ImGui::Button("", ImVec2(200, 200));
        GetVtxIdxDelta(draw_list, &vtx, &idx);
        ImVec2 min = ImGui::GetItemRectMin();
        ImVec2 size = ImGui::GetItemRectSize();
        for (int n = 0; n < render_count[0]; n++)
            draw_list->AddCircleFilled(ImVec2(min.x + size.x / 2.0f, min.y + size.y / 2.0f), r, IM_COL32(255,255,0,255), segments);
        GetVtxIdxDelta(draw_list, &vtx, &idx);
        ImGui::Text("AddCircleFilled\n %d vtx, %d idx", vtx, idx);
    }
    {
        ImGui::Button("", ImVec2(200, 200));
        GetVtxIdxDelta(draw_list, &vtx, &idx);
        ImVec2 min = ImGui::GetItemRectMin();
        ImVec2 size = ImGui::GetItemRectSize();
        for (int n = 0; n < render_count[1]; n++)
            draw_list->AddCircle(ImVec2(min.x + size.x / 2.0f, min.y + size.y / 2.0f), r, IM_COL32(255,255,0,255), segments);
        GetVtxIdxDelta(draw_list, &vtx, &idx);
        ImGui::Text("AddCircle\n %d vtx, %d idx", vtx, idx);
    }
    ImGui::EndGroup();

    ImGui::SameLine();

    static bool tl = true, tr = true, bl = true, br = true;
    ImGui::BeginGroup();
    ImGui::Checkbox("TL", &tl);
    ImGui::SameLine(0, 12);
    ImGui::Checkbox("TR", &tr);
    ImGui::SameLine(0, 12);
    ImGui::Checkbox("BL", &bl);
    ImGui::SameLine(0, 12);
    ImGui::Checkbox("BR", &br);

    ImDrawCornerFlags flags = 0;
    flags |= tl ? ImDrawCornerFlags_TopLeft : 0;
    flags |= tr ? ImDrawCornerFlags_TopRight : 0;
    flags |= bl ? ImDrawCornerFlags_BotLeft : 0;
    flags |= br ? ImDrawCornerFlags_BotRight : 0;

    {
        ImGui::Button("", ImVec2(200, 200));
        GetVtxIdxDelta(draw_list, &vtx, &idx);

        ImVec2 r_min = ImGui::GetItemRectMin();
        ImVec2 r_max = ImGui::GetItemRectMax();
        for (int n = 0; n < render_count[2]; n++)
            draw_list->AddRectFilled(r_min, r_max, IM_COL32(255,255,0,255), r, flags);
        GetVtxIdxDelta(draw_list, &vtx, &idx);
        ImGui::Text("AddRectFilled\n %d vtx, %d idx", vtx, idx);
    }
    {
        ImGui::Button("", ImVec2(200, 200));
        GetVtxIdxDelta(draw_list, &vtx, &idx);

        ImVec2 r_min = ImGui::GetItemRectMin();
        ImVec2 r_max = ImGui::GetItemRectMax();
        for (int n = 0; n < render_count[3]; n++)
            draw_list->AddRect(r_min, r_max, IM_COL32(255,255,0,255), r, flags);
        GetVtxIdxDelta(draw_list, &vtx, &idx);
        ImGui::Text("AddRect\n %d vtx, %d idx", vtx, idx);
    }
    ImGui::EndGroup();

    ImGui::Separator();

    ImFontAtlas* atlas = ImGui::GetIO().Fonts;
    ImGui::Image(atlas->TexID, ImVec2((float)atlas->TexWidth, (float)atlas->TexHeight), ImVec2(0, 0), ImVec2(1, 1), ImColor(255,255,255,255), ImColor(255,255,255,128));

    ImGui::End();
}

franciscod commented 6 years ago

Attached is a commit to match the coding style with imgui's.

Just pushed it. In the future you can create a PR on franciscod/imgui targeting round_quad if that's easier for you :)

CPU wise the new code is much faster

:dancer:

Before: ~1.2 million triangles / After: ~0.25 million triangles.

:dancer: :dancer:

GPU wise, things are more tricky for Intel HD graphics...

Is this with the DirectX11 renderer? Do you think testing this in many OS/renderers is worth the time? We could even generalize some of the (hypothetic) benchmarking code, and make a step towards #435.

(while working on this I've only used OpenGL on Linux but I have access to Windows/macOS/iOS/Android machines too)

A) Most importantly: For stroked shapes, reduce the amount of invisible pixel surface covered, this should be the big win.

This will be the next thing I'll try as soon as I have some time.

Thanks again for your feedback Omar! I'm glad this helps!

ocornut commented 6 years ago

Is this with the DirectX11 renderer? Do you think testing this in many OS/renderers is worth the time?

Sorry, I forgot to answer this: yes, it is with the DirectX11 renderer. I'll run other tests but I suspect the results will be similar. The results seem to indicate that on those low-end cards we are not bottlenecked by e.g. draw call overhead but rather by fillrate and memory bandwidth, which A) will improve across the board.

We could even generalize some of the (hypothetic) benchmarking code, and make a step towards #435.

Totally. One of my tasks at the moment is to make an early prototype of an automation API. Even though it is a long way out, it will be easy to leverage it to create consistent, repeatable test cases for performance measurements.

I have marked this as a potential 1.63 feature, I think it'd be a great inclusion (with the above changes + making the radius match and other tweaks).

franciscod commented 6 years ago

I haven't had the time for working some more on this. Seems like 1.63 (and 1.64!) went ahead so this will be on 1.65+ I guess :)

ocornut commented 5 years ago

@franciscod I have updated and rebased your branch over latest (spent some time on it, dozens of conflicts over the 50 commits) into a branch called features/tex_round_corners in case you decide to resume this work and for myself as a reference.

Among things that are unsupported: thick strokes. May want to cache a finite number of thicknesses, probably 1.0 and 2.0.

Also, this is a minor thing to consider for later: the ability to draw arbitrary n-gons (e.g. a hexagon) may need to be allowed via a different call than AddCircle.

franciscod commented 5 years ago

Thanks for the rebase + roundup! I'll bite: what's on your mind with n-gons?

ocornut commented 5 years ago

what's on your mind with n-gons?

The user should be able to render n-gons, they don't have to be texture-optimized. Previously it was possible using AddCircle() with a low polygon count. We'll probably need to add an extra ImDrawList function to do it now.

Note that this is super minor... there is much more important stuff to fix and finish with this feature, like what we discussed above.
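To illustrate what such a call would compute, here is a minimal self-contained sketch that generates the vertices of a regular n-gon, which is what a circle drawn with a low segment count amounts to. `Vec2` and `NgonPoints` are hypothetical names chosen for this example; this is not the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for ImVec2, so the sketch compiles on its own.
struct Vec2 { float x, y; };

// Returns the num_segments vertices of a regular polygon inscribed in a
// circle of the given radius, starting at angle 0 and stepping by equal angles.
std::vector<Vec2> NgonPoints(Vec2 center, float radius, int num_segments)
{
    assert(num_segments >= 3);
    const float kTwoPi = 6.28318530718f;
    std::vector<Vec2> points;
    points.reserve(num_segments);
    for (int i = 0; i < num_segments; i++)
    {
        const float angle = kTwoPi * i / num_segments;
        points.push_back(Vec2{center.x + std::cos(angle) * radius,
                              center.y + std::sin(angle) * radius});
    }
    return points;
}
```

Calling `NgonPoints(Vec2{0.0f, 0.0f}, 10.0f, 6)` yields the six corners of a hexagon. These can be connected with plain line segments, with no texture lookup involved.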

ocornut commented 5 years ago

Rebased on master with some tweaks, and moved test code to imgui_demo so it shows and works with all backends and examples.

thehans commented 4 years ago

Hi, I'm new to imgui, and so far I've just been playing around a bit with the master branch, but these look like some really nice changes.

I hope it's not considered out of scope or off-topic, but I've been looking into making an angled / chamfered looking theme.

(Screenshot from 2019-10-22 15-58-29)

I did this with a quick hack in the loop of PathArcToFast, changing the increment a++ to a+=3.

I noticed your screenshots have a "segments" adjustment, so would setting this to 4 create the same effect?

One other small thing I haven't figured out is if it's possible to make the frames and grabs come to a perfect point on the sides. It seems there's always a bit of a flat spot no matter how large I set the "rounding" (chamfer).
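For readers wondering why that hack produces a chamfer, here is a minimal sketch. It assumes PathArcToFast walks a 12-entry unit circle table between two indices, as the `a++` loop suggests. `Vec2`, `ArcFastSketch` and its `step` parameter are hypothetical stand-ins, not the library code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for ImVec2, so the sketch compiles on its own.
struct Vec2 { float x, y; };

// Emits points on a circle at multiples of 1/12th of a turn, from index
// a_min_of_12 to a_max_of_12 inclusive, advancing by `step` entries each time.
// step == 1 mimics the original `a++`; step == 3 mimics the `a += 3` hack.
std::vector<Vec2> ArcFastSketch(Vec2 center, float radius,
                                int a_min_of_12, int a_max_of_12, int step)
{
    assert(step >= 1 && a_min_of_12 >= 0 && a_min_of_12 <= a_max_of_12);
    const float kTwoPi = 6.28318530718f;
    std::vector<Vec2> path;
    for (int a = a_min_of_12; a <= a_max_of_12; a += step)
    {
        const float angle = kTwoPi * (a % 12) / 12.0f;
        path.push_back(Vec2{center.x + std::cos(angle) * radius,
                            center.y + std::sin(angle) * radius});
    }
    return path;
}
```

A quarter-circle corner spans indices such as 0 to 3, so a step of 1 emits four points while a step of 3 emits only the two endpoints, which are then joined by a single straight edge.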

ocornut commented 4 years ago

Hello @thehans, this is kinda off-topic with this specific thread/PR and should be asked on a separate thread.

Unfortunately the short answer is that I cannot think of a good reason to support this in the master repository. It is likely to cause many subtle complications and defeat the purpose of this and other changes designed to allow widespread texture-based rounding.

If you do want this theme in your app you'll have to maintain a patch on your side.

martind0 commented 4 years ago

Any updates on this?

franciscod commented 4 years ago

Nope! Probably would be a good idea to pick up from Omar's last rebase: https://github.com/ocornut/imgui/commit/f54f78ea262cd383b2fe296cb9b0095cd96011c9

ocornut commented 4 years ago

Lots of work has been done on this, sorry it hasn’t been pushed to the public branch, will do it soon.

ocornut commented 4 years ago

Changes have been pushed to https://github.com/ocornut/imgui/commits/features/tex_round_corners

Most of the remaining work has been done by @ShironekoBen

Compared to original version:

franciscod commented 4 years ago

Maybe this FIXME isn't relevant anymore: https://github.com/ocornut/imgui/compare/features/tex_round_corners#diff-9273117b625021c0e379311d92a3d30aR546

We probably should move the test code + flag checkbox somewhere else other than the top of demo window (https://github.com/ocornut/imgui/compare/features/tex_round_corners#diff-fb4bd618fdb78483fa52e52b2ff5abf4R504)

Is there anything big that should be done before we think about merging all this?

Partial list:

ocornut commented 4 years ago

Maybe this FIXME isn't relevant anymore

Right, will remove.

We probably should move the test code

Yes. None of it makes much sense to keep as-is in the demo; there are already various "custom rendering" demos in there. The debugging test bed will be moved to the imgui_dev/ repo.

Is there anything big that should be done before we think about merging all this?

I'm not sure anymore (apart from the things you mentioned). I'm waking this up and we'll see.

N-gons could be in the atlas too with text)

I don't think it's worth it. Rounded rectangles make up 90%+ of most scenes. Regular squares are already quite optimal with 2 triangles.

(maybe only simple ones like triangles or squares to use inline

That's another thing we should work on later: facilitating upload of custom shapes and mapping them to fonts. But we can leave that to another issue, as it is itself quite a wide topic (if you consider multiple fonts, support for multi-DPI).

stroke_width > 1 needs more work (apparently not using the textures even with the flag on)

Seems to work here, but the last pushed version had a bug where it baked 1, 3 and 4 instead of 1, 2 and 4, so 2 would use the polygon path. Pushing the fix now.
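The behaviour described here can be sketched as a lookup with a fallback. This is only an illustration: it assumes the baked set is exactly the widths 1, 2 and 4 mentioned above, and `kBakedStrokeWidths` and `FindBakedStrokeIndex` are hypothetical names rather than code from the branch.

```cpp
#include <cassert>
#include <cstddef>

// Assumed set of stroke widths baked into the atlas (taken from the comment above).
static const int kBakedStrokeWidths[] = {1, 2, 4};

// Returns the index of the baked texture matching stroke_width exactly,
// or -1 when no texture was baked and the caller should fall back to the
// regular polygon path.
int FindBakedStrokeIndex(float stroke_width)
{
    const std::size_t count = sizeof(kBakedStrokeWidths) / sizeof(kBakedStrokeWidths[0]);
    for (std::size_t i = 0; i < count; i++)
    {
        if (stroke_width == static_cast<float>(kBakedStrokeWidths[i]))
            return static_cast<int>(i);
    }
    return -1;
}
```

With a set baked by mistake as 1, 3 and 4, a lookup for width 2 would return -1 in this sketch, which matches the reported symptom of the texture path not being used even with the flag on.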

resize grips need improvement

Yes, something broke along the way; I notified Ben about it yesterday.

Thanks for your feedback!

franciscod commented 4 years ago

Ok good! I was delighted to see most of it up and running after all this time. Ben sure did great work here! I hope I can help in the final stretch before we merge :)