Closed rtrussell closed 2 years ago
I wonder if the best play is to change line rendering to use primitives (a quad per line via 4 vertices plus an index buffer to turn them into two triangles). It should avoid subtle line rendering differences across APIs/drivers and would give SDL's code more control - it's what LÖVE does and it seems to have worked decently there, at least.
I wonder if the best play is to change line rendering to use primitives
I believe Ryan has changed the line rendering in SDL 2.0.18 in an attempt to make it more consistent across backends, but unfortunately it seems to have had the opposite effect with OpenGL on Windows.
Yeah. The current implementation is still using the graphics API's lines (which are themselves not too consistent) so reimplementing lines using primitives / geometry would avoid that inconsistency at least.
Polylines always seem to draw correctly (probably because they are commonly used to draw outline polygons) so I suggested to Ryan some time ago that a reliable way to draw a line inclusive of both endpoints ought to be as a polyline with two segments (one being just a single pixel). I don't know if he ever tried that.
Honestly, I'm willing to try just about anything that will result in me not being assigned this bug again after every new release. :/
Here's a version to draw lines based on RenderGeometry as suggested @slime73 similar to MiterJoinPolyline, with no SMOOTHING, and line with thickness 1.0f ... https://github.com/love2d/love/blob/main/src/modules/graphics/Polyline.cpp#L173
It adds other differences but it seems to solve some end points (tested software vs OpenGl on linux )...
int
SDL_RenderDrawLinesF(SDL_Renderer * renderer,
const SDL_FPoint * points, int count)
{
int retval;
CHECK_RENDERER_MAGIC(renderer, -1);
if (!points) {
return SDL_SetError("SDL_RenderDrawLinesF(): Passed NULL points");
}
if (count < 2) {
return 0;
}
#if DONT_DRAW_WHILE_HIDDEN
/* Don't draw while we're hidden */
if (renderer->hidden) {
return 0;
}
#endif
if (renderer->scale.x != 1.0f || renderer->scale.y != 1.0f) {
return RenderDrawLinesWithRectsF(renderer, points, count);
}
if (!(renderer->info.flags & SDL_RENDERER_SOFTWARE) && renderer->QueueGeometry) {
SDL_bool isstack1;
SDL_bool isstack2;
float *xy = SDL_small_alloc(float, 2 * 2 * count, &isstack1);
int *indices = SDL_small_alloc(int, 6 * (count - 1), &isstack2);
if (xy && indices) {
int i;
float *ptr_xy = xy;
int *ptr_indices = indices;
const int xy_stride = 2 * sizeof (float);
const int num_vertices = 2 * count;
const int num_indices = 6 * (count - 1);
const int size_indices = 4;
int cur_indice = 0;
const int is_looping = (points[0].x == points[count - 1].x && points[0].y == points[count - 1].y);
SDL_FPoint before, after, p;
if (is_looping) {
before = points[count - 2];
after = points[1];
} else {
/* mirrored of point 1, by point 0 */
before.x = 2.0f * points[0].x - points[1].x;
before.y = 2.0f * points[0].y - points[1].y;
/* mirrored of before last point 1, by last pont */
after.x = 2.0f * points[count - 1].x - points[count - 2].x;
after.y = 2.0f * points[count - 1].y - points[count - 2].y;
}
p = before;
for (i = 0; i < count; ++i) {
SDL_FPoint q, r;
float norm_qp;
float norm_qr;
float dot, coef;
float offset;
const float half_width = 0.5f;
SDL_FPoint u_qp, u_qr;
SDL_FPoint n_qp; /* normal */
q = points[i];
if (i == count - 1) {
r = after;
} else {
r = points[i + 1];
}
norm_qp = SDL_sqrtf((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y));
norm_qr = SDL_sqrtf((p.x - r.x) * (p.x - r.x) + (p.y - r.y) * (p.y - r.y));
u_qp.x = (p.x - q.x) / norm_qp;
u_qp.y = (p.y - q.y) / norm_qp;
u_qr.x = (r.x - q.x) / norm_qr;
u_qr.y = (r.y - q.y) / norm_qr;
n_qp.x = - u_qp.y;
n_qp.y = u_qp.x;
dot = u_qr.x * n_qp.x + u_qr.y * n_qp.y;
/* translate by q + (half_width, half_width) */
offset = half_width;
if (SDL_fabs(dot) < 0.05f) { /* parallel lines */
coef = half_width;
*ptr_xy++ = q.x + offset - coef * n_qp.x;
*ptr_xy++ = q.y + offset - coef * n_qp.y;
*ptr_xy++ = q.x + offset + coef * n_qp.x;
*ptr_xy++ = q.y + offset + coef * n_qp.y;
} else {
coef = half_width / dot;
*ptr_xy++ = q.x + offset - coef * (u_qr.x + u_qp.x);
*ptr_xy++ = q.y + offset - coef * (u_qr.y + u_qp.y);
*ptr_xy++ = q.x + offset + coef * (u_qr.x + u_qp.x);
*ptr_xy++ = q.y + offset + coef * (u_qr.y + u_qp.y);
}
p = q;
}
for (i = 0; i < count - 1; ++i) {
*ptr_indices++ = cur_indice + 0;
*ptr_indices++ = cur_indice + 1;
*ptr_indices++ = cur_indice + 2;
*ptr_indices++ = cur_indice + 3;
*ptr_indices++ = cur_indice + 1;
*ptr_indices++ = cur_indice + 2;
cur_indice += 2;
}
if (0)
SDL_RenderDrawPointsF(renderer, xy, num_vertices);
retval = QueueCmdGeometry(renderer, NULL,
xy, xy_stride, (int *) &renderer->color, 0 /* color_stride */, NULL, 0,
num_vertices, indices, num_indices, size_indices,
1.0f, 1.0f);
SDL_small_free(xy, isstack1);
SDL_small_free(indices, isstack2);
}
} else {
retval = QueueCmdDrawLines(renderer, points, count);
}
return retval < 0 ? retval : FlushRenderCommandsIfNotBatching(renderer);
}
here's a testcase main_diff_drawline.c.txt
open 3 windows. 1/ render a scene with lines, with Software renderer 2/ render a scene with lines, with OpenGL renderer (but you can change the code to use another one) 3/ show the diff 1 vs 2, with colors
https://github.com/love2d/love/blob/main/src/modules/graphics/Polyline.cpp#L141 not much comment in code yet, but like the love polyline + drawing.
There are three points: p q r. q is the current line point. p is the previous point, and r is the point right after (with some virtual point at start/end, if the polyline is closed or not, eg is_looping). We replace the polyline with a triangle strip, where q is replaced by two points that are 'halfwidth' far from q, on (p,q,r) bissector. And we draw triangles based on those points. ( if p,q,r are aligned, this is a special/corner case, but simpler for the computation)
TODO: we're not handling two successive points that would be duplicates ..
Here's a version to draw lines based on RenderGeometry
I wonder if it would benefit from special-casing horizontal and vertical lines, which are likely to be used as building blocks for more complex shapes (filled polygons, circles, ellipses etc. as in SDL2_gfx).
@rtrussell I don't know if we need we special-casing horizontal and vertical lines ... But they can be in polyline, or at end or stop, so this ... and it is already covered. Maybe you can try to recompile SDL with this function to see how it goes for you !
Maybe you can try to recompile SDL
I'm not set up to compile SDL from source on Windows, I don't have Visual Studio or anything similar installed. If you can make me a DLL I can try it.
@rtrussell I cannot build a DLL neither. but here's another option:
Use this function for testing that can be compiled done inside your code. it's similar to SDL_RenderDrawLinesF ( you just need to use the default scale 1.0 1.0 and not the software renderer)
int
SDL_RenderDrawLinesF__test(SDL_Renderer * renderer,
const SDL_FPoint * points, int count)
{
int retval;
float scalex, scaley;
if (!points) {
return SDL_SetError("SDL_RenderDrawLinesF(): Passed NULL points");
}
if (count < 2) {
return 0;
}
SDL_RenderGetScale(renderer, &scalex, &scaley);
if (scalex != 1.0f || scaley != 1.0f) {
// TODO
return -1;
}
if (1) {
float *xy = SDL_malloc(sizeof(float) * 2 * 2 * count);
int *indices = SDL_malloc(sizeof(int) * 6 * (count - 1));
if (xy && indices) {
int i;
float *ptr_xy = xy;
int *ptr_indices = indices;
const int xy_stride = 2 * sizeof (float);
const int num_vertices = 2 * count;
const int num_indices = 6 * (count - 1);
const int size_indices = 4;
int cur_indice = 0;
const int is_looping = (points[0].x == points[count - 1].x && points[0].y == points[count - 1].y);
SDL_FPoint before, after, p;
if (is_looping) {
before = points[count - 2];
after = points[1];
} else {
/* mirrored of point 1, by point 0 */
before.x = 2.0f * points[0].x - points[1].x;
before.y = 2.0f * points[0].y - points[1].y;
/* mirrored of before last point 1, by last pont */
after.x = 2.0f * points[count - 1].x - points[count - 2].x;
after.y = 2.0f * points[count - 1].y - points[count - 2].y;
}
p = before;
for (i = 0; i < count; ++i) {
SDL_FPoint q, r;
float norm_qp;
float norm_qr;
float dot, coef;
float offset;
const float half_width = 0.5f;
SDL_FPoint u_qp, u_qr;
SDL_FPoint n_qp; /* normal */
q = points[i];
if (i == count - 1) {
r = after;
} else {
r = points[i + 1];
}
norm_qp = SDL_sqrtf((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y));
norm_qr = SDL_sqrtf((p.x - r.x) * (p.x - r.x) + (p.y - r.y) * (p.y - r.y));
u_qp.x = (p.x - q.x) / norm_qp;
u_qp.y = (p.y - q.y) / norm_qp;
u_qr.x = (r.x - q.x) / norm_qr;
u_qr.y = (r.y - q.y) / norm_qr;
n_qp.x = - u_qp.y;
n_qp.y = u_qp.x;
dot = u_qr.x * n_qp.x + u_qr.y * n_qp.y;
/* translate by q + (half_width, half_width) */
offset = half_width;
if (SDL_fabs(dot) < 0.05f) { /* parallel lines */
coef = half_width;
*ptr_xy++ = q.x + offset - coef * n_qp.x;
*ptr_xy++ = q.y + offset - coef * n_qp.y;
*ptr_xy++ = q.x + offset + coef * n_qp.x;
*ptr_xy++ = q.y + offset + coef * n_qp.y;
} else {
coef = half_width / dot;
*ptr_xy++ = q.x + offset - coef * (u_qr.x + u_qp.x);
*ptr_xy++ = q.y + offset - coef * (u_qr.y + u_qp.y);
*ptr_xy++ = q.x + offset + coef * (u_qr.x + u_qp.x);
*ptr_xy++ = q.y + offset + coef * (u_qr.y + u_qp.y);
}
p = q;
}
for (i = 0; i < count - 1; ++i) {
*ptr_indices++ = cur_indice + 0;
*ptr_indices++ = cur_indice + 1;
*ptr_indices++ = cur_indice + 2;
*ptr_indices++ = cur_indice + 3;
*ptr_indices++ = cur_indice + 1;
*ptr_indices++ = cur_indice + 2;
cur_indice += 2;
}
if (0)
SDL_RenderDrawPointsF(renderer, xy, num_vertices);
SDL_RenderSetScale(renderer, 1.0, 1.0);
SDL_Color color;
SDL_GetRenderDrawColor(renderer, &color.r, &color.g, &color.b, &color.a);
retval = SDL_RenderGeometryRaw(renderer, NULL,
xy, xy_stride, (int *) &color, 0 /* color_stride */, NULL, 0,
num_vertices, indices, num_indices, size_indices);
SDL_free(xy);
SDL_free(indices);
}
} else {
// retval = QueueCmdDrawLines(renderer, points, count);
}
return retval;
}
Here's a version to draw lines based on RenderGeometry as suggested @slime73 similar to MiterJoinPolyline, with no SMOOTHING, and line with thickness 1.0f ... https://github.com/love2d/love/blob/main/src/modules/graphics/Polyline.cpp#L173
For what it's worth LÖVE's miter join lines have a couple known issues (for example there's no fallback to bevel or other join when the angle is really small, which makes the line geometry extend way past where you'd expect in those situations, and lines with duplicate points sometimes don't get rendered), so copying that implementation wholesale might add issues you don't want.
Use this function for testing that can be compiled done inside your code.
It's principally SDL2_gfx, not my code, that would need to be altered to carry out the test. I suppose I could edit SDL2_gfxPrimitives.c
to call your routine, but it's not a trivial exercise.
@rtrussell maybe you do a search and replace, or if you're lucky a preprocessor define
Same problem on Raspberry Pi with kmsdrm backend and OpenGL ES: Lines are always one pixel too short.
@rtrussell maybe you do a search and replace
It's not that easy because SDL2_gfx needs SDL_RenderDrawLine()
and SDL_RenderDrawLines()
(not the float versions). Also, I get "_undefined reference to 'SDLRenderGeometryRaw'" because my header and library files are not up to date (deliberately, because I'm building for multiple platforms, some of which don't have anything later than SDL 2.0.10).
To make it easy to test your code I would need somebody to build me a DLL.
@slime73 I didn't copy paste LOVE code, just use the same method. it has very likely the corner with duplicate points. indeed, maybe a fallback to bevel for small angle seems a good idea !
@Kartoffelbauer maybe you can recompile SDL2 from source and use the previous function for testing
@rtrussell let's create an issue so that someone document the DLL creating on window !
Here's a version to draw lines based on RenderGeometry as suggested @slime73
I have a major worry that your approach based on RenderGeometry will completely break my app, because it is fundamentally reliant on the line being drawn responding to glLogicOp(). I'm pretty sure that glLogicOp() works only with drawing primitives, not triangles. If that is the case, switching to your approach would be a disaster for me.
If you're reliant on a library's internal implementation-specific state for your own code, maybe it'll be better in the long run for you to move to something you have more control over? Such as direct OpenGL without SDL_Render, or an OpenGL wrapper library that you can build and modify in your own source code.
In the case of glLogicOp it's been outdated for so long (almost two decades) that I don't know how it works, but triangles are primitives as far as OpenGL is concerned so you probably don't need to worry about that specific function.
If you're reliant on a library's internal implementation-specific state for your own code, maybe it'll be better in the long run for you to move to something you have more control over?
Mixing SDL2 functions and direct calls to OpenGL is supported - it's the sole reason for the SDL_RenderFlush()
function existing - so although there is no explicit list of which OpenGL functions may be safely called and which may not, as far as I am concerned it is entirely unreasonable to change SDL2 in such a way that what worked before no longer works.
In any case there is no acceptable workaround. Support for 'logical' plotting (AND, OR, XOR) is essential for my app because it's a programming language (which has been around for more than 40 years) which provides that functionality to the programmer.
In the case of glLogicOp it's been outdated for so long
I'm not sure what you mean by "outdated". It's rarely used in modern applications, admittedly, but It's documented and fully supported in OpenGL and in OpenGLES v1.
Ultimately the decision is down to Ryan and Sam, but changing SDL_RenderDrawLine()
so fundamentally that glLogicOp()
no longer affects it, when it has done from SDL 2.0.0 to 2.0.18, would break my app in a catastrophic way and I might well have no alternative but to abandon it.
We need to fix line drawing as it currently seems impossible to get it pixel perfect on every graphics implementation. However we should leave the original code in there enabled via a hint, so projects like @rtrussell's will continue working.
Mixing SDL2 functions and direct calls to OpenGL is supported - it's the sole reason for the SDL_RenderFlush() function existing
I'm trying to help you by suggesting long term solutions that avoid needing to worry about any SDL change. SDL's render internals have changed significantly over the years, will change in the future, and different platforms exist and will grow in popularity and use which don't use a specific backend you're using.
In any case there is no acceptable workaround.
but changing SDL_RenderDrawLine() so fundamentally that glLogicOp() no longer affects it
There is nothing I'm aware of to suggest that glLogicOp does not work with triangle primitives. So far as I can tell, triangle-based lines will continue to work with glLogicOp (when legacy OpenGL is used on a platform). Does the GL spec language say otherwise? I'd be very surprised. Logic operations happen after primitives are rasterized so they shouldn't care about what the primitives are.
we should leave the original code in there enabled via a hint,
I don't believe this is needed for this specific case. I'd definitely test any new implementation in a variety of situations before declaring it ready to replace the old stuff, though.
I'm not sure what you mean by "outdated". It's rarely used in modern applications, admittedly, but It's documented and fully supported in OpenGL and in OpenGLES v1.
It doesn't really exist on many modern GPUs as far as I know, and therefore doesn't exist in some newer graphics APIs. Including OpenGL ES post-1.0. It's rarely used these days because shaders have supplanted most use cases, I think.
Here are precompiled DLLs with this change for testing: http://www.libsdl.org/tmp/x86/SDL2.dll http://www.libsdl.org/tmp/x64/SDL2.dll
Here are precompiled DLLs with this change for testing:
As far as I can see it's made it worse. I've confirmed that the DLL is reporting 2.0.19, but if I run the testcase that I listed at the start of the thread I'm still seeing the black line, and I'm also seeing it with the OpenGL hint commented out, which I wasn't before. If I've not done something silly, it appears to have made it consistent, but wrong!
It doesn't really exist on many modern GPUs as far as I know
I don't know how it's working under the hood, but since it's still documented in the OpenGL 4 docs, and there's no mention of it being deprecated, I can't see that there is a problem with it being used.
I do note that it says "specify a logical pixel operation for rendering" (not just drawing) so it may be that my concern about it not affecting triangles was misplaced.
Here are precompiled DLLs with this change for testing:
As far as I can see it's made it worse. I've confirmed that the DLL is reporting 2.0.19, but if I run the testcase that I listed at the start of the thread I'm still seeing the black line, and I'm also seeing it with the OpenGL hint commented out, which I wasn't before. If I've not done something silly, it appears to have made it consistent, but wrong!
I can reproduce this as well.
With the LOVE method I get lots of artifacts. I believe it's ok with thick line, but with a 1 pixel width, it's lot of diff. So I try another thing:
It was quite fast to implement and it gives much better result for me
It would be nice to test:
int
SDL_RenderDrawLinesF(SDL_Renderer * renderer,
const SDL_FPoint * points, int count)
{
int retval;
CHECK_RENDERER_MAGIC(renderer, -1);
if (!points) {
return SDL_SetError("SDL_RenderDrawLinesF(): Passed NULL points");
}
if (count < 2) {
return 0;
}
#if DONT_DRAW_WHILE_HIDDEN
/* Don't draw while we're hidden */
if (renderer->hidden) {
return 0;
}
#endif
if (renderer->scale.x != 1.0f || renderer->scale.y != 1.0f) {
return RenderDrawLinesWithRectsF(renderer, points, count);
}
if (!(renderer->info.flags & SDL_RENDERER_SOFTWARE) && renderer->QueueGeometry) {
SDL_bool isstack1;
SDL_bool isstack2;
/* +1 worst case for bevel */
float *xy = SDL_small_alloc(float, 4 * 2 * count, &isstack1);
int *indices = SDL_small_alloc(int,
(4) * 3 * (count - 1)
+ (2) * 3 * (count)
, &isstack2);
if (xy && indices) {
int i;
float *ptr_xy = xy;
int *ptr_indices = indices;
const int xy_stride = 2 * sizeof (float);
int num_vertices = 0;
int num_indices = 0;
const int size_indices = 4;
int cur_indice = -4;
const int is_looping = (points[0].x == points[count - 1].x && points[0].y == points[count - 1].y);
SDL_FPoint p;
for (i = 0; i < count; ++i) {
SDL_FPoint q;
q = points[i];
q.x += 0.5f;
q.y += 0.5f;
*ptr_xy++ = SDL_floorf(q.x);
*ptr_xy++ = SDL_floorf(q.y);
*ptr_xy++ = SDL_floorf(q.x) + 1.0f;
*ptr_xy++ = SDL_floorf(q.y);
*ptr_xy++ = SDL_floorf(q.x) + 1.0f;
*ptr_xy++ = SDL_floorf(q.y) + 1.0f;
*ptr_xy++ = SDL_floorf(q.x);
*ptr_xy++ = SDL_floorf(q.y) + 1.0f;
num_vertices += 4;
#define ADD_TRIANGLE(i1, i2, i3) \
*ptr_indices++ = cur_indice + i1; \
*ptr_indices++ = cur_indice + i2; \
*ptr_indices++ = cur_indice + i3; \
num_indices += 3; \
/* Draw the first, only if not a closed polyline */
if (i || is_looping == 0) {
ADD_TRIANGLE(4, 5, 6)
ADD_TRIANGLE(4, 6, 7)
}
/* first point */
if (i == 0) {
p = q;
cur_indice += 4;
continue;
}
/* draw segments */
if (p.y == q.y) {
if (p.x < q.x) {
ADD_TRIANGLE(1, 4, 7)
ADD_TRIANGLE(1, 7, 2)
} else {
ADD_TRIANGLE(5, 0, 3)
ADD_TRIANGLE(5, 3, 6)
}
} else if (p.x == q.x) {
if (p.y < q.y) {
ADD_TRIANGLE(2, 5, 4)
ADD_TRIANGLE(2, 4, 3)
} else {
ADD_TRIANGLE(6, 1, 0)
ADD_TRIANGLE(6, 0, 7)
}
} else {
if (p.y < q.y) {
if (p.x < q.x) {
ADD_TRIANGLE(1, 5, 4)
ADD_TRIANGLE(1, 4, 2)
ADD_TRIANGLE(2, 4, 7)
ADD_TRIANGLE(2, 7, 3)
} else {
ADD_TRIANGLE(4, 0, 5)
ADD_TRIANGLE(5, 0, 3)
ADD_TRIANGLE(5, 3, 6)
ADD_TRIANGLE(6, 3, 2)
}
} else {
if (p.x < q.x) {
ADD_TRIANGLE(0, 4, 7)
ADD_TRIANGLE(0, 7, 1)
ADD_TRIANGLE(1, 7, 6)
ADD_TRIANGLE(1, 6, 2)
} else {
ADD_TRIANGLE(6, 5, 1)
ADD_TRIANGLE(6, 1, 0)
ADD_TRIANGLE(7, 6, 0)
ADD_TRIANGLE(7, 0, 3)
}
}
}
p = q;
cur_indice += 4;
}
retval = QueueCmdGeometry(renderer, NULL,
xy, xy_stride, (int *) &renderer->color, 0 /* color_stride */, NULL, 0,
num_vertices, indices, num_indices, size_indices,
1.0f, 1.0f);
SDL_small_free(xy, isstack1);
SDL_small_free(indices, isstack2);
}
} else {
retval = QueueCmdDrawLines(renderer, points, count);
}
return retval < 0 ? retval : FlushRenderCommandsIfNotBatching(renderer);
}
It looks like lots of overdraw there. Does it work correctly with blend modes?
@slouken no! there are no overdraws, only non intersecting triangles, (unless the polyline cross itself). the only potential overdraws was the first point when the polyline is a loop, which I tested and fixed
actually, it the line does a very small angle this can count as an overdraw indeed? , not sure is this is correct or not
@rtrussell, I updated the pre-built DLLs, can you see if this works for your project?
actually, it the line does a very small angle this can count as an overdraw indeed? , not sure is this is correct or not
It's easier to test with SDL_BLENDMODE_ADD and look at the output pixels in an image editing program.
@rtrussell, I updated the pre-built DLLs, can you see if this works for your project?
On a quick check it's looking good. The testcase passes, and it's correctly responding to glLogicOp()
. I'll run it through some more tests and report back.
Everything I've thrown at it has worked. I've not noticed any overdraws, even when using exclusive-or plotting which is a tough test. Impressive.
The bad news is that there seems to be a massive performance hit. Using SDL_RenderDrawLines()
to draw a 500-pixel-diameter circle, consisting of 360 segments, seems to be taking something like 20 times longer with SDL 2.0.19 than 2.0.16.
If it can't be made faster, that suggests to me that the old method should continue to be used on platforms which have always worked reliably using it (e.g. Emscripten).
I don't really get the issue with glLogicOp()
, but also why is would be different between the two functions of this issue, since they are both based on the same triangle rendering.
Of course, it takes more time:
before: data used was: count * 2 * float
now: 4 * count * 2 * float
as vertices
indices, up to (2 + 4 + 2) *3 * (count - 1)
testdraw2, with manually edition, to only call DrawLine() : testdraw2 1000. before: 1000 fps, now: 700 fps testdraw2 10000. before: 356 fps, now: 74 fps testdraw2 30000. before: 140 fps, now: 25 fps not sure if this is CPU or GPU time though ...
Maybe I'll give a try later to draw less triangles and to rely on the top-left rule
I don't know whether I've chosen a particularly unfortunate benchmark, but running this program here (Windows 10, OpenGL backend) gives an approximately 20:1 slowdown in SDL 2.0.19 compared with SDL 2.0.16:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <math.h>
#include "SDL2/SDL.h"
#define SCREEN_WIDTH 640
#define SCREEN_HEIGHT 512
int main(int argc, char* argv[])
{
int i, running = 1;
int start, finish;
SDL_Event ev;
SDL_Window *window;
SDL_Renderer *renderer;
SDL_Point pt[361];
if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER | SDL_INIT_EVENTS) != 0)
{
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR,
"Testcase", SDL_GetError(), NULL);
return 1;
}
SDL_SetHint(SDL_HINT_RENDER_DRIVER, "opengl");
window = SDL_CreateWindow("Testcase", SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
SCREEN_WIDTH, SCREEN_HEIGHT, SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE);
if (window == NULL)
{
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR,
"Testcase", SDL_GetError(), NULL);
SDL_Quit();
return 5;
}
renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED |SDL_RENDERER_PRESENTVSYNC);
if (renderer == NULL)
{
SDL_DestroyWindow(window);
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR,
"Testcase", SDL_GetError(), NULL);
SDL_Quit();
return 6;
}
for (i = 0; i <= 361; i++)
{
pt[i].x = 320 + 250.0 * cos(0.0174532 * i);
pt[i].y = 256 + 259.0 * sin(0.0174532 * i);
}
while (running)
{
while (SDL_PollEvent(&ev))
switch (ev.type)
{
case SDL_QUIT:
running = 0;
}
// Clear the window to white:
SDL_SetRenderDrawColor(renderer, 255, 255, 255, 255);
SDL_RenderClear(renderer);
// Draw 1000 green circles, diameter 500 pixels:
SDL_SetRenderDrawColor(renderer, 0, 255, 0, 255);
start = SDL_GetTicks();
for (i = 0; i < 1000; i++)
SDL_RenderDrawLines(renderer, pt, 361);
finish = SDL_GetTicks();
SDL_Log("Elapsed time %i ms\n", finish-start);
SDL_RenderPresent(renderer);
}
SDL_DestroyRenderer(renderer);
SDL_DestroyWindow(window);
SDL_Quit();
exit(0);
}
How does it perform if an explicit renderer backend isn't chosen? I think automatic batching is turned off by default when the SDL_HINT_RENDER_DRIVER
hint is set, right?
(I'm not suggesting a 20x slowdown in this situation isn't a problem, I'm just wondering if the performance difference changes depending on batching settings. It might also be interesting to test different backends in general. I kind of suspect the old OpenGL lines code was hitting a driver path that was effectively automatically batched inside the driver code on certain drivers, which might not happen when an array of triangles is rendered.)
I think automatic batching is turned off by default when the
SDL_HINT_RENDER_DRIVER
hint is set, right?
Correct. If I add:
SDL_SetHint("SDL_RENDER_BATCHING", "1");
and a SDL_RenderFlush()
before reading the finish time it improves performance in both cases, but the ratio is still about 20:1. This is a far greater difference than I would expect; can somebody else check in case my measurements are way off for some reason?
Edit: And if I remove the hints altogether (so the Direct3D backend is used) the ratio is similar.
Indeed, I also get 10ms before, and 200 ms after, with OpenGL if I switch to gles2, it's 70 ms
If I comment out what is in SDL_RENDERCMD_GEOMETRY for openGL: it is still 90ms (time in SDL_render.c, in QueueGeometry + set some state) for gles2: it is still 60s (time in SDL_render.c, in QueueGeometry + set some state)
so there are some time preparing the triangles data, which is bigger than just sending the points to the back-end. then in opengl, it requires many call:
{
1290 size_t j;
1291 float currentColor[4];
1292 data->glGetFloatv(GL_CURRENT_COLOR, currentColor);
1293 data->glBegin(GL_TRIANGLES);
1294 for (j = 0; j < count; ++j)
1295 {
1296 const GLfloat x = *(verts++);
1297 const GLfloat y = *(verts++);
1298
1299 const GLfloat r = *(verts++);
1300 const GLfloat g = *(verts++);
1301 const GLfloat b = *(verts++);
1302 const GLfloat a = *(verts++);
1303
1304 data->glColor4f(r, g, b, a);
1305
1306 if (texture) {
1307 GLfloat u = *(verts++);
1308 GLfloat v = *(verts++);
1309 data->glTexCoord2f(u,v);
1310 }
1311 data->glVertex2f(x, y);
1312 }
1313 data->glEnd();
1314 data->glColor4f(currentColor[0], currentColor[1], currentColor[2], currentColor[3]);
1315 }
compared to opengl drawline:
1261 data->glBegin(GL_LINE_STRIP);
1262 for (i = 0; i < count; ++i, verts += 2) {
1263 data->glVertex2f(verts[0], verts[1]);
1264 }
1265 data->glEnd();
Oh I (incorrectly) assumed the GL triangle rendering code was using glDrawArrays
. I wonder if that would help at all. And maybe finding a way to remove the need for that glGetFloatv
but I suspect that's not part of the bottleneck.
If you have access to a CPU profiler of any kind they can be really helpful for figuring out the best places to change stuff, too.
Yes gles2 is indeed using glDrawArrays
, not opengl, maybe that would improve performance
I give a try and did a lot of copy paste from the GLES2 backend to add the VBO to opengl (see #5089) So now the lines with triangles draws at 75 ms (but the old GL_LINE version draws slower now also because we throw more code inside ...)
some double-check:
"but the old GL_LINE version draws slower now also because we throw more code inside ..."
Actually It became slower in 2.0.18 (eg 35 ms), because we call SDL_cosf / sinf / atanf, per point in GL_QueueDrawLines to fix this issue. So now, we are more: 35 ms with draw Line with GL_LINES, and 75 ms (with the VBO PR #5083), if we use GL_TRIANGLES
I think OpenGL drivers tend to perform better with client-side vertex arrays (glVertexAttribPointer(..., directpointer)
/glColorPointer
/etc plus glDrawArrays
/glDrawElements
, with no vertex buffer object) than with VBOs+glBufferData/glBufferSubData, because the efficient ways to update VBOs are only part of OpenGL 4.4+.
We'll probably get better RenderGeometry performance overall by passing the indices data through to the GPU via glDrawElements
instead of manually expanding the triangle list on the CPU and using glDrawArrays
, too, especially if vertex data copies can be eliminated.
Something I'm struggling to understand is: What are the actual issues that OpenGL faces when drawing lines, compared to the other renderers?
If I remove the trig stuff added to the OpenGL renderer, it seems like the only issue I face (at least on my regular NVIDIA gpu) is with lines being one pixel too short, which is the same issue the other backends face (but all handle slightly differently for whatever reason): https://github.com/libsdl-org/SDL/blob/b7885abc449250fc733a756239d670394d01fe62/src/render/direct3d/SDL_render_d3d.c#L1215 https://github.com/libsdl-org/SDL/blob/b7885abc449250fc733a756239d670394d01fe62/src/render/direct3d11/SDL_render_d3d11.c#L2133 https://github.com/libsdl-org/SDL/blob/b7885abc449250fc733a756239d670394d01fe62/src/render/metal/SDL_render_metal.m#L1124 https://github.com/libsdl-org/SDL/blob/b7885abc449250fc733a756239d670394d01fe62/src/render/software/SDL_drawline.c#L201 It wouldn't surprise me if the other renderers (PSP, VitaGxm) also have the same behaviour, but just don't correct it.
This behaviour also corresponds to what the OpenGL spec expects, which states: "When pa and pb lie on fragment centers, this characterization of fragments reduces to Bresenham’s algorithm with one modification: lines produced in this description are “half-open,” meaning that the final fragment (corresponding to pb) is not drawn.".
The two main solutions to counteract half-open lines seems to be either to extend the line (Metal), or draw an extra point at the end (D3D, Software). If all renderers follow the same diamond-exit rule behaviour (which I suspect they do, apart from maybe the Software renderer), would it make more sense to just apply the half-pixel offset + line extension before calling QueueDrawLines?
See: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf#subsection.14.5.1 https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.4.3%20Aliased%20Line%20Rasterization%20Rules
It appears that when running on Windows with the OpenGL backend (not Direct3D)
SDL_RenderDrawLine()
is not plotting the final endpoint. So for example:SDL_RenderDrawLine(renderer, 200, 300, 300, 300);
is not plotting the pixel at 300,300 at all. This worked correctly in SDL 2.0.16.
The program below will demonstrate this; it should draw a green square without any black line, but with SDL 2.0.18 and Windows 10 it plots this: Commenting out the OpenGL hint results in correct behavior.