Closed: allsey87 closed this issue 6 years ago
This is a known issue. Currently I have no idea why it happens, and why only on this specific graphics card. I suspect that this is due to a funny interaction in systems that have dual cards, and that disabling one in the BIOS might help, but I haven't had the time to verify my suspicion.
Did this issue rear its head prior to switching to Qt5? I have no such problem on an older version (forked end of 2013) of ARGoS 3 with Qt4.
I've always had this issue, and I started using ARGoS in 2015. I only have an i915 (no dual card).
Also, as far as I'm aware, I don't have a secondary graphics card. If you suggest a couple of commits of interest where this issue may have come from, I will build and test them to confirm whether or not the issue is present.
The switch to Qt5 did not change the ARGoS code in any particular way. Most of that was just refactoring to adapt to the new OpenGL class. If your system does not have a dual card, then I don't really know where to look. I agree that it must be something with Qt5 though... The problem is that I don't have a card I can test on - I'll ask a student if I can borrow his laptop for a few hours. For the commit, run git log on the folder of the OpenGL visualization and you'll see the entire history.
I was thinking along the lines that perhaps this has to do with a new version of QOpenGLWidget...
@beltrame so you have had this issue before @ilpincy moved to Qt5 at the beginning of 2017?
Is it possible to install a debug version of the driver, so we see where it explodes? Maybe that would shed some light.
Never tried anything like that before, and since I don't have a backup computer at the moment I'm not particularly keen. I just hacked qtopengl_box.cpp by adding c_visualization.DrawBoundingBox(c_entity.GetEmbodiedEntity()); after c_visualization.DrawEntity(c_entity.GetEmbodiedEntity()); to draw the bounding box without selecting it... works fine? Although, as soon as I select, out comes the smoke...
As a further test, I completely commented out this block of code from qtopengl_widget.cpp:
if(m_sSelectionInfo.IsSelected) {
   glPushMatrix();
   CallEntityOperation<CQTOpenGLOperationDrawSelected, CQTOpenGLWidget, void>(*this, *vecEntities[m_sSelectionInfo.Index]);
   glPopMatrix();
}
Exact same behavior, draws the bounding box fine, but as soon as I select, out comes the smoke...
It might be my way of managing selection - maybe getting in and out of selection mode I confuse the driver. However, the code is pretty straightforward and I followed an existing example almost verbatim...
Can you try playing with the code here: https://github.com/ilpincy/argos3/blob/master/src/plugins/simulator/visualizations/qt-opengl/qtopengl_widget.cpp#L364 ?
I'll have a look, just rebuilding with debugging symbols on... What is really interesting is that I completely disabled both calls to the CQTOpenGLOperationDrawX entity operations. Same issue when I select, which suggests that the segfault occurring at that point was a coincidence and the fault actually came from a different thread? (I'm a bit (read: completely) inexperienced with debugging multi-threaded programs in gdb.)
Funny you should point out that code, that is exactly what I had my eye set on to play with next.
If you look at the backtrace, the segfault happens when ARGoS draws a robot with the normal model while in the method SelectInScene(). So the issue is really rendering in SELECT mode... is it possible that the driver does not support drawing something in that mode?
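For readers unfamiliar with SELECT mode, here is a minimal sketch of how the hit records written into the buffer passed to glSelectBuffer are usually decoded once glRenderMode(GL_RENDER) returns the hit count. Each record is laid out as: number of names, minimum depth, maximum depth, then the names themselves. The Hit struct and the ParseHits and ClosestHit helpers are hypothetical names made up for this illustration; this is not the ARGoS code, and no OpenGL calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type: one decoded entry of a GL_SELECT hit buffer.
struct Hit {
   std::uint32_t MinDepth;
   std::vector<std::uint32_t> Names;
};

// Decodes up to n_hits records laid out as [count, min depth, max depth, names...].
// Stops early if the buffer is too short for the next record.
std::vector<Hit> ParseHits(const std::vector<std::uint32_t>& vec_buffer, int n_hits) {
   std::vector<Hit> vecHits;
   std::size_t unPos = 0;
   for(int i = 0; i < n_hits; ++i) {
      if(unPos + 3 > vec_buffer.size()) break;
      std::size_t unNames = vec_buffer[unPos];
      if(unPos + 3 + unNames > vec_buffer.size()) break;
      Hit sHit;
      sHit.MinDepth = vec_buffer[unPos + 1];
      auto itFirst = vec_buffer.begin() + static_cast<std::ptrdiff_t>(unPos + 3);
      sHit.Names.assign(itFirst, itFirst + static_cast<std::ptrdiff_t>(unNames));
      vecHits.push_back(sHit);
      unPos += 3 + unNames;
   }
   return vecHits;
}

// Returns the position of the hit nearest to the viewer, or -1 if there are none.
int ClosestHit(const std::vector<Hit>& vec_hits) {
   int nBest = -1;
   for(std::size_t i = 0; i < vec_hits.size(); ++i) {
      if(nBest < 0 || vec_hits[i].MinDepth < vec_hits[static_cast<std::size_t>(nBest)].MinDepth) {
         nBest = static_cast<int>(i);
      }
   }
   return nBest;
}
```

In a picking routine, the name attached to the closest hit would typically be the index of the entity under the cursor.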
@allsey87 Yes, I had the problem before Qt5, and on two different computers/distributions (Linux Mint and OpenSuSE, two flavours of Intel Graphics).
Ok, so by removing the calls to makeCurrent and doneCurrent in the SelectInScene method I am able to select and move around objects without segfaults. However, it is quite difficult to select things. It seems as if either (i) only some faces of the box in my example are selectable or (ii) there is a disagreement between what is drawn in the select buffer and what is drawn on screen.
changes.txt shows a diff from the latest master, as you can see I haven't changed much.
For the sake of keeping a record of the testing: as @ilpincy suggested, this segfault occurs while drawing in select mode with the calls to makeCurrent and doneCurrent enabled. However, it isn't tied to the drawing of a specific primitive; both GL_QUADS and GL_POINTS will fail at some point towards the end of a drawing function. The exact point at which the segfault occurs is difficult to locate, and it seems to change depending on what is in the drawing function.
Great! The problem is that removing makeCurrent() and doneCurrent() does not work on Mac (the window becomes corrupted) nor in my Ubuntu 16.04 VirtualBox VM (same issue, but maybe it's because I run on Mac). I can solve it with conditional compilation, but I'd like to understand the issue better before proceeding.
Testing this on a new laptop with both NVIDIA and Intel graphics. It seems that both graphics drivers are loaded (?), however, OpenGL is using the Intel graphics driver. This conclusion is based on the output of glxinfo | grep -i vendor which returns:
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
Vendor: Intel Open Source Technology Center (0x8086)
OpenGL vendor string: Intel Open Source Technology Center
As such, this laptop is also using the i915 driver and segfaults upon attempting to select an object. Removing the makeCurrent() and doneCurrent() function calls stops the segfaults; however, the selection mechanism still feels quite broken. Selecting an object is difficult and can only be done from certain perspectives / by clicking on a subset of pixels that represent only part of the object. I observed similar behavior on my other laptop.
@ilpincy I can also now install a debug driver on one of my laptops (the one that only has Intel graphics) and get more information. I have found that the following package is available for my system, however, I can't find much in the way of documentation regarding how to load and use it.
X.Org X server -- Intel i8xx, i9xx display driver (debug symbols)
This driver provides support for the Intel i8xx and i9xx family of chipsets, including i810, i815, i830, i845, i855, i865, i915, and i945 series chips.
This package provides debugging symbols for this Xorg X driver.
Thoughts? Perhaps I need to create the file /usr/share/X11/xorg.conf.d/20-intel.conf as described here and set the driver field to something like intel-dbg. In the example, the string intel is used although the driver name as reported by lshw and lsmod is i915.
@allsey87 Thanks for offering your help with this issue. I think that adding debugging symbols would help shed some light. We could send a nice bug report to the driver developers once we understand what goes wrong.
I tried removing makeCurrent() ... doneCurrent() from the code, but it corrupts the graphics window on every computer I tried.
Out of interest, what do you observe when the graphics window is corrupted?
Half of the screen is black or covered by a random pattern.
@allsey87 I have this issue on my laptop as well, but when I select the robots from the Buzz Debugger it does not crash.
I have a test repository that you can use to replicate the process yourself.
Crash has the relevant tests. It causes a buzz runtime error so that the robots are selectable from the buzz debugger.
@ilpincy like this?
This is with the makeCurrent() and doneCurrent() commented out. Somehow I only saw this for the first time today. The only major difference I can think of was that I was using the foot-bot and the dynamics2d engine for a simple demo...
I will make a note here that this black pattern only appears when I go to select something. Furthermore, I can select objects in the simulation, however, everything appears to be offset by a constant value. That is, I can find X,Y coordinates on the screen that correspond to where the object is drawn with respect to the selection buffer. It seems that the selection buffer and drawing buffer are just misaligned...
I'm going to allocate some time to solving this bug once and for all. I will document the steps that I have taken and would really appreciate any feedback or comments.
I am starting with a clean / up to date version of ARGoS without any of my extensions. My first step was to get messages out of OpenGL to learn more about the nature of the segfault. My approach was to create and connect an instance of QOpenGLDebugLogger to the ARGoS Log. I did this by placing the following code at the bottom of void CQTOpenGLWidget::initializeGL():
if(m_pcOpenGLLogger == nullptr) {
   m_pcOpenGLLogger = new QOpenGLDebugLogger(this);
   if(!m_pcOpenGLLogger->initialize()) {
      LOGERR << "Could not initialize QOpenGLDebugLogger" << std::endl;
      delete m_pcOpenGLLogger;
      m_pcOpenGLLogger = nullptr;
   }
   else {
      LOGERR << "Initialized QOpenGLDebugLogger" << std::endl;
      connect(m_pcOpenGLLogger,
              &QOpenGLDebugLogger::messageLogged,
              [=](const QOpenGLDebugMessage& c_message) {
                 if(c_message.severity() == QOpenGLDebugMessage::HighSeverity) {
                    LOGERR << "[WARNING] " + c_message.message().toStdString() << std::endl;
                 }
                 else {
                    LOG << "[INFO] " + c_message.message().toStdString() << std::endl;
                 }
              }
      );
      m_pcOpenGLLogger->startLogging(QOpenGLDebugLogger::SynchronousLogging);
   }
}
This initialized correctly and output messages to the ARGoS log. Note that I disabled redirecting the logs to the GUI by commenting out the lines m_pcLogStream = new CQTOpenGLLogStream(LOG.GetStream(), m_pcDockLogBuffer); and m_pcLogErrStream = new CQTOpenGLLogStream(LOGERR.GetStream(), m_pcDockLogErrBuffer); in qtopengl_main_window.cpp. Unfortunately, the output from this is quite boring and doesn't seem to show anything of interest. When I select an entity, there is no output prior to the segfault, as shown in QOpenGLDebugLogger.txt.
My second attempt was using a debugging tool called RenderDoc, this tool seems very powerful and straightforward to use. I was able to start ARGoS, however, no valid API was detected. This was because RenderDoc only supports OpenGL 3.2+ core profile and it seems that although I have OpenGL 4.5 available on my machine, the current configuration of OpenGL, Qt, and ARGoS uses 3.0, or at least that is what is reported by the following code:
GLint major; GLint minor;
glGetIntegerv(GL_MAJOR_VERSION, &major);
glGetIntegerv(GL_MINOR_VERSION, &minor);
LOG << "OpenGL version " << major << "." << minor << std::endl;
Note that the output of glxinfo | grep OpenGL on my machine is as follows:
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Mobile
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.0.7
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.0.7
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.0.7
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
This aside, I was able to force OpenGL, Qt, and ARGoS to use version 4.5 (as reported by the same code that originally reported 3.0) by adding the following lines of code before the line m_pcMainWindow = new CQTOpenGLMainWindow(m_tConfTree); in qtopengl_render.cpp:
QSurfaceFormat format;
format.setDepthBufferSize(24);
format.setStencilBufferSize(8);
format.setVersion(3, 2);
format.setProfile(QSurfaceFormat::CoreProfile);
QSurfaceFormat::setDefaultFormat(format);
This recompiles fine and now connects properly to RenderDoc; however, the QtOpenGLWidget just displays the background color and nothing else. So at this point, I have a couple of questions:
Any thoughts @ilpincy?
Before giving up on the QOpenGLDebugLogger, I would try to use printf() rather than LOG. LOG is buffered per thread and ARGoS explicitly has to call LOG.Flush() to print anything, so it is normal that you get no output prior to the crash. If instead you use printf() and make sure to add a newline after each message, you'll see the messages for sure.
As for the second question, I don't know exactly what to say, since it's Qt code.
Thanks @ilpincy, I will have another go using fprintf(stderr,...) to output the errors since I think printf also has buffering.
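To illustrate why nothing shows up before the crash, here is a minimal sketch of a log that holds messages until it is flushed explicitly. BufferedLog and VisibleAfterWrite are hypothetical names for this example only; this is not the actual ARGoS LOG implementation, just the general buffering behaviour described above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a log that buffers until Flush() is called.
class BufferedLog {
public:
   explicit BufferedLog(std::ostream& c_sink) : m_cSink(c_sink) {}
   // Appends to the internal buffer only; nothing reaches the sink yet.
   BufferedLog& operator<<(const std::string& str_message) {
      m_cBuffer << str_message;
      return *this;
   }
   // Pushes everything buffered so far to the sink and empties the buffer.
   void Flush() {
      m_cSink << m_cBuffer.str();
      m_cSink.flush();
      m_cBuffer.str("");
      m_cBuffer.clear();
   }
private:
   std::ostream& m_cSink;
   std::ostringstream m_cBuffer;
};

// Returns what the sink would contain if the process died right after the write.
std::string VisibleAfterWrite(bool b_flush) {
   std::ostringstream cSink;
   BufferedLog cLog(cSink);
   cLog << "GL debug message\n";
   if(b_flush) cLog.Flush();
   return cSink.str();
}
```

With b_flush set to false the sink stays empty, which mirrors a message being lost when the segfault happens before the flush. Writing to stderr sidesteps this because stderr is not fully buffered by default.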
Since RenderDoc wasn't working with the older version of OpenGL, and since the OpenGL widget didn't render when I requested Qt to use a later version, I attempted to use another OpenGL debugging tool called BuGLe. This tool hasn't been developed for a while, but I was able to get it working with a few tweaks to the source code.
There were two filter sets that I tried. The first created a log that traced all the OpenGL calls. I started ARGoS, attempted to select an e-puck, got a segfault, and a 172MB log file was produced! I had a look at this file using combinations of grep -C 10 -ni to search for strings such as err or warn and found nothing of interest. Tail also didn't show anything of interest at the end of the log file. If anyone wants this file, let me know and I will put it up on Dropbox. The file is basically just a trace of all the OpenGL functions called, e.g.
940-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 227) = "GL_SGIS_texture_border_clamp"
941-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 228) = "GL_SGIS_texture_edge_clamp"
942-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 229) = "GL_SGIS_texture_lod"
943-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 230) = "GL_SUN_multi_draw_arrays"
944-[INFO] trace.call: glGetBooleanv(GL_FRAMEBUFFER_SRGB_CAPABLE_EXT, 0x7ffc07ff4ca0 -> GL_FALSE)
945:[INFO] trace.call: glGetError() = GL_NO_ERROR
946-[INFO] trace.call: glXGetProcAddressARB("glGetStringi") = 0x7fd2e37e1bc0
947-[INFO] trace.call: glGetIntegerv(GL_NUM_EXTENSIONS, 0x7ffc07ff4b48 -> 231)
948-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 0) = "GL_3DFX_texture_compression_FXT1"
949-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 1) = "GL_AMD_conservative_depth"
950-[INFO] trace.call: glGetStringi(GL_EXTENSIONS, 2) = "GL_AMD_draw_buffers_blend"
The other method involves using BuGLe with gdb. The idea is to recover backtraces from segmentation faults inside the driver, even if the driver is compiled without symbols. I ran this a couple of times and got the following backtraces: 1, 2, 3, 4. Again, nothing too interesting here I think, other than that the crash seems to point to either glCallList() from argos::CQTOpenGLEPuck::Draw(argos::CEPuckEntity&) or glPointSize from argos::CQTOpenGLWidget::DrawRays(argos::CControllableEntity&). On a side note, it seems that drawing the rays from the controllable entities in the selection buffer may be a minor bug.
This is the updated code for using QOpenGLDebugLogger. I added this code directly beneath the call to initializeOpenGLFunctions(); in void argos::CQTOpenGLWidget::initializeGL(); putting this code any earlier causes a Qt assertion to fail.
GLint major; GLint minor;
glGetIntegerv(GL_MAJOR_VERSION, &major);
glGetIntegerv(GL_MINOR_VERSION, &minor);
::fprintf(stderr,"OpenGL version %d.%d\n", major, minor);
if(m_pcOpenGLLogger == nullptr) {
   m_pcOpenGLLogger = new QOpenGLDebugLogger(this);
   if(!m_pcOpenGLLogger->initialize()) {
      ::fprintf(stderr,"Could not initialize QOpenGLDebugLogger\n");
      delete m_pcOpenGLLogger;
      m_pcOpenGLLogger = nullptr;
   }
   else {
      ::fprintf(stderr,"Initialized QOpenGLDebugLogger\n");
      connect(m_pcOpenGLLogger,
              &QOpenGLDebugLogger::messageLogged,
              [=](const QOpenGLDebugMessage& c_message) {
                 ::fprintf(stderr,"%s\n",c_message.message().toStdString().c_str());
              }
      );
      m_pcOpenGLLogger->startLogging(QOpenGLDebugLogger::SynchronousLogging);
   }
}
I then proceeded to recompile, run ARGoS and select an epuck resulting in the segfault. A couple of additional lines appeared after I attempted to select and are shown in output.txt. The four messages and the segfault message after the blank line occurred following the segfault.
Moving forwards, I think the next step is to install the debugging symbols for the graphics driver so that I can get a closer look at the source of the segfault.
Thread 1 "argos3" received signal SIGSEGV, Segmentation fault.
0x00007fffc6886c07 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
allsey87@ThinkPad-T540p:~/Workspace/argos4$ ll /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
-rw-r--r-- 5 root root 7405120 Jun 8 09:54 /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
dpkg shows the following:
allsey87@ThinkPad-T540p:~/Workspace/argos4$ dpkg -S /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
libgl1-mesa-dri:amd64: /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
However, I am unable to continue at this point as I am not sure how to find the corresponding debug symbols package for libgl1-mesa-dri:amd64. Any suggestions @ilpincy?
Thanks for all the work! You basically got to the same point I got stuck, too. :(
I think Ubuntu should have a debug version of the mesa package, which installs the symbols you need to debug. Another possibility is to recompile the driver with debugging symbols on. Being a kernel driver I haven't dared to adventure beyond this point.
Having tried the code on several NVIDIA and Intel cards, and across platforms, I really think the Intel driver has a bug. If not, something bad would happen on other cards too. Not a crash, maybe, but at least some sort of error state. When I tried on other computers, though, I never found anything.
The main issue I have with your conclusion that the Intel driver has a bug is that the problem disappears with Qt4. Just moments ago I confirmed this by installing Qt4 on my laptop (which removed Qt5) and building ARGoS based on commit https://github.com/ilpincy/argos3/commit/627ce753ee74d85bf77d16cd4fbd049dde32bb5f, the parent of https://github.com/ilpincy/argos3/commit/cc658433552314678513efd9e77e5989b85a66a6 where you added Qt5 for the first time. In this version based on Qt4, I am able to select as many epucks as I want!
I am going to have a go at building the Mesa driver now using the instructions over at 01.org. Perhaps this will get us closer to the source. I suspect that the QOpenGLWidget in Qt5 is using some extension that is not supported by the Mesa implementation.
Michael Allwright writes:
https://github.com/ilpincy/argos3/commit/cc658433552314678513efd9e77e5989b85a66a6 where you added Qt5 for the first time. In this version based on Qt4, I am able to select as many epucks as I want!
I made a test and I confirm it works without a hitch with this version. So, something happened with Qt5?
Giovanni Beltrame, PhD, ing. MIST Lab - mistlab.ca Ecole Polytechnique de Montreal Visiting Professor - University of Tübingen
Just other results from testing after reinstalling Qt5:
Steps taken to get a copy of i965_dri.so with debug symbols:
glxinfo | grep Mesa reports that I am using Mesa 17.0.7 on my system, so I am going to download the same version from ftp://ftp.freedesktop.org/pub/mesa/mesa-17.0.7.tar.gz
./configure --prefix=/usr --enable-driglx-direct --enable-gles1 --enable-gles2 --enable-glx-tls --with-dri-driverdir=/usr/lib/dri --with-egl-platforms='drm x11' --with-dri-drivers=i965 --without-gallium-drivers --enable-debug
make
And it worked! Not only did my laptop boot again, it also has given me a nice detailed backtrace from the segfault after selecting an e-puck in ARGoS. Here are the backtraces from a couple different runs:
backtrace.1.txt, backtrace.2.txt, backtrace.3.txt, backtrace.4.txt, backtrace.5.txt.
Looking closer at the code at intel_mipmap_tree.c:2425 we have:
if (src->stencil_mt) {
   brw_blorp_blit_miptrees(brw,
                           src->stencil_mt, 0 /* level */, 0 /* layer */,
                           src->stencil_mt->format, SWIZZLE_XYZW,
                           dst->stencil_mt, 0 /* level */, 0 /* layer */,
                           dst->stencil_mt->format,
                           0, 0,
                           src->logical_width0, src->logical_height0,
                           0, 0,
                           dst->logical_width0, dst->logical_height0,
                           GL_NEAREST, false, false /*mirror x, y*/,
                           false, false /* decode/encode srgb */);
}
The issue is that dst->stencil_mt is NULL for some reason, so trying to dereference it to access the format field is the source of our segfault. I inserted a breakpoint at intel_mipmap_tree.c:2425 and can confirm that this code path is only executed when trying to select something.
That is fantastic work Michael! Thank you so much!
No problem! Let's get this bug solved!
Focusing on what happens after entering SelectInScene: the main difference between the old version of ARGoS based on Qt4 (https://github.com/ilpincy/argos3/commit/627ce753ee74d85bf77d16cd4fbd049dde32bb5f) and the newer version of ARGoS based on Qt5, with respect to the driver, is when we select something we end up in the following block of code:
void
intel_miptree_updownsample(struct brw_context *brw,
                           struct intel_mipmap_tree *src,
                           struct intel_mipmap_tree *dst)
{
   brw_blorp_blit_miptrees(brw,
                           src, 0 /* level */, 0 /* layer */,
                           src->format, SWIZZLE_XYZW,
                           dst, 0 /* level */, 0 /* layer */, dst->format,
                           0, 0,
                           src->logical_width0, src->logical_height0,
                           0, 0,
                           dst->logical_width0, dst->logical_height0,
                           GL_NEAREST, false, false /*mirror x, y*/,
                           false, false);
   if (src->stencil_mt) {
      brw_blorp_blit_miptrees(brw,
                              src->stencil_mt, 0 /* level */, 0 /* layer */,
                              src->stencil_mt->format, SWIZZLE_XYZW,
                              dst->stencil_mt, 0 /* level */, 0 /* layer */,
                              dst->stencil_mt->format,
                              0, 0,
                              src->logical_width0, src->logical_height0,
                              0, 0,
                              dst->logical_width0, dst->logical_height0,
                              GL_NEAREST, false, false /*mirror x, y*/,
                              false, false /* decode/encode srgb */);
   }
}
The difference between the new and old versions of ARGoS / Qt is that for the older version both src->stencil_mt and dst->stencil_mt were NULL, so the second call to brw_blorp_blit_miptrees was never made due to the if statement and dst->stencil_mt was never dereferenced...
@ilpincy, I think at this point it would be useful to have a discussion about how you have configured the QOpenGLWidget for ARGoS. In particular, I would like to know which version and profile of OpenGL we are targeting with ARGoS. When I try to configure ARGoS to use OpenGL 4.3 / core profile, I get numerous warnings/errors about unsupported extensions, deprecated API, etc.:
void CQTOpenGLMainWindow::CreateOpenGLWidget(TConfigurationNode& t_tree) {
   /* Create the surface format */
   QSurfaceFormat cFormat = QSurfaceFormat::defaultFormat();
   cFormat.setDepthBufferSize(24);
   cFormat.setMajorVersion(4);
   cFormat.setMinorVersion(3);
   cFormat.setSamples(4);
   cFormat.setProfile(QSurfaceFormat::CoreProfile);
   /* Create the widget */
   QWidget* pcPlaceHolder = new QWidget(this);
   m_pcOpenGLWidget = new CQTOpenGLWidget(pcPlaceHolder, *this, *m_pcUserFunctions);
   m_pcOpenGLWidget->setFormat(cFormat);
   ...
}
Mesa: 6670 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_ENUM in glDisable(GL_LIGHTING)
GL_INVALID_ENUM in glDisable(GL_LIGHTING)
Mesa: User error: GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
Mesa: 4 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_ENUM in glEnable(GL_LIGHTING)
So at the moment, I suspect this issue is due to a difference in the OpenGL versions and profiles that are being used by different drivers, since in the code we are not explicitly specifying what we want to use¹. The fact that this works with NVIDIA drivers² or under OS X could simply mean that the default version / profile of OpenGL and the default flags and values in QSurfaceFormat as returned/set by those drivers just happen to work, whereas these defaults are not sufficient for the Intel drivers / Mesa implementation of OpenGL.
I think solving this bug could be as simple as explicitly specifying the correct profile / version of OpenGL in the code.
¹ This idea is based on this post: https://forum.qt.io/post/229442
² Out of interest, when you say this works with NVIDIA graphics cards, are you referring to using the open source Nouveau driver (which uses Mesa) or the official closed source driver from NVIDIA?
I think you might be on to something! When I say that it works, I mean that
I don't have a specific version of OpenGL in mind. Ideally, it would be the version that corresponds to the minimum code editing. :-)
As for meeting: I am currently travelling, and will be back on November 16th. Maybe we can setup a meeting on Thursday at 10am EST (4pm in Brussels)?
Thanks again for all the work you're doing. As I can't reproduce this bug on my own, what you're doing is truly valuable.
As for meeting: I am currently travelling, and will be back on November 16th. Maybe we can setup a meeting on Thursday at 10am EST (4pm in Brussels)?
This works for me.
I don't have a specific version of OpenGL in mind. Ideally, it would be the version that corresponds to the minimum code editing. :-)
I'll look into which API calls are resulting in the GL_INVALID_OPERATION messages. I think we should aim to support OpenGL 4.5 since it was released in 2014 and should have good support assuming reasonably up to date graphics drivers.
@ilpincy these are a couple articles I have been looking at. I think although the qtopengl visualisation is partially working, I have a hunch (unless I am missing something) that the way OpenGL is being initialized at the moment is flawed or at least only valid on OS X.
In fact, there is a strong recommendation in the Qt docs against doing what it seems like we are doing:
When making OpenGL function calls, it is strongly recommended to avoid calling the functions directly. Instead, prefer using QOpenGLFunctions (when making portable applications) or the versioned variants (for example, QOpenGLFunctions_3_2_Core and similar, when targeting modern, desktop-only OpenGL). This way the application will work correctly in all Qt build configurations, including the ones that perform dynamic OpenGL implementation loading which means applications are not directly linking to an GL implementation and thus direct function calls are not feasible.
Actually, after searching on this page for functions like glPushMatrix, glMaterialfv, etc., it looks like we will definitely have to use one of the compatibility contexts of OpenGL if we want to avoid completely rewriting the visualization plugin.
EDIT: After further reading and considering the implementation and API constraints, we basically can only support OpenGL 2.1
@ilpincy using a really nice program called apitrace I was able to get the exact state of OpenGL on the call to glSelectBuffer right before glRenderMode(GL_SELECT) on both the Qt4 and Qt5 versions of ARGoS. This is the output of diffing the two states using the command apitrace diff-state argos3-qt4-2568.json argos3-qt5-5176.json:
{
framebuffer: {
},
parameters: {
GL_BACK: {
GL_SHININESS: 100 -> 0,
},
GL_COLOR_CLEAR_VALUE: [
0,
0.5019608 -> 0.5,
0.5019608 -> 0.5,
1
],
GL_CURRENT_COLOR: [
0 -> 1,
0,
0,
1
],
GL_DOUBLEBUFFER: "GL_TRUE" -> "GL_FALSE",
GL_DRAW_BUFFER: "GL_BACK" -> "GL_COLOR_ATTACHMENT0",
GL_DRAW_BUFFER0: "GL_BACK" -> "GL_COLOR_ATTACHMENT0",
GL_DRAW_FRAMEBUFFER: null -> {
GL_COLOR_ATTACHMENT0: {
GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING: "GL_LINEAR",
GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE: "GL_UNSIGNED_NORMALIZED",
GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME: 1,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE: "GL_RENDERBUFFER",
GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE: 0
},
GL_DEPTH_ATTACHMENT: {
GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING: "GL_LINEAR",
GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE: "GL_UNSIGNED_NORMALIZED",
GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE: 24,
GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME: 2,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE: "GL_RENDERBUFFER",
GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE: 8
},
GL_STENCIL_ATTACHMENT: {
GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING: "GL_LINEAR",
GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE: "GL_UNSIGNED_NORMALIZED",
GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE: 24,
GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME: 2,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE: "GL_RENDERBUFFER",
GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE: 8
}
},
GL_DRAW_FRAMEBUFFER_BINDING: 0 -> 1,
GL_FRAMEBUFFER_SRGB_CAPABLE_EXT: "GL_TRUE" -> "GL_FALSE",
GL_FRONT: {
GL_SHININESS: 100 -> 0,
},
GL_GENERATE_MIPMAP_HINT: "GL_NICEST" -> "GL_DONT_CARE",
GL_LIGHT0: {
GL_AMBIENT: [
0.1 -> 0.2,
0.1 -> 0.2,
0.1 -> 0.2,
1
],
GL_DIFFUSE: [
0.6 -> 0.8,
0.6 -> 0.8,
0.6 -> 0.8,
1
],
GL_POSITION: [
49.00039 -> 50,
13.95015 -> 50,
-49.57487 -> 2,
1
],
},
GL_LIGHT1: {
GL_AMBIENT: [
0.1,
0.1,
0.1,
1
],
GL_CONSTANT_ATTENUATION: 1,
GL_DIFFUSE: [
0.6,
0.6,
0.6,
1
],
GL_LINEAR_ATTENUATION: 0,
GL_POSITION: [
-49.01999,
-10.31277,
49.43683,
1
],
GL_QUADRATIC_ATTENUATION: 0,
GL_SPECULAR: [
0,
0,
0,
1
],
GL_SPOT_CUTOFF: 180,
GL_SPOT_DIRECTION: [
0,
0,
-1
],
GL_SPOT_EXPONENT: 0
} -> null,
GL_READ_BUFFER: "GL_BACK" -> "GL_COLOR_ATTACHMENT0",
GL_READ_FRAMEBUFFER: null -> {
GL_COLOR_ATTACHMENT0: {
GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING: "GL_LINEAR",
GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE: "GL_UNSIGNED_NORMALIZED",
GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME: 1,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE: "GL_RENDERBUFFER",
GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE: 8,
GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE: 0
},
GL_DEPTH_ATTACHMENT: {
GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING: "GL_LINEAR",
GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE: "GL_UNSIGNED_NORMALIZED",
GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE: 24,
GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME: 2,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE: "GL_RENDERBUFFER",
GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE: 8
},
GL_STENCIL_ATTACHMENT: {
GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING: "GL_LINEAR",
GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE: "GL_UNSIGNED_NORMALIZED",
GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE: 24,
GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME: 2,
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE: "GL_RENDERBUFFER",
GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE: 0,
GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE: 8
}
},
GL_READ_FRAMEBUFFER_BINDING: 0 -> 1,
GL_RENDERBUFFER_BINDING: 0 -> 2,
GL_SAMPLES: 0 -> 4,
GL_SAMPLE_BUFFERS: 0 -> 1,
GL_SCISSOR_BOX: [
0,
0,
320 -> 100,
240 -> 100
],
GL_SELECTION_BUFFER_POINTER: 94193257724264 -> 94176281598552,
GL_TEXTURE0: {
GL_TEXTURE_2D: {
GL_GENERATE_MIPMAP: "GL_TRUE" -> "GL_FALSE",
GL_TEXTURE_ALPHA_SIZE: 0 -> 8,
GL_TEXTURE_ALPHA_TYPE: "GL_ZERO" -> "GL_UNSIGNED_NORMALIZED",
GL_TEXTURE_IMMUTABLE_FORMAT: "GL_FALSE" -> "GL_TRUE",
GL_TEXTURE_IMMUTABLE_LEVELS: 0 -> 10,
GL_TEXTURE_INTERNAL_FORMAT: "GL_RGB" -> "GL_RGBA8",
GL_TEXTURE_MAX_LEVEL: 1000 -> 9,
GL_TEXTURE_VIEW_NUM_LAYERS: 0 -> 1,
GL_TEXTURE_VIEW_NUM_LEVELS: 0 -> 10,
},
GL_TEXTURE_BINDING_2D: 145 -> 1,
},
},
}
So, based on my last post and after digging a bit more deeply, I have reached the following conclusions:
To this end, I will not fix the bug but rather propose an alternative means of selecting entities based on color-picking. I have put a rough draft of this code over in the following repo: https://github.com/allsey87/argos3-selection-proposal. Although it is incomplete and there are some minor glitches, I believe this is already working quite well (no segfaults!) and I invite you to test it (using src/testing/experiment/test_selection.argos) and to give me feedback.
In a nutshell, this works by rendering everything without GL_LIGHTING inside a QOpenGLFramebufferObject using a color based on the entity's index in CSpace::GetRootEntityVector(). The following snapshot shows what the render into the selection framebuffer looks like.
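To make the index-to-color idea concrete, here is a minimal sketch of one way an entity index could be packed into an 8-bit RGB color and recovered from a pixel read back from the selection framebuffer. This is not the code in the linked repo: the names PickColor, IndexToColor, ColorToIndex and kMaxIndex are hypothetical, and reserving black for "no entity under the cursor" is an assumption made for illustration.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical helpers for illustration; not part of ARGoS or the proposal repo.
using PickColor = std::array<std::uint8_t, 3>;  // R, G, B as read back from the framebuffer

// Assumption: black (0,0,0) is the clear color and means "background",
// so index 0 is stored as id 1 and the largest usable index is 0xFFFFFE.
constexpr std::uint32_t kMaxIndex = 0xFFFFFEu;

// Pack an entity index into a unique 24-bit color.
// Returns nothing if the index does not fit into 24 bits.
std::optional<PickColor> IndexToColor(std::uint32_t index) {
    if (index > kMaxIndex) {
        return std::nullopt;
    }
    const std::uint32_t id = index + 1u;  // shift by one to keep black free
    return PickColor{static_cast<std::uint8_t>((id >> 16) & 0xFFu),
                     static_cast<std::uint8_t>((id >> 8) & 0xFFu),
                     static_cast<std::uint8_t>(id & 0xFFu)};
}

// Recover the entity index from a pixel color.
// Returns nothing for black, i.e. when the pixel shows only background.
std::optional<std::uint32_t> ColorToIndex(const PickColor& color) {
    const std::uint32_t id = (static_cast<std::uint32_t>(color[0]) << 16) |
                             (static_cast<std::uint32_t>(color[1]) << 8) |
                             static_cast<std::uint32_t>(color[2]);
    if (id == 0u) {
        return std::nullopt;
    }
    return id - 1u;
}
```

In use, the color returned by IndexToColor(i) would be set for the i-th root entity before drawing it into the selection framebuffer, and the pixel under the mouse cursor would be passed to ColorToIndex. The round trip only holds if the colors reach the framebuffer unmodified, which is why lighting has to be off; presumably blending and multisampling on that framebuffer would have to be avoided for the same reason.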
For the moment, I've added a new drawing method called CQTOpenGLOperationDrawSilhouette, although it may be possible to work around this using the existing draw method, assuming all entities only use glMaterialfv for normal rendering, since glMaterialfv has no effect when GL_LIGHTING is disabled.
Some final notes: our visualization plugin is built upon a lot of deprecated functionality and uses the outdated fixed-function pipeline for rendering (which has been removed from core OpenGL since version 3.1). I think issues like this may continue to appear as vendors test the old versions of OpenGL and the fixed-function pipeline less and less, and focus more on the programmable pipeline. Furthermore, I think that programming at the OpenGL level is too low level, beyond the scope of ARGoS, and a waste of our time. At some point, I strongly feel we need to move to either a third-party renderer like Horde3D or the modern approach of simply having ARGoS pipe everything to a WebGL-based renderer like Babylon.js. The latter would have interesting implications in a cluster-based setup, where the user could locally render and monitor various instances of ARGoS in their web browser.
Wow, thanks a lot for all this work! Really impressive! :-O
I do agree that we should move to a more modern visualization, especially one that allows for models to be imported in a simple way. I have wanted to do it since forever, but never found the time/help to make it happen.
I had a few cracks at Horde3D myself and it would be choice too - I even have some initial prototype code done.
I never thought about the option of using WebGL, and it's a really good idea. Love it!
So, how do we proceed? I'd like to have this done. I'll try to find a good student willing to help. If you have any suggestions, I'm all ears.
@ilpincy, @beltrame, and @cjcormier: when you get a chance, could you test out my solution / workaround in: https://github.com/allsey87/argos3-selection-proposal
If you could let me know whether this works and any additional info like OS, graphics card / driver, OpenGL version etc that would be very helpful. I can then finalise this work around, submit it, and close off this bug.
For me there's no crash, but I can't select robots either. I'm using OpenSuSE Tumbleweed, Intel HD 620 with i915 driver, GLX 1.4, OpenGL core 4.5.
Giovanni Beltrame, PhD, ing. MIST Lab - mistlab.ca Ecole Polytechnique de Montreal Visiting Professor - University of Tübingen
@beltrame thanks for testing this. I have only implemented the selection for boxes at the moment. The test file is src/testing/experiment/test-selection.argos. Are you able to select and move the boxes?
Hi,
Referring to gdbbacktrace.txt, it seems that when I select a box in ARGoS I get a segfault from somewhere inside i965_dri.so (my Intel graphics driver).
It is a bit strange, as one of the last calls before the trace disappears inside the graphics driver is to CQTOpenGLOperationDrawBoxNormal, which should always run on each frame. I would have expected the segfault to come after a CQTOpenGLOperationDrawBoxSelected.
I have tested this on a vanilla clone of this repo, checked out on the same date as this post. I am running Ubuntu Gnome 16.04 LTS; this is my uname -a output:
For details about the driver in use see: module-info-i915.txt