shairai / angleproject

Automatically exported from code.google.com/p/angleproject

ARB_timer_query #142

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
A subtask for chromium issue to support ARB_timer_query in WebGL:
http://code.google.com/p/chromium/issues/detail?id=79060

Implement ARB_timer_query
http://www.opengl.org/registry/specs/ARB/timer_query.txt

Original issue reported on code.google.com by scheib@chromium.org on 13 Apr 2011 at 4:44

GoogleCodeExporter commented 9 years ago
Notes from an initial patch from Ben Vanik on the chromium issue follow, and 
attached is his patch. Ben, perhaps you can use gcl upload to send it to the 
codereview server? 

---
Hacked together a simple dummy extension in ANGLE - ANGLE_timer_query, that 
closely follows ARB_timer_query minus the glGet(GL_TIMESTAMP) call, which I 
can't figure out how to implement on D3D - mainly because I'm not sure exactly 
what it does in GL.

It shows that the extension can be implemented with a minimal amount of code on 
Windows.
You should be able to apply the attached diff against ANGLE r611 and run 
Simple_VertexShader. If you place a breakpoint in Draw(), timeElapsed and 
timeElapsed2 will hold some value (measured in ns).

So.... comments? Some big questions about this piece of work:
- Is the ARB_timer_query API rich enough? (I think so)
- Is the ARB_timer_query API small enough to be implemented on most systems? 
(this shows Windows and it's available on Linux)
- What should the ANGLE extension be called? (the behavior is not quite the 
same as ARB_timer_query, but close, which is why I went with ANGLE)
---
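
For readers wondering how a value "measured in ns" can come out of D3D: as far as I know, D3D9 timestamp queries report raw ticks, and a separate frequency query (D3DQUERYTYPE_TIMESTAMPFREQ) reports ticks per second. A minimal sketch of the conversion follows. The helper name and signature are hypothetical and are not taken from the attached patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): converts the distance between two
// raw timestamp tick values into nanoseconds, given the counter frequency.
uint64_t ticksToNanoseconds(uint64_t beginTicks, uint64_t endTicks,
                            uint64_t frequencyHz)
{
    assert(frequencyHz > 0);
    assert(endTicks >= beginTicks);

    const uint64_t kNsPerSecond = 1000000000ull;
    const uint64_t delta = endTicks - beginTicks;

    // Convert whole seconds and the sub-second remainder separately so that
    // a large delta does not overflow 64 bits when multiplied by 1e9.
    const uint64_t wholeSeconds = delta / frequencyHz;
    const uint64_t remainderTicks = delta % frequencyHz;
    return wholeSeconds * kNsPerSecond +
           (remainderTicks * kNsPerSecond) / frequencyHz;
}
```

For example, 1000 ticks at an assumed 1 MHz counter come out as 1,000,000 ns.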

Original comment by scheib@chromium.org on 13 Apr 2011 at 4:50

GoogleCodeExporter commented 9 years ago
Regarding glGet(GL_TIMESTAMP):
Can we issue a timestamp query, stall on the result, and return that?

A crazier idea:
  Track an offset between the CPU timer and the GPU timer. For each query issued, we note the CPU timer value at issue time. Upon reading the query result, we test whether the result implies an offset smaller than our current one, and if it does we update the offset. At ANGLE init time, or on the first glGet(timestamp), we can push a minimal workload at the GPU to initialize the offset.
  On get(timestamp) calls we poll the CPU time and return it plus the offset. That should keep the two clocks mostly related, but if the CPU timer drifts slower than the GPU the gap will grow without us detecting it.
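
A rough sketch of that bookkeeping, under stated assumptions: both clocks are already in nanoseconds, and the class and method names are hypothetical. It keeps the smallest offset seen, on the reasoning that the GPU can only reach a query after the CPU issued it, so every observed difference over-estimates the real offset by the queue latency.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the offset-tracking idea; all names are illustrative.
// Both clocks are assumed to already be expressed in nanoseconds.
class TimestampOffsetTracker
{
  public:
    // Call when a timestamp query result is read back. cpuAtIssueNs is the
    // CPU timer value noted when the query was issued.
    void onQueryResult(int64_t cpuAtIssueNs, int64_t gpuResultNs)
    {
        // The GPU reaches the query some time after the CPU issued it, so
        // this difference over-estimates the true offset by the queue
        // latency. The smallest value seen is therefore the best estimate.
        const int64_t candidate = gpuResultNs - cpuAtIssueNs;
        if (!hasOffset_ || candidate < offsetNs_)
        {
            offsetNs_ = candidate;
            hasOffset_ = true;
        }
    }

    bool hasOffset() const { return hasOffset_; }

    // Estimate of the current GPU time: CPU time plus the tracked offset.
    int64_t estimateGpuNow(int64_t cpuNowNs) const
    {
        assert(hasOffset_);
        return cpuNowNs + offsetNs_;
    }

  private:
    int64_t offsetNs_ = 0;
    bool hasOffset_ = false;
};
```

Because the offset only ever shrinks, this sketch has the weakness noted above: it cannot detect the case where the true gap grows.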

Original comment by scheib@chromium.org on 13 Apr 2011 at 5:25

GoogleCodeExporter commented 9 years ago
As I read it, glGet(GL_TIMESTAMP) is QueryPerformanceCounter() [i.e. CPU time] 
reported against the GPU timebase. Is this others' interpretation as well?

If so, I agree, a rough solution is to synchronize the timebases at ANGLE init 
by something like:
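   uint64_t startCPU = getTime();
   IDirect3DQuery9* query = ...  // a D3DQUERYTYPE_TIMESTAMP query
   query->Issue(D3DISSUE_END);
   uint64_t queryTimeGPU;
   // The flush flag goes on GetData so D3D doesn't hang while we poll.
   while (query->GetData(&queryTimeGPU, sizeof(queryTimeGPU),
                         D3DGETDATA_FLUSH) == S_FALSE)
     Sleep(0);
   uint64_t endCPU = getTime();
   uint64_t middleGPU = startCPU + (endCPU - startCPU) / 2;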

Where middleGPU and queryTimeGPU are considered to be equal and become the 
timebase used for conversion between the two time domains.
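
Once that pair of values is captured, the conversion itself is a constant shift. A minimal sketch, assuming both clocks are in the same unit and tick at the same rate (the assumption that makes this only a rough solution). The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names are illustrative and both clocks are assumed to
// be in the same unit (e.g. nanoseconds) and to tick at the same rate.
struct Timebase
{
    int64_t cpuAnchor;  // midpoint CPU time, taken to coincide with gpuAnchor
    int64_t gpuAnchor;  // GPU timestamp returned by the query
};

// Builds the timebase from the CPU times bracketing the query and its result.
Timebase makeTimebase(int64_t startCpu, int64_t endCpu, int64_t queryTimeGpu)
{
    assert(endCpu >= startCpu);
    return Timebase{startCpu + (endCpu - startCpu) / 2, queryTimeGpu};
}

// Maps a CPU time into the GPU time domain.
int64_t cpuToGpu(const Timebase &tb, int64_t cpuTime)
{
    return tb.gpuAnchor + (cpuTime - tb.cpuAnchor);
}

// Maps a GPU timestamp back into the CPU time domain.
int64_t gpuToCpu(const Timebase &tb, int64_t gpuTime)
{
    return tb.cpuAnchor + (gpuTime - tb.gpuAnchor);
}
```

The midpoint is only a guess at when the GPU sampled its clock, so the mapping is uncertain by up to half of (endCpu - startCpu).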

Original comment by nd...@chromium.org on 13 Apr 2011 at 5:40

GoogleCodeExporter commented 9 years ago
Hmm I thought about this, but I'm unsure that's actually what the call is 
supposed to do.

'This will return the GL time after all previous commands have reached the GL 
server but have not yet necessarily executed.'

This means that doing a synchronous timer query at the time of the Get won't 
work, as that'll mean that all the previous commands have been *executed*. What 
it wants is the time that all of the commands are in the driver (so that you 
can measure latency).

The last example in the spec shows this flow:
glFlush();
glDrawArrays(...) // #0
glDrawArrays(...) // #1
...
glDrawArrays(...) // #N
glGet(GL_TIMESTAMP, &time)
// time = the time at which draw #0 hit the GPU, NOT draw #N

The extension spec notes that the call is synchronous but that it does not 
stall the GL pipeline (meaning that it doesn't wait on execution of the draws).

So I think you could implement this by:
// Always issue a query after a flush/etc:
glQueryCounter(dummyQuery, GL_TIMESTAMP)
glDrawArrays(...) // #0
glDrawArrays(...) // #1
...
glDrawArrays(...) // #N
// User calls glGet:
glGet(GL_TIMESTAMP, &time)
  glFlush()
  glGetQueryObjectui64v(dummyQuery, GL_QUERY_RESULT, &time);
  glQueryCounter(dummyQuery, GL_TIMESTAMP) // start again for the next glGet
  // time = the time of draw #0

All that said, Nat's synchronous flush would probably be sufficient behavior 
for a debugger, but not for an application wanting to use this in normal flow 
as it synchronizes the CPU to the GPU.

Original comment by ben.vanik on 13 Apr 2011 at 5:53

GoogleCodeExporter commented 9 years ago
Oh, I used the nvidia QuerySample as a reference for some of this info, as it's 
one of the only documented pieces of code out there on the D3D TIMESTAMP stuff:
ftp://download.nvidia.com/developer/SDK/Individual_Samples/samples.html
ftp://download.nvidia.com/developer/SDK/Individual_Samples/DEMOS/Direct3D9/src/QuerySample/docs/QuerySample_userguide.pdf

The user guide and code note that there can be significant drift in the 
timings, such that you'd want to resynchronize every frame. If you cache the 
values for any longer you'll get major drift. Even intra-frame you could 
see drift if the OS swaps the GPU onto another app mid-frame (in between two 
command buffer issues).

Original comment by benvanik@google.com on 13 Apr 2011 at 6:00

GoogleCodeExporter commented 9 years ago
> - What should the ANGLE extension be called? (the behavior is not quite the 
> same as ARB_timer_query, but close, which is why I went with ANGLE)

If there are any functional changes from the original extension, it should be 
an ANGLE extension (see ANGLE_framebuffer_blit and ANGLE_framebuffer_multisample: 
http://code.google.com/p/angleproject/source/browse/trunk/extensions/). If the 
changes are purely to define interactions in ES-land, we can propose 
integrating those back into the ARB extension and getting the extension listed 
as-is in the ES registry. However, functional changes definitely need to be an 
ANGLE-specific extension (or EXT if there are other vendors interested).

Original comment by dan...@transgaming.com on 13 Apr 2011 at 6:11

GoogleCodeExporter commented 9 years ago
One complicating factor is that ARB_timer_query is based on GL 3.2, which has a 
lot of the *Query calls already. None of these are present in ES, which makes 
the spec somewhat awkward. What's implemented here is really part of 
ARB_occlusion_query (has glGenQueries/etc but does not have any of the tokens 
associated with occlusion queries).
Because of that, I wasn't quite sure how to proceed.

Original comment by benvanik@google.com on 13 Apr 2011 at 6:15

GoogleCodeExporter commented 9 years ago
Right.  In that case, just make an ANGLE/ES2 version of the extension which 
brings that stuff in.   (And please draft up the extension as well -- it makes 
reviewing this much easier :-)

Original comment by dan...@transgaming.com on 13 Apr 2011 at 6:30

GoogleCodeExporter commented 9 years ago
Just submitted a patch on issue 143, a related issue that impacts the 
usability of things like this.

Original comment by benvanik@google.com on 14 Apr 2011 at 1:05

GoogleCodeExporter commented 9 years ago
Drafting an extension now. I'll hold on getting an updated patch into the 
codereview site until issue 143 is resolved.

Original comment by benvanik@google.com on 14 Apr 2011 at 6:52

GoogleCodeExporter commented 9 years ago
Draft spec of ANGLE_timer_query here: http://codereview.appspot.com/4406049/

Original comment by benvanik@google.com on 14 Apr 2011 at 8:29

GoogleCodeExporter commented 9 years ago
I've been trying to implement the glGet(GL_TIMESTAMP) call and cannot figure 
out a way to get the same behavior as ARB_timer_query on top of D3D9. The call 
must not flush the GPU pipeline, but I can't think of an efficient way to 
achieve that.

If someone is using ARB_timer_query today and calling this method assuming it 
has little impact on their performance, the method in this extension should 
also have little impact. If the behavior of the method is to change 
significantly to be a blocking flush I'd argue that it should be omitted and 
instead let the application build this behavior.
A blocking flush query of the time on the GPU can be implemented simply as:
glQueryCounter(query, GL_TIMESTAMP)
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &timestamp)

Thoughts?

Original comment by benvanik@google.com on 14 Apr 2011 at 10:18

GoogleCodeExporter commented 9 years ago
Agree that implementing something that flushes is not great. 

nduca's example was to initialize a time offset, presumably just once at init 
time, then use that offset as an estimate of where the GPU time would be. That 
would be close, but not precisely what the extension specifies, as per Ben's 
followup. However, it may be close enough, and worth implementing.

The alternative is to remove get(timestamp) from the extension. That would 
trickle all the way up through http://crbug.com/79060 though, which is a bit 
unfortunate. But, as our immediate higher level needs would be satisfied, 
perhaps it's the right path. I'd LGTM.

Original comment by scheib@chromium.org on 15 Apr 2011 at 5:09

GoogleCodeExporter commented 9 years ago
I just implemented the behavior nduca suggested, but the drift is too extreme 
to be usable. I have to resync the timer every second or so to keep it close, 
and since the sync requires a full finish + flush this isn't really workable. 
The NVIDIA QuerySample also notes this problem, and I can't seem to find anyone 
who has successfully got timing behavior like this working on D3D.

Original comment by benvanik@google.com on 15 Apr 2011 at 9:03

GoogleCodeExporter commented 9 years ago
My bet is that the drift is power-mgmt...

Does disabling power management features on the GPU and CPU [wherever you can 
find them] change the drift?

Original comment by nd...@chromium.org on 15 Apr 2011 at 9:07

GoogleCodeExporter commented 9 years ago
re: GetInteger64v(TIMESTAMP) - I believe this is supposed to be a synchronous 
(but non-blocking) way of getting the current GPU time. Not really sure how 
to go about implementing that on D3D9 myself. If it's a significant problem, 
I'd recommend just leaving it out of the ANGLE extension.

As pointed out above (and I recall from discussions at the ARB when we put this 
into GL 3.2), you *cannot* attempt to correlate these timestamp queries to 
wall-time.  They are only meant to give relative times, not absolute times.

Original comment by dan...@transgaming.com on 19 Apr 2011 at 5:09

GoogleCodeExporter commented 9 years ago
Does anyone know someone who may have more insight into a solution about the 
TIMESTAMP-on-D3D9 issue? Maybe some NVIDIA/ATI guys? I'm at a dead end.

New draft spec available for review here:
http://codereview.appspot.com/4406049/
It has the glGetInteger/TIMESTAMP stuff in it, but is easy to pull out if we 
are stuck.

Original comment by ben.vanik on 19 Apr 2011 at 8:49

GoogleCodeExporter commented 9 years ago
Committed the ANGLE_timer_query draft extension spec in r627. Working on 
getting a code review together for the implementation!

Original comment by benvanik@google.com on 29 Apr 2011 at 6:08

GoogleCodeExporter commented 9 years ago
Code review for first pass implementation: 
http://codereview.appspot.com/4433090/

Original comment by benvanik@google.com on 29 Apr 2011 at 8:23

GoogleCodeExporter commented 9 years ago

Original comment by c...@chromium.org on 7 Dec 2013 at 4:10

GoogleCodeExporter commented 9 years ago

Original comment by geofflang@chromium.org on 10 Dec 2013 at 3:51

GoogleCodeExporter commented 9 years ago
Note that ANGLE_timer_query has been retired from the ES specification and 
replaced by EXT_disjoint_timer_query. There are minor differences in the 
API, mainly the addition of querying disjointness and the renaming of a method 
from getQueryParameter to getQueryObject.

Tickets for implementation in UAs are listed below:

- Chrome: https://code.google.com/p/chromium/issues/detail?id=345227
- Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=974832
- WebKit: https://bugs.webkit.org/show_bug.cgi?id=129090

EXT_disjoint_timer_query can be implemented by a UA if any one of these 
conditions is met:

- The OpenGL 3.3 core profile is supported: 
https://www.opengl.org/registry/doc/glspec33.core.20100311.pdf
- ARB_timer_query is supported: 
https://www.opengl.org/registry/specs/ARB/timer_query.txt
- Direct3D 9 is supported: 
http://msdn.microsoft.com/en-us/library/windows/desktop/bb147308(v=vs.85).aspx
- The OpenGL ES EXT_disjoint_timer_query is supported: 
http://www.khronos.org/registry/gles/extensions/EXT/EXT_disjoint_timer_query.txt

The disjoint behavior lacking from ARB_timer_query and OpenGL 3.3 core can be 
emulated by the UA.
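
The conditions above can be read as a simple capability check. The following is only a sketch: the type names, function names, and the preference order (paths with a native disjoint signal first) are assumptions made for illustration, not something any UA is known to do.

```cpp
#include <cassert>

// Hypothetical names throughout; this mirrors the four conditions listed above.
enum class TimerBackend
{
    None,
    GL33Core,
    ArbTimerQuery,
    Direct3D9,
    EsDisjointTimerQuery
};

struct BackendSupport
{
    bool gl33Core;
    bool arbTimerQuery;
    bool direct3D9;
    bool esDisjointTimerQuery;
};

// Picks a path for implementing EXT_disjoint_timer_query, or None if no
// condition is met. The ordering is an assumption of this sketch.
TimerBackend pickBackend(const BackendSupport &s)
{
    if (s.esDisjointTimerQuery) return TimerBackend::EsDisjointTimerQuery;
    if (s.direct3D9) return TimerBackend::Direct3D9;
    if (s.arbTimerQuery) return TimerBackend::ArbTimerQuery;
    if (s.gl33Core) return TimerBackend::GL33Core;
    return TimerBackend::None;
}

// ARB_timer_query and OpenGL 3.3 core lack the disjoint behavior, so the UA
// has to emulate it on those two paths.
bool needsDisjointEmulation(TimerBackend backend)
{
    assert(backend != TimerBackend::None);
    return backend == TimerBackend::GL33Core ||
           backend == TimerBackend::ArbTimerQuery;
}
```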

Original comment by pyalot@gmail.com on 28 Feb 2014 at 10:43

GoogleCodeExporter commented 9 years ago
WebGL specification of this extension is here: 
http://www.khronos.org/registry/webgl/extensions/EXT_disjoint_timer_query/

Original comment by pyalot@gmail.com on 28 Feb 2014 at 10:43

GoogleCodeExporter commented 9 years ago
We won't implement the deprecated ANGLE_timer_query, but will instead consider 
implementing EXT_disjoint_timer_query: Issue 657.

Original comment by c...@chromium.org on 21 May 2014 at 3:39