sciapp / gr

GR framework: a graphics library for visualisation applications

Performance Consistency Issues #180

Closed EmDash00 closed 4 weeks ago

EmDash00 commented 1 year ago

I've been trying to use GR as an interactive visualization tool for a game that experimental subjects will play; however, framerate consistency appears to be an issue. I'm using the Python bindings, but the problem appears to lie in GR itself. Is there a way to make frame times more consistent?

Below is an example of the time it takes to call gr.updatews() in the double pendulum example. Every few calls, the time spent in gr.updatews() spikes. I'm wondering what exactly causes this and whether anything can be done about it.

Timings Pendulum

Interestingly, if we reduce the frequency with which we call gr.updatews(), this framerate inconsistency vanishes.

Timings Pendulum2
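One way to experiment with the update frequency is to flush only on every N-th frame and time just those calls. This is a minimal sketch, not the method used for the plot above: `timed_updates` is a hypothetical helper, and `draw` and `update` are stand-ins for the drawing calls and for `gr.updatews()`.

```python
from time import perf_counter


def timed_updates(n_frames, draw, update, every=1):
    """Call draw(frame) on each frame and update() on every `every`-th frame.

    Returns the duration of each update() call in seconds.
    """
    durations = []
    for frame in range(n_frames):
        draw(frame)
        if frame % every == 0:
            t0 = perf_counter()
            update()
            durations.append(perf_counter() - t0)
    return durations


if __name__ == "__main__":
    # Dummy callables; in the pendulum script these would be the drawing
    # code and gr.updatews (an assumption about how one would wire it up).
    d = timed_updates(10, draw=lambda frame: None, update=lambda: None, every=2)
    print(len(d), "updates timed")
```

Passing the callables in keeps the timing logic independent of GR, so the same helper can be used to compare different values of `every`.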

Here's the code I'm using for profiling:


```python
import time
from time import perf_counter   # always available on Python 3.6+, which the f-string below requires

import numpy as np
import gr

g = 9.8        # gravitational constant

# one gr.updatews() duration (in seconds) per line
timings = open('timings_pendulum.csv', 'w')

def rk4(x, h, y, f):
    # one classical fourth-order Runge-Kutta step of size h
    k1 = h * f(x, y)
    k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
    k3 = h * f(x + 0.5 * h, y + 0.5 * k2)
    k4 = h * f(x + h, y + k3)
    return x + h, y + (k1 + 2 * (k2 + k3) + k4) / 6.0

def pendulum_derivs(t, state):
    # The following derivation is from:
    # http://scienceworld.wolfram.com/physics/DoublePendulum.html
    t1, w1, t2, w2 = state
    a = (m1 + m2) * l1
    b = m2 * l2 * np.cos(t1 - t2)
    c = m2 * l1 * np.cos(t1 - t2)
    d = m2 * l2
    e = -m2 * l2 * w2**2 * np.sin(t1 - t2) - g * (m1 + m2) * np.sin(t1)
    f =  m2 * l1 * w1**2 * np.sin(t1 - t2) - m2 * g * np.sin(t2)
    return np.array([w1, (e*d-b*f) / (a*d-c*b), w2, (a*f-c*e) / (a*d-c*b)])

def pendulum(theta, length, mass):
    l = length[0] + length[1]
    gr.clearws()
    gr.setviewport(0, 1, 0, 1)
    gr.setwindow(-l, l, -l, l)
    gr.setmarkertype(gr.MARKERTYPE_SOLID_CIRCLE)
    gr.setmarkercolorind(86)
    pivot = [0, 0.775]                         # draw pivot point
    gr.fillarea([-0.2, 0.2, 0.2, -0.2], [0.75, 0.75, 0.8, 0.8])
    for i in range(2):
        x = [pivot[0], pivot[0] + np.sin(theta[i]) * length[i]]
        y = [pivot[1], pivot[1] - np.cos(theta[i]) * length[i]]
        gr.polyline(x, y)                   # draw rod
        gr.setmarkersize(3 * mass[i])
        gr.polymarker([x[1]], [y[1]])       # draw bob
        pivot = [x[1], y[1]]

    # time only the workstation update, not the drawing calls above
    t0 = perf_counter()
    gr.updatews()
    timings.write(f"{perf_counter() - t0}\n")

l1 = 1.2       # length of rods
l2 = 1.0
m1 = 1.0       # weights of bobs
m2 = 1.5
t1 = 100.0     # initial angles (degrees, converted below)
t2 = -20.0

w1 = 0.0
w2 = 0.0
t = 0
dt = 0.08
state = np.radians([t1, w1, t2, w2])

now = perf_counter()

while t < 10:
    start = now

    t, state = rk4(t, dt, state, pendulum_derivs)
    t1, w1, t2, w2 = state
    pendulum([t1, t2], [l1, l2], [m1, m2])

    now = perf_counter()
    if start + dt > now:
        time.sleep(start + dt - now)

timings.close()   # flush the recorded timings to disk
```
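In the loop above, `now` is sampled before the `time.sleep` call, so the next frame's `start` does not include the time slept. An alternative is to pace against absolute deadlines. The following is a sketch under assumptions, not a confirmed fix for the spikes: `run_paced` is a hypothetical helper, and `step` stands in for one simulate-and-draw iteration.

```python
import time
from time import perf_counter


def run_paced(step, dt, n_frames, clock=perf_counter, sleep=time.sleep):
    """Run step() n_frames times, aiming for one call every dt seconds.

    Deadlines are absolute (start + (k + 1) * dt), so one slow frame does
    not shift the schedule of every later frame. Returns how late each
    frame finished, in seconds (0.0 if it met its deadline).
    """
    start = clock()
    lateness = []
    for k in range(n_frames):
        step()
        deadline = start + (k + 1) * dt
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)          # wait out the rest of the frame
            lateness.append(0.0)
        else:
            lateness.append(-remaining)  # frame overran its deadline
    return lateness
```

The `clock` and `sleep` parameters exist only so the pacing logic can be exercised with a fake clock; in normal use the defaults apply, for example `run_paced(my_frame, 0.04, 250)` with a hypothetical `my_frame` function.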
jheinen commented 1 year ago

Thanks for the hint. Which output device did you use?

If you're on macOS, pls try GKS_WSTYPE=gksqt python3 ...

EmDash00 commented 1 year ago

I'm on Ubuntu 20.04.6 Linux. Here's some system info:

```
❯ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Stepping:                        10
CPU MHz:                         799.968
CPU max MHz:                     4000.0000
CPU min MHz:                     400.0000
BogoMIPS:                        3999.93
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        8 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:          Mitigation; IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Mitigation; Microcode
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
❯ sudo lshw -C display
  *-display
       description: VGA compatible controller
       product: UHD Graphics 620
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       logical name: /dev/fb0
       version: 07
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=i915 latency=0 mode=1920x1080 visual=truecolor xres=1920 yres=1080
       resources: iomemory:2f0-2ef iomemory:2f0-2ef irq:145 memory:2ffa000000-2ffaffffff memory:2fa0000000-2fafffffff ioport:3000(size=64) memory:c0000-dffff
```

I was using the default output device. Since I'm using Python, I simply set the environment variable with os.environ["GKS_WSTYPE"] = "gksqt".
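For reference, a minimal sketch of that change. The ordering is an assumption rather than documented behaviour: setting the variable before importing `gr` is simply the conservative choice, since it guarantees the value is in place before any workstation is opened.

```python
import os

# Assumption: GR picks up GKS_WSTYPE when it opens its first workstation,
# so the variable is set before the import to be on the safe side.
os.environ["GKS_WSTYPE"] = "gksqt"

try:
    import gr  # noqa: F401
except ImportError:
    gr = None  # GR is not installed here; the variable is still set
```

This is equivalent to launching the script with `GKS_WSTYPE=gksqt` in the shell environment, as suggested above.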

This appears to improve performance; however, I am still getting spikes. Here's a histogram of the time spent in gr.updatews().

Here's some more profiling and investigation. Each of the following plots covers a 60-second run with dt = 0.04.

Before:

Mean: 0.0078156736655206, std: 0.013422327717545259

timings_pendlum_nogksqt

After:

Mean: 0.006298769808211356, std: 0.01115415612618402

timings_pendlum_gksqt

From the histograms we can see there's a fairly good improvement in framerate consistency. I would love to get a stable 60 FPS. As you can see, my computer uses integrated graphics and is a 5-year-old laptop at this point.
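Since the mean and standard deviation hide how often a frame actually misses a 60 FPS budget (1/60 s, about 16.7 ms), it can help to count the over-budget updates directly. A minimal sketch, assuming the one-duration-per-line file written by the profiling script; `summarize` and `load_timings` are hypothetical helper names.

```python
import statistics


def summarize(durations, fps=60.0):
    """Summarize update durations (seconds) against a per-frame budget of 1/fps."""
    budget = 1.0 / fps
    ordered = sorted(durations)
    # Nearest-rank 99th percentile: index ceil(0.99 * n) - 1.
    rank = max(0, -(-99 * len(ordered) // 100) - 1)
    over = sum(1 for d in ordered if d > budget)
    return {
        "mean": statistics.mean(ordered),
        "std": statistics.pstdev(ordered),   # population std, like NumPy's default
        "p99": ordered[rank],
        "max": ordered[-1],
        "over_budget": over,
        "over_budget_fraction": over / len(ordered),
    }


def load_timings(path):
    """Read one duration per line, skipping blank lines."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]
```

Usage would look like `summarize(load_timings("timings_pendulum.csv"))`; comparing `over_budget_fraction` between output devices gives a more direct view of the spikes than the standard deviation alone.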


We'll eventually be running this on a much more powerful desktop, and I was curious whether I'm being limited by hardware. I ran the same test on the desktop and got much more consistent timing.

dt = 0.01

Mean: 0.00024122460769211617, std: 0.0013291644913593795

timings_pendlum_beefy

The desktop has an Intel Core i7-7700 and an Nvidia GTX 1070, so it's possible that the integrated graphics are the bottleneck. However, the laptop frequently renders much more complex scenes just fine, so some implementation bottleneck may be keeping it from performing as well as it usually does in other tasks. I didn't expect it to throttle while rendering a few lines and circles.

Would be curious as to what your thoughts are.

EmDash00 commented 1 year ago

@jheinen Any updates?