processing / processing4

Processing is a flexible software sketchbook and a language for learning how to code. Since 2001, Processing has promoted software literacy within the visual arts and visual literacy within technology. There are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning and prototyping.
https://processing.org

POINTS PShape performance is drastically worse in 4.0.b3 #345

Closed by processing-bot 2 months ago

processing-bot commented 2 years ago

Created by: cacheflowe

With the latest core.jar, a technique I've been using for GPGPU particles is now incredibly slow. I'm guessing it's something to do with changes to the way that geometry is cached in the latest update. I've gone from 60fps with lots of headroom to draw more graphics, down to 10fps. I've tested with POINTS geometry, but I'm guessing other types of geometry might be impacted.

Steps to Reproduce

Run this code in the latest version of Processing (4.0.b3), and then run it in an older version - even 3.5.4 works great. This makes a PShape with about 1 million points, which used to run great on a decent GPU. Now it is very slow.

PShape s;

void settings() {
  size(800, 800, P3D);
}

void setup() {
  s = buildPointsShape(1024, 1024);
}

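PShape buildPointsShape(int w, int h) {
  int vertices = w * h;
  // Build into a local PShape; setup() assigns the result to the global `s`.
  PShape points = createShape();
  points.beginShape(POINTS);
  for (int i = 0; i < vertices; i++) {
    // One white point at a random position inside a 512-unit cube.
    points.stroke(255);
    points.vertex(random(-256, 256), random(-256, 256), random(-256, 256));
  }
  points.endShape();
  return points;
}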

void draw() {
  background(0);
  text(frameRate, 30, 30);
  translate(width/2, height/2);
  rotateY(frameCount/20.);
  shape(s);
}

Your Environment

processing-bot commented 2 years ago

Created by: cacheflowe

@codeanticode Thanks for looking at this. I've tried a ton of different settings on the video card and Windows, and I'm consistently getting 10fps on this code with 4.0.3b, but 60fps on any version beforehand. If I switch from POINTS to LINES in the code above, it goes back up to 60fps, so maybe it has something to do specifically with POINTS. If I use TRIANGLES, I get 33fps in older versions, but 22fps in 4.0.3b, so triangles seem impacted as well.

I've been testing all of the Processing examples, as well as my own apps, and most code seems fine at first glance. But...

I did find one other app that went from 60fps down to 50 when I switched core.jar to 4.0.3b. This app of mine has a noticeable slowdown, though it regrettably has lots of dependencies on my little framework. In short, I'm creating around 60k cubes in a PShape group, giving each one sub-shape attributes that let me move/rotate/scale them in a vertex shader. I'm using createShape(BOX, w, h, d) to create the sub-shapes. I'm not sure what the commonality is here, but I do think something has changed in core.jar that makes certain geometry slower, though it's only noticeable when working with pretty high vertex counts on my machine.

I'll try to find more clues here - let me know if you think of anything I could try. And thank you!

processing-bot commented 2 years ago

Created by: codeanticode

Oops, sounds like a performance-impacting regression :-( I will look into it, so it can be fixed in the next beta.

processing-bot commented 2 years ago

Created by: codeanticode

Hmm, maybe it's platform-dependent? The big change introduced in b3 was buffer object streaming in PShape (#196), which should not make any performance difference in normal uses of PShape, and should speed things up when modifying PShape objects after creation. But maybe there is some driver issue that is affecting streaming on Windows? I'd need to run more tests, and I wonder if you have access to other systems to check whether the issue happens on all of them.

processing-bot commented 2 years ago

Created by: codeanticode

@cacheflowe Cannot reproduce the issue using the code you posted, it runs at 60 fps on my computer. Do you have any other example showing a slowdown in PShape?

processing-bot commented 2 years ago

Created by: cacheflowe

@codeanticode I just pulled out an older Alienware mini PC with a GTX 860M GPU, and tried this code with 4.0.2b, 4.0.3b, and 4.0.6b. I tested the same code on my newer RTX 3080 with those same Processing versions. Unfortunately, the results are consistent across my NVIDIA GPUs:

I'm happy to test this more, but these are the only machines I have easy access to right now.

processing-bot commented 2 years ago

Created by: cacheflowe

I've been trying to find a way to do GPU profiling but haven't found a good way to do that yet. I've also changed every NVIDIA setting but nothing has helped. 4.0.4b still exhibits the same slowdown for me.
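Short of a real GPU profiler, one low-tech way to compare builds is to log a rolling average of frame intervals rather than eyeballing frameRate. The following is only a sketch under stated assumptions: FrameTimer and its methods are hypothetical names (not part of Processing), and it measures wall-clock time between frames on the CPU side, which says nothing about where the GPU time actually goes.

```java
// Hypothetical helper, not a Processing API: rolling average of frame intervals.
// Plain Java, so it can be pasted into a sketch tab or compiled on its own.
class FrameTimer {
  private final long[] intervals; // nanoseconds between consecutive ticks (ring buffer)
  private int count = 0;          // number of intervals currently stored (capped at window size)
  private int next = 0;           // ring-buffer write index
  private long last = 0;          // timestamp of the previous tick
  private boolean started = false;

  FrameTimer(int window) {
    if (window <= 0) throw new IllegalArgumentException("window must be positive");
    intervals = new long[window];
  }

  // Call once per frame, e.g. with System.nanoTime().
  void tick(long nowNanos) {
    if (started) {
      intervals[next] = nowNanos - last;
      next = (next + 1) % intervals.length;
      if (count < intervals.length) count++;
    }
    last = nowNanos;
    started = true;
  }

  // Average frame time in milliseconds over the stored intervals (0 before the second tick).
  double averageMillis() {
    if (count == 0) return 0;
    long sum = 0;
    for (int i = 0; i < count; i++) sum += intervals[i];
    return sum / (double) count / 1e6;
  }

  // Average frames per second derived from the average frame time.
  double averageFps() {
    double ms = averageMillis();
    return ms == 0 ? 0 : 1000.0 / ms;
  }
}
```

In a sketch, this could be used by creating one FrameTimer (for example with a window of 120 frames), calling tick(System.nanoTime()) at the top of draw(), and printing averageMillis() every couple of seconds. Frame time is easier to compare across versions than fps: 60fps is roughly 16.7 ms per frame, while 10fps is 100 ms.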

processing-bot commented 2 years ago

Created by: codeanticode

Really strange. I ran some tests on a Windows laptop and saw no difference before and after b3, although it was not an NVIDIA-based computer. I haven't seen anybody else reporting PShape slowdowns, so I'm still guessing it is a system-specific issue. Have you had the chance to try on a different computer?

processing-bot commented 2 years ago

Created by: codeanticode

@cacheflowe Do you have any other info on this issue? Were you able to run the latest beta on another computer to see if the slowdown still happens?

processing-bot commented 2 years ago

Created by: codeanticode

Ok thanks for the additional testing, I will look into this further.

processing-bot commented 2 years ago

Created by: benfry

Woohoo! Closing.

processing-bot commented 2 years ago

Created by: codeanticode

@benfry I think so, but let's wait for confirmation from @cacheflowe. It looks like this issue is affected by the implementation of the GL drivers on each platform.

@cacheflowe Let us know if performance is restored in beta7. There are a couple of internal parameters you can play with in case it is still lower, which would help us understand what might still be causing trouble.

For instance, if you do:

void setup() {
  PGL.USE_BUFFER_OBJECT_STREAMING_IN_RETAINED_MODE = false;
  s = buildPointsShape(1800, 1700);
}

all the buffer object streaming optimizations introduced in beta3 are disabled, so you should get exactly the same performance as before. You can also play with the buffer access parameter (when buffer object streaming is enabled), for example:

void setup() {
  PGL.glBufferAccess = PGL.READ_WRITE;
  s = buildPointsShape(1800, 1700);
}

PGL.READ_WRITE is the new default, which should solve the performance issue, but you can also try PGL.WRITE_ONLY (the previous default) and PGL.READ_ONLY.

@benfry I will create a new PR so the naming of these debug parameters is more consistent.

processing-bot commented 2 years ago

Created by: cacheflowe

Performance is restored in beta 7!

A few notes:

Anyway, THANK YOU for the solutions 🙇

processing-bot commented 2 years ago

Created by: codeanticode

@cacheflowe Glad it works, and thanks for testing the different combinations of buffer parameters. Right now, it seems like streaming enabled and write-only is the way to go in most situations. Please note that in beta8 those PGL parameters will be renamed according to the changes in this PR.

processing-bot commented 2 years ago

Created by: benfry

@codeanticode Safe to close after #432?
