mrdoob / three.js

JavaScript 3D Library.
https://threejs.org/
MIT License
102.79k stars 35.38k forks source link

`WebGPUAttributes._createBuffer()` is extremely slow #25295

Open LeviPesin opened 1 year ago

LeviPesin commented 1 year ago

Description

WebGPUAttributes._createBuffer() is extremely slow because of line WebGPUAttributes.js:128 (copying an array) -- according to profile in https://github.com/mrdoob/three.js/pull/25273#issuecomment-1384921763 it takes about 70% of WebGPURenderer.compute() total CPU time (150 ms out of 210 ms in that case). (Tested on Chrome on Windows 10 but I don't expect that copying an array would be much faster on other browsers.) Not sure if something could be done to mitigate that...

Reproduction steps

  1. Use WebGPURenderer.compute()
  2. Open Chrome performance profiler
  3. See that WebGPUAttributes._createBuffer() takes very much time

Code

N/A

Live example

N/A

Screenshots

No response

Version

r149-dev

Device

Desktop

Browser

Chrome

OS

Windows

Mugen87 commented 1 year ago

You are talking about this line, right?

https://github.com/mrdoob/three.js/blob/c57a18f60dfb0fb1b398f230ba42e8d196dcb60f/examples/jsm/renderers/webgpu/WebGPUAttributes.js#L128

Do you mind finding out what part of this line is slow? Is it buffer.getMappedRange()?

This is potentially something that should be reported to the WebGPU implementers.

LeviPesin commented 1 year ago

The slow part is .set( array ) -- the array that is used in that example consists of 4*1024^2 elements and copying it is so slow. That's why I'm not sure whether it can be fixed at all... Maybe we can at least skip the array copy if the array is uninitialized and zero-filled (as it is in many cases)?

Mugen87 commented 1 year ago

as it is in many cases

I think this is an edge case since normally the buffer data (position, normals etc.) are already defined when the geometry is used for rendering the first time.

Mugen87 commented 1 year ago

Can you please verify if the performance gets better with this approach:

_createBuffer( attribute, usage ) {

    const array = attribute.array;
    const size = array.byteLength + ( ( 4 - ( array.byteLength % 4 ) ) % 4 ); // ensure 4 byte alignment, see #20441

    const buffer = this.device.createBuffer( {
        size,
        usage: usage | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST
    } );

    this._writeBuffer( buffer, attribute );

    attribute.onUploadCallback();

    return {
        version: attribute.version,
        buffer,
        readBuffer: null,
        usage
    };

}
LeviPesin commented 1 year ago

No, I'm afraid it only gets worse -- CPU time taken increases to about 85% of total CPU time in WebGPURenderer.compute(), and the real time (measured by console.time/console.timeEnd) increases from 50-100 ms to 300-600 ms... By the way, it seems that such large times (for both approaches) happen only about one third of tries -- mostly times are about 0.1 ms... This is really strange, because there does not seem to be a cache?

Mugen87 commented 1 year ago

I don't know what the browser internally does. But maybe the WebGPU Matrix chatroom or the Chromium bug tracker would be good places for finding this out.

LeviPesin commented 1 year ago

@Mugen87 -- I don't really think this is a browser issue... I will check in Firefox both approaches, but I don't think copying array would be significantly faster there (but maybe the second approach will).

Mugen87 commented 1 year ago

I've marked this as a browser issue since I don't think we can solve it here.

Mugen87 commented 1 year ago

Thought about this issue again and I think we should update the code in WebGPUAttributes to the version of https://github.com/mrdoob/three.js/issues/25295#issuecomment-1386776337.

It seems using getMappedRange() does only make sense if you don't have an array buffer in your app yet. Meaning you request one, create a typed array with it and then write your data into it.

In our case however data are already defined in buffer attributes. So it should be more performant to not create a new array and perform a copy but just use GPUQueue.writeBuffer(). The slow .set( array ) part is then gone.

LeviPesin commented 1 year ago

When I last tested the approach in https://github.com/mrdoob/three.js/issues/25295#issuecomment-1386776337 it was much slower than the current code.

lega0208 commented 1 year ago

I'm looking into playing around with WGPU and was wondering what the performance implications of this issue are. From what I can tell, it looks like the slowdown would be triggered whenever any attributes or their usage mode is updated, but there would be no slowdown if the attributes are initialized with data and not changed afterwards. Is that an accurate assessment?

LeviPesin commented 1 year ago

I think that creation is very slow (this is what the issue is about), but update maybe is even slower (because the second approach in https://github.com/mrdoob/three.js/issues/25295#issuecomment-1386776337 appeared to be slower than the current).

lega0208 commented 1 year ago

Sorry, my question was a bit misleading. I meant more in the context of WebGPU in Three.js as a whole. Digging around a bit more, it seems like this would affect almost everything involving attributes (i.e. almost everything)? I'll probably hold off a bit until the ecosystem settles 😅

I'm not super well versed in Three.js internals, but this does make me wonder: If copying the array is what's actually taking most of the time, is this something that doesn't have to be done for WebGL? Otherwise it doesn't seem like there should be a difference. On the other hand, if the version that doesn't copy the array is actually slower, that's a bit of a conundrum.

Maybe one line of investigation could be creating benchmarks for WebGPU vs. WebGL for this specific context and comparing step by step to see where the differences appear? Not sure if the two are too different for this to be possible, but could be worth a shot.

Either way, thank you for your contributions 😊