Closed: jzm-intel closed this issue 1 year ago
I thought about suggesting that when I first tried `\0` in strings. What I don't like about it, at least for `label`, is that it turns setting a label into a slower operation, as every label set would require scanning the label for `\0`. Given that what happens on the backend with a label is implementation-defined, it seems fine to allow `\0`, just with maybe a note to implementations to be aware of the issue.
For `code` it's already a rule in WGSL, so that one seems covered.
This was discussed a bit previously in 2021 in the context of WGSL code: https://github.com/gpuweb/gpuweb/issues/1816
The comment on that issue says that after discussion "there is no particular desire to disallow null characters for the sake of allowing implementations to use null-terminated strings," but the corresponding meeting minutes are pretty brief and don't dive too much into what drove that conclusion.
Discussed a bit with @toji. We were curious, @jzm-intel, was there a particular issue that caused you to propose this?
Allowing `\0` in strings is well established throughout the web specs and the web engine codebases (Blink/WebKit/Gecko), so the default answer is certainly to allow them. For anything that doesn't pass into native implementations (Dawn/wgpu/etc.) we should have no trouble implementing null characters. That includes at least error constructors and the wrapper-local state for `label`.

I also agree with Gregg that since labels are all used in implementation-defined ways, `label`, `groupLabel`, and `markerLabel` don't really need validation. We can just normalize or trim them internally as we need.
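As a minimal sketch of what "normalize or trim them internally" could look like (the function name and the truncation policy are illustrative assumptions, not taken from Dawn, wgpu, or any real wrapper):

```javascript
// Hypothetical wrapper-side label sanitizer. Truncating at the first NUL
// mirrors exactly what a C-string consumer on the native side would see,
// while the JS-visible label keeps its full USVString value.
function sanitizeLabelForNative(label) {
  const nul = label.indexOf('\0');
  return nul === -1 ? label : label.slice(0, nul);
}
```

Because label behavior is implementation-defined, any such policy (truncate, replace, drop) would be spec-conformant.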
For the rest, I think it's most congruous with the rest of the platform to simply handle `\0`s naturally:

- WGSL code and identifiers (`code`, `hints`, `entryPoint`, `constants`) will necessarily fail anyway since `\0` isn't valid in WGSL.
- `requiredLimits` already rejects with an `OperationError` if you specify an unrecognized limit name.
- `unmaskHints`, afaik, simply ignores any unrecognized keys, so even if we're translating them down to the native layer, we can just skip (and optionally warn) on ones with `\0`.

In summary, I don't think we should change anything, unless we have gaps in the current spec that need fixing.
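The "skip (and optionally warn)" behavior for `unmaskHints` could be sketched like this (the `KNOWN_HINTS` allowlist below is an assumption for illustration; a real implementation would use its own recognized set):

```javascript
// Assumed allowlist of recognized hints; any string containing '\0' can
// never match an entry here, so it is skipped like any other unknown key.
const KNOWN_HINTS = new Set(['vendor', 'architecture', 'device', 'description']);

function filterUnmaskHints(hints) {
  return hints.filter((hint) => {
    const ok = KNOWN_HINTS.has(hint);
    if (!ok) console.warn(`ignoring unrecognized hint: ${JSON.stringify(hint)}`);
    return ok;
  });
}
```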
> was there a particular issue that caused you to propose this?
Yes. In particular, I tried using `\0` in the `entryPoint` string, which is indeed passed into native implementations, and added this case to the CTS (e.g. using the `entryPoint` name `main\0a` in the pipeline descriptor while in the program the fn name is `main`). Following the current spec, it should be a validation error. But in the current Chrome implementation, the well-formed UTF-8 string is somehow passed to native implementations (Dawn in this case) as a null-terminated string without proper validation, making the CTS fail. So I proposed this to see if we could have a unified path to get rid of `\0` everywhere.
I agree that we can handle each use case under the current spec.
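The mismatch described above can be modeled in a few lines. `cStringEquals` here is a toy stand-in for `strcmp`-style semantics, not code from Dawn or wgpu:

```javascript
// USVString comparison vs. what a null-terminated backend effectively does:
// the backend only sees the bytes up to the first NUL.
function cStringEquals(a, b) {
  const upToNul = (s) => {
    const i = s.indexOf('\0');
    return i === -1 ? s : s.slice(0, i);
  };
  return upToNul(a) === upToNul(b);
}

console.log('main\0a' === 'main');             // false: spec (USVString) semantics
console.log(cStringEquals('main\0a', 'main')); // true: what the backend saw
```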
Discussed in today's meeting. If we understood correctly there are no major remaining concerns with implementing the specified semantics, so we think no API change is needed.
> WGSL code and identifiers (`code`, `hints`, `entryPoint`, `constants`) will necessarily fail anyway since `\0` isn't valid in WGSL.
Both Firefox and Chrome currently seem to incorrectly match an `entryPoint` with `\0` in it to a shader entry point that does not contain one. In other words:
```
// shader
...
@vertex
fn myVSMain(v: MyVSInput) -> MyVSOutput {
...
```

```js
// JavaScript
const pipeline = device.createRenderPipeline({
  vertex: {
    entryPoint: 'myVSMain\0foo',
```
This code incorrectly(?) matches entry point `myVSMain` even though the entry point specified was `myVSMain\0foo`. Should that be an error like "no entry point named `myVSMain\0foo`", or is the fact that it matched acceptable?
> Should that be an error like 'no entry point named `myVSMain\0foo`' or is the fact that it matched acceptable?
According to the current spec (the string is treated as a `USVString`, which is not null-terminated), and with my common sense, I thought this should be an error.
Yes these are bugs in Chromium and Firefox. We have an open issue for it in Dawn.
I would like to reopen this because it is a breaking change for any apps using the C API (so for example the Python integration, Emscripten, .NET, Go, etc.) and, AFAICT, it's non-trivial to implement in Dawn/Tint (I could be wrong about this).

The biggest issue is that `\0` is not allowed, so any work required is work done solely so we can say "No! You did the wrong thing." We arguably have better things to do than subject so many projects to a breaking change, and ourselves to work, all so we can tell people at the C API level that passing `\0` in a string is bad.
AFAICT there are 4 issues:
For (1), labels, it's irrelevant. The spec says what happens to labels is undefined. So if the user passed `foo\0bar`, the implementation is allowed to print whatever it wants: `foo`, `no-label-for-you`, nothing, `12345`, whatever.

For (2), WGSL disallows `\0`.

For (3), it's not clear if `\0` is allowed, but I tried passing `\0` in a comment to Tint and it failed, so engineering work would be required here.

For (4), because `\0` is disallowed in WGSL, passing `\0` here is always an error.
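Points (2) through (4) all reduce to the same scan, since WGSL disallows U+0000 anywhere in the source text. A minimal sketch (the function name is illustrative):

```javascript
// One scan of the string covers WGSL code, comments, and entry-point
// identifiers alike: U+0000 is invalid in all of them.
function containsNul(text) {
  return text.includes('\u0000');
}
```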
Because of all these points, changing the C API to take sizes just seems like a huge waste of everyone's time, not just WebGPU implementors' but also existing WebGPU-native users'. Across all projects, a few hundred man-hours would be spent, all so we can add code which doesn't actually add any features, just prints an error.
Two alternative solutions I'd like to suggest:

Solution 1: for WGSL and `entryPoint`, throw if `\0` is passed in.
I know that WebGPU generally returns errors asynchronously, but there are several places where it doesn't, for example:

```js
device.createShaderModule();   // throws, expected 1 argument, got 0
device.createShaderModule(""); // throws, type not GPUShaderModuleDescriptor
device.createShaderModule({}); // throws, type not GPUShaderModuleDescriptor
```
Would it really be so bad if the API was defined as not accepting `\0` for `code` and `entryPoint`, and it threw here? No user is going to intentionally add `\0` to their code because, as pointed out above, it's an error.
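A sketch of what Solution 1's synchronous check could look like, assuming the spec were changed to reject NULs up front (`validateNoNul` is hypothetical; WebGPU does not define such a step today):

```javascript
// Throw a TypeError, matching how other synchronous WebIDL-level failures
// surface, before anything reaches the native layer.
function validateNoNul(name, value) {
  if (value.includes('\0')) {
    throw new TypeError(`${name} must not contain a null character (U+0000)`);
  }
  return value;
}
```

A browser could run this on `code` and `entryPoint` at the top of `createShaderModule` and the pipeline-creation functions.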
Solution 2: keep the C API the same, work around it in the browser.

The C API already doesn't allow `\0`, since it only accepts `\0`-terminated strings. So the JavaScript/browser side can scan the string and effectively synthesize an asynchronous WebGPU error. It would do this:
```js
// Sketch: createShaderModuleImpl, descriptorThatFails, and
// insertSyntheticError are placeholders for browser internals.
createShaderModule(descriptor) {
  if (descriptor.code.includes('\0')) {
    // Swallow the native error and surface a synthetic one instead.
    device.pushErrorScope('validation');
    const invalidShaderModule = device.createShaderModuleImpl(descriptorThatFails);
    device.popErrorScope().then(() => {
      insertSyntheticError("`\\0` not allowed in WGSL");
    });
    return invalidShaderModule;
  } else {
    return device.createShaderModuleImpl(descriptor);
  }
}
```
You can do something similar for the create pipeline functions.
I'd strongly prefer solution #1. It means the least amount of everyone's time wasted on a change that adds nothing. #2 is a compromise: it doesn't waste others' time, but it still wastes ours.
Looking forward to hearing Kai's ideas.
I feel, based on the comments from the meeting, my current proposal is:

Leave the spec as is, leave webgpu.h as is, and deal with `\0` in WGSL and `entryPoint` in the browser, synthesizing a normal WebGPU type error.
> For (3) it's not clear if `\0` is allowed, but I tried passing `\0` in a comment to Tint and it failed, so engineering work would be required here
WGSL doesn't allow null anywhere in the program source, including comments AFAIU: https://gpuweb.github.io/gpuweb/wgsl/#textual-structure
I don't recall where, but at some point I think someone noted that identifiers have lots of other invalid characters too. We could easily, for example, replace NUL characters at the JS API with `\0` (the two-character escape) or `�` or whatever at the C API; it wouldn't affect whether there's an error, and the error message would still be clear to human readers. No reverse translation (C->JS) is needed.
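That one-way translation could be sketched as follows (the function name is illustrative; either replacement keeps the identifier invalid in WGSL, so the error still occurs, but the message stays printable):

```javascript
// Replace each NUL with the printable two-character escape "\0" (U+FFFD
// would work equally well) before crossing the C API boundary.
function escapeNulsForNative(s) {
  return s.replaceAll('\0', '\\0');
}
```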
This doesn't do anything to solve NULs in GPUShaderModuleDescriptor.code, but it at least constrains the problem to just one place, and fixes most of them (the various forms of GPUProgrammableStage.entryPoint/constants, and GPUShaderModuleDescriptor.hints).
That is to say - I think we can keep the C API down to just one string length "wart" without changing the JS API.
Changing the C API to take a string length is strictly worse. We'd be taking an API in which it's impossible to pass `\0` (an error) and making it possible to pass the error condition, for absolutely no other gain. An API where you can pass fewer wrong things is better than one in which you can pass more wrong things.
I think we should leave the C API as is, browsers can work around it as mentioned in https://github.com/gpuweb/gpuweb/issues/3576#issuecomment-1406926447
That's fine with me, pretty sure it [EDIT: solution 2] is essentially the same as the group's previous resolution.
Wait, sorry, do you mean solution 1 or solution 2? Both keep the C API as is.
> synthesizing a normal WebGPU type error
WebGPU validation error, or TypeError exception?
I mean, keep the C API as is, synthesize a validation error (yes, solution 2)
> the same as the group's previous resolution.
That is to say - the CG's resolution for the JS API; we didn't officially make a decision on the C API. I think that's what we initially had in mind, but then it got further litigated in the webgpu-headers repo (https://github.com/webgpu-native/webgpu-headers/issues/170). So since you're proposing no JS API change, we'll be able to close this and move the discussion back there.
Tacit resolution queue to close this again, since it matches the group's previous resolution, and (I think) the leanings in the last meeting.
(Edited: bold the proposal)
**Proposal**
Currently the WebGPU spec uses `USVString` and `DOMString` in several places (listed below), including for user input. For both types, a string with a null character `'\0'` (i.e. Unicode code point U+0000) in it is valid; e.g. `"abc\0xyz"` would be a valid `DOMString` and `USVString`. However, such strings are tricky in other languages like C/C++.

Currently the WGSL spec explicitly forbids the null code point (U+0000) in a WGSL program. Although using a string with `'\0'` in it in some cases like `entryPoint` should be a validation error (since there cannot be a WGSL entry point name containing `'\0'`), it may be generally better and easier to forbid any `'\0'` in all strings the user provides to WebGPU. It would be coherent to report a JavaScript TypeError whenever any user-provided string is found to contain a `'\0'`.

I am not quite sure how to introduce such a rule into the WebGPU spec, considering `USVString` and `DOMString` are used multiple times in different places within the spec. Maybe add a small sub-chapter under Chapter 3, Fundamentals?

**Current usage**
Currently `USVString` and `DOMString` are used in the WebGPU spec for:

- `label` in `GPUObjectBase`
- `code` and `hints` in `GPUShaderModuleDescriptor`
- `entryPoint` and `constants` in `GPUProgrammableStage`
- `groupLabel` for `pushDebugGroup` and `markerLabel` for `insertDebugMarker` in `GPUDebugCommandsMixin`
- `unmaskHints` of `requestAdapterInfo`
- `requiredLimits` in `GPUDeviceDescriptor`
- `message` of constructors for different `GPUError` subtypes, e.g. `GPUValidationError`
- `type` of constructors for `GPUUncapturedErrorEvent`