Why are algorithms written in pseudo-code?

kvark commented 3 years ago

In WebGPU there is a lot of different algorithms (e.g. alidating-gpusamplerdescriptor), mostly for validating the state. They are written in pseudo-code, using english words mixed with math equations. Looking at other specifications (WebBluetooth, WebAudio), it appears to be common.

I'm concerned that there is nothing to validate that the spec is right, since it's written in a non-formal language. The CTS may interpret it one way, implementations may interpret it another way, and finally the users may treat it entirely differently. So the spec text becomes this volume of logic that has to be validated manually by humans (spec editors). It surely secures a good number of jobs, but I think we should prioritize the specs being correct (at least at some level) by construction.

That is, would there be any reason why all the algorithms used by a specification couldn't be written in an actual programming language? Even JavaScript would do. These formal algorithms would basically be pieces of a reference implementation. It doesn't have to be complete, but the bits of it can still be checked by a compiler/interpreter, tested by the CTS, and even ran by the user on their data.

If we have the algorithms formally described, the specification text could focus more on establishing the proper mental models, building intuition, and generally be more approachable.

P.S. I wrote more about the issue in the context of validation on my blog

marcoscaceres commented 3 years ago

I'm concerned that there is nothing to validate that the spec is right, since it's written in a non-formal language.

This would be covered by #13 - the web platform tests serve the purpose of the accompanying formal language.

The CTS may interpret it one way, implementations may interpret it another way, and finally the users may treat it entirely differently.

Web platform tests are the arbiter of truth in that situation.

That is, would there be any reason why all the algorithms used by a specification couldn't be written in an actual programming language? Even JavaScript would do.

That wouldn't make much sense, as some implementations are written in C++, others in Rust, others in JS, etc.

This will be covered in WebIDL #9 and Infra #18 - but we could have a session specifically around "implementing an API or spec in a browser" - (filed #19).

It doesn't have to be complete, but the bits of it can still be checked by a compiler/interpreter, tested by the CTS, and even ran by the user on their data.

WebIDL plays this role, somewhat... but not in the algorithms. Additionally, the algorithms can't be overly prescriptive, as they can be subject to optimizations within each browser.

If we have the algorithms formally described, the specification text could focus more on establishing the proper mental models, building intuition, and generally be more approachable.

The downside would be spec editors having to learn a formal language + as well as anyone else reading the spec. That makes it challenging to communicate things in the spec widely. Using English is not great... but it's not terrible either... specially where things can be backed by web platform tests.

ericprud commented 3 years ago

How about a convention for alternative languages? That way, anyone who does know the implementation language can human-verify the pseudocode. You probably don't want to see the cobol implementation of elliptic curves but some sort of tab convention could make folks happy. https://www.w3.org/TR/owl2-primer/#OWL_Syntaxes does something like that, which many people have appreciated.

kvark commented 3 years ago

Thank you for your feedback! @marcoscaceres

the web platform tests serve the purpose of the accompanying formal language. Web platform tests are the arbiter of truth in that situation.

I don't think the tests can prove the correctness of an algorithm in the spec. By definition, tests cover a few concrete paths through this algorithm. Tests can't guarantee that they cover all the code paths.

Another problem tests don't solve is answering a question "what happens if my inputs are A, B, C?". One could try to step through the algorithm in their head, interpreting the spec in some way, and even try running the cases through a few implementations, but neither is going to provide an authoritative answer to what should happen. If there is a reference implementation, they can just run it and see.

That wouldn't make much sense, as some implementations are written in C++, others in Rust, others in JS, etc.

It doesn't matter what the other (in-browser and whatnot) implementations are written in. This is about a reference implementation, and JS is the language of the Web today, so it seems natural that a reference implementation could use it and assume familiarity with it of the reader.

Additionally, the algorithms can't be overly prescriptive, as they can be subject to optimizations within each browser.

The point of a reference implementation is not to show optimizations. It is to define the behavior formally, so it is prescriptive by definition.

The downside would be spec editors having to learn a formal language + as well as anyone else reading the spec.

Can W3C assume familiarity with JS or TS? I don't know either personally, and I'd be willing to cover that gap for the purpose of formalizing the specification in it.

@ericprud A convention would be a good incremental step in the right direction!

marcoscaceres commented 3 years ago

Ah, I see where you are going with this. Yes, it's definitely not uncommon to write reference implications in JS (or ts). I do this regularly to verify the algorithms in my own specs. For example, payment request api:

https://github.com/marcoscaceres/web-payments-proto

However, we generally don't maintain them and are not part of the "official" spec. But definitely extremely useful. There are limitations, however. Specially around security things (e.g., if a trusted click event is needed or it's dependent on other unexposed browser internals).

marcoscaceres commented 3 years ago

So, I'm thinking maybe a tutorial on prototyping to verify specs?

Update: Filed #20

ljharb commented 3 years ago

For ecma262, I've written https://npmjs.com/es-abstract which I then use to implement all of my shims/polyfills. It's helped catch spec bugs many times by virtue of the polyfill being written more-or-less in spec language. I got the idea from https://www.npmjs.com/package/especially, which helped me implement a Promise polyfill.

It's likely such an approach would work well for HTML specs as well.

marcoscaceres commented 3 years ago

@ljharb that's awesome! Thanks for letting me know about those.

speced / speced-cg

Why are algorithms written in pseudo-code? #17