hashgraph / pbj

A performance optimized Google Protocol Buffers code generator, parser, and Gradle module.
Apache License 2.0

Extended Validation in Schemas #209

Open · jasperpotts opened this issue 6 months ago

jasperpotts commented 6 months ago

Problem

We would like the ability to specify extended validation rules for fields in the schema and to enforce them at parse time in PBJ. The primary reason is security: limiting how far malformed binary protobuf input can make the reader spend more resources than absolutely necessary. The secondary reason is greater rigor in schemas, so that they are more precise and leave less room for mistakes. The rules covered should include:

The rules we choose should be very fast to check, so that enforcing them at parse time has no noticeable impact on parsing performance.
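To illustrate why such checks can be cheap, here is a minimal Java sketch (not PBJ's actual parser API) of enforcing a hypothetical maximum length on a length-delimited string field. The declared length is decoded as a varint and compared against the limit before any bytes are copied or allocated, so an oversized field costs only a few byte reads. The class and method names (`ParseTimeLimits`, `readBoundedString`) and the limit values are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ParseTimeLimits {
    /** Thrown when input violates a (hypothetical) schema constraint. */
    static final class ValidationException extends RuntimeException {
        ValidationException(String msg) { super(msg); }
    }

    /** Simple read cursor over a byte array. */
    static final class Input {
        final byte[] buf;
        int pos;
        Input(byte[] buf) { this.buf = buf; }
    }

    /** Decodes a protobuf base-128 varint of at most 10 bytes (64 bits). */
    static long readVarint(Input in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (in.pos >= in.buf.length) throw new ValidationException("truncated varint");
            byte b = in.buf[in.pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new ValidationException("malformed varint");
    }

    /**
     * Reads a length-delimited UTF-8 string, rejecting it before allocating
     * anything if the declared length exceeds maxBytes (hypothetical rule).
     */
    static String readBoundedString(Input in, int maxBytes) {
        long len = readVarint(in);
        if (len < 0 || len > maxBytes) {
            throw new ValidationException("length " + len + " exceeds max " + maxBytes);
        }
        if (len > in.buf.length - in.pos) throw new ValidationException("truncated field");
        String s = new String(in.buf, in.pos, (int) len, StandardCharsets.UTF_8);
        in.pos += (int) len;
        return s;
    }

    public static void main(String[] args) {
        byte[] ok = {3, 'a', 'b', 'c'};
        System.out.println(readBoundedString(new Input(ok), 10)); // abc

        // Declares 200 bytes (varint 0xC8 0x01) but the rule allows 10: rejected up front.
        byte[] tooLong = {(byte) 0xC8, 0x01};
        try {
            readBoundedString(new Input(tooLong), 10);
        } catch (ValidationException e) {
            System.out.println(e.getMessage()); // length 200 exceeds max 10
        }
    }
}
```

The key design point is ordering: the limit is checked against the declared length, not after reading the payload, which is what keeps the cost of a hostile input bounded.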

A further aim is to use these rules to tighten the specification documentation for protobuf APIs. Ideally, each rule would support attached documentation explaining what it checks and why it exists.

Solution

We could define our own rules or choose to be compatible with an existing option, for example the "protovalidate" library from Buf: https://github.com/bufbuild/protovalidate. Its full feature set, with complete Google Common Expression Language (CEL) support, is far more than we need, but the Standard Constraints subset may be applicable: https://github.com/bufbuild/protovalidate/blob/main/docs/standard-constraints.md
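As a rough sketch of what a "standard constraints" style subset could look like on the PBJ side, the Java below models a few simple per-field rules (a string length cap, a repeated-field item cap, and an integer range), each checkable in constant time on an already-decoded value. These types (`FieldRules`, `MaxLen`, `MaxItems`, `IntRange`) and the example field names are hypothetical; they are loosely inspired by rules such as `max_len` and `max_items` and are not protovalidate's or PGV's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FieldRules {
    /** A hypothetical per-field constraint. Returns a violation message, or null if OK. */
    interface Rule {
        String check(String field, Object value);
    }

    /** Caps string length in code points. */
    record MaxLen(int max) implements Rule {
        public String check(String field, Object value) {
            String s = (String) value;
            int len = s.codePointCount(0, s.length());
            return len <= max ? null : field + ": length " + len + " > " + max;
        }
    }

    /** Caps the number of entries in a repeated field. */
    record MaxItems(int max) implements Rule {
        public String check(String field, Object value) {
            int n = ((List<?>) value).size();
            return n <= max ? null : field + ": " + n + " items > " + max;
        }
    }

    /** Restricts an integer field to an inclusive range. */
    record IntRange(long minInclusive, long maxInclusive) implements Rule {
        public String check(String field, Object value) {
            long v = (Long) value;
            return (v >= minInclusive && v <= maxInclusive) ? null
                    : field + ": " + v + " not in [" + minInclusive + ", " + maxInclusive + "]";
        }
    }

    /** Binds a rule to a named field. */
    record FieldConstraint(String field, Rule rule) {}

    /** Checks every constraint against decoded field values; unset fields are skipped in this sketch. */
    static List<String> validate(List<FieldConstraint> constraints, Map<String, Object> message) {
        List<String> violations = new ArrayList<>();
        for (FieldConstraint c : constraints) {
            Object value = message.get(c.field());
            if (value == null) continue;
            String violation = c.rule().check(c.field(), value);
            if (violation != null) violations.add(violation);
        }
        return violations;
    }

    public static void main(String[] args) {
        List<FieldConstraint> schema = List.of(
                new FieldConstraint("memo", new MaxLen(100)),
                new FieldConstraint("keys", new MaxItems(2)),
                new FieldConstraint("count", new IntRange(0, 5)));
        Map<String, Object> msg = Map.of(
                "memo", "hello",
                "keys", List.of("k1", "k2", "k3"),
                "count", 3L);
        System.out.println(validate(schema, msg)); // [keys: 3 items > 2]
    }
}
```

Restricting the rule set to simple bounds like these, rather than general CEL expressions, is what keeps each check O(1) per field.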

There is also an older project, "protoc-gen-validate" (PGV), https://github.com/bufbuild/protoc-gen-validate, which seems more widely used and might be a better fit.

Blog post on the history of protoc-gen-validate v1 and v2: https://buf.build/blog/protoc-gen-validate-v1-and-v2

Alternatives

No response

david-bakin-sl commented 6 months ago

In the HAPI protobufs we have a TokenTransferList message with two repeated fields, one for fungible transfers and one for non-fungible transfers, and the rule: "Each TokenTransferList can specify up to 10 adjustments." That limit applies to the combined size of the two repeated fields, so it might be worth supporting as a constraint (perhaps written as an expression of some kind).
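One way such a cross-field limit could be enforced cheaply at parse time is with a running counter shared by both repeated fields, so an oversized list is rejected as soon as the 11th entry arrives instead of after decoding everything. The Java below is a minimal sketch under that assumption; the records are hypothetical, simplified stand-ins for the HAPI types, and `Builder` is not a PBJ-generated class.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenTransferListLimit {
    /** Hypothetical, simplified stand-ins for the entries of the two repeated fields. */
    record FungibleTransfer(long account, long amount) {}
    record NftTransfer(long sender, long receiver, long serial) {}

    /** "Each TokenTransferList can specify up to 10 adjustments." */
    static final int MAX_ADJUSTMENTS = 10;

    /** Accumulates entries as a parser would, enforcing the combined limit per entry. */
    static final class Builder {
        private final List<FungibleTransfer> transfers = new ArrayList<>();
        private final List<NftTransfer> nftTransfers = new ArrayList<>();

        /** Fails if adding one more entry would exceed the combined limit. */
        private void checkRoomForOneMore() {
            if (total() >= MAX_ADJUSTMENTS) {
                throw new IllegalStateException(
                        "TokenTransferList exceeds " + MAX_ADJUSTMENTS + " adjustments");
            }
        }

        Builder addTransfer(FungibleTransfer t) {
            checkRoomForOneMore();
            transfers.add(t);
            return this;
        }

        Builder addNftTransfer(NftTransfer t) {
            checkRoomForOneMore();
            nftTransfers.add(t);
            return this;
        }

        /** Sum over the sizes of both repeated fields. */
        int total() {
            return transfers.size() + nftTransfers.size();
        }
    }

    public static void main(String[] args) {
        Builder b = new Builder();
        for (int i = 0; i < 6; i++) b.addTransfer(new FungibleTransfer(i, 1));
        for (int i = 0; i < 4; i++) b.addNftTransfer(new NftTransfer(1, 2, i));
        System.out.println(b.total()); // 10
        try {
            b.addNftTransfer(new NftTransfer(1, 2, 99));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // TokenTransferList exceeds 10 adjustments
        }
    }
}
```

A general expression language could express the same rule declaratively, but a dedicated "combined max items" constraint over named fields would be enough for this case and stays as cheap to check as a single-field limit.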