project-everest / everparse

Automated generation of provably secure, zero-copy parsers from format specifications
https://project-everest.github.io/everparse
Apache License 2.0
251 stars 15 forks source link

Add high-level serializer for 3D types #127

Open gebner opened 10 months ago

gebner commented 10 months ago

This adds an additional as_serializer : t:typ ... -> serializer (as_parser t) function to the 3D interpreter.

Some notable complications:

  1. In order to have serializers, we need to change some of the high-level types. For example, the type for parse_nlist cannot be the type of all lists of elements. Instead the type needs to be refined to lists whose serialization is sz bytes long. (See FLData.parse_fldata vs. FLData.parse_fldata_strong.)
  2. This means that the as_type function now depends on as_serializer. Naturally, we would implement this is a mutual recursion on typ but here we can't do this because of type dependencies. AFAICT F* does not support mutual recursion for functions whose types depend on the other functions' values, so I manually packed it into a dependent pair.
    • as_type : typ ... -> Type
    • as_parser : t:typ ... -> parser ... (as_type t)
    • as_serializer : t:typ ... -> serializer (as_parser t)
  3. I had to restrict the parser kind of the various list element parsers to WeakKindStrongPrefix to satisfy the preconditions of the serializer. This breaks some creative applications which specify a byte-length for a variable-length type by treating it as an array with a specified byte-size (with at most one element), like in TestAllBytes.3d:
    
    typedef struct _test1 {
    UINT8 remainder[:consume-all];
    } test1;

typedef struct _test3 { UINT32 size1; test1 mytest1_array[:byte-size size1]; // in practice, this array will only have one element (or zero, if size1 == 0); } test3;


 4. The `[@@erasable]` annotation does not seem to work for `serializer`.  I have no idea why, it works just fine for the analogously defined `parser` type.
gebner commented 10 months ago

One more complication: this change by itself increases the verification time of the 3D-generated F* code generated by a lot. As just one data point, verification time of ELF.fst soars from 1.3s to 67s on my machine.

Extracting an interface for EverParse3d.Interpreter reduces the runtime to 1.25s.

gebner commented 10 months ago

I think we've come to the conclusion that we don't want to do this approach after all. It significantly increases the verification and extraction runtime with the tactic-based normalizer. We're going to try to specify the low-level serializer in terms of the high-level parser, i.e., instead of saying that bytes = serializer data we're going with parse bytes = Some (data, ...) (with appropriate side conditions).

I had to restrict the parser kind of the various list element parsers to WeakKindStrongPrefix to satisfy the preconditions of the serializer. This breaks some creative applications which specify a byte-length for a variable-length type by treating it as an array with a specified byte-size (with at most one element), like in TestAllBytes.3d:

typedef struct _test1 {
  UINT8 remainder[:consume-all];
} test1;

typedef struct _test3 {
  UINT32 size1;
  test1 mytest1_array[:byte-size size1];
  // in practice, this array will only have one element (or zero, if size1 == 0); 
} test3;

@tahina-pro Do you have any ideas on how to tackle the issue above? Should we add a new combinator that forces a variable-length type to consume exactly n bytes?