yeslogic / fathom

🚧 (Alpha stage software) A declarative data definition language for formally specifying binary data formats. 🚧
Apache License 2.0

Shrink surface and core AST types by replacing tuples of references with references to tuples #461

Closed Kmeakin closed 1 year ago

Kmeakin commented 1 year ago

By replacing tuples of references with references to tuples (e.g. (&Term, &Term) becomes &(Term, Term)), we can further reduce the size of AST types. I think this should also reduce allocation overhead by batching smaller allocations together into larger ones.
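As a rough illustration of why this shrinks the payload, here is a minimal sketch using a hypothetical arena-allocated Term type (not fathom's actual core AST):

```rust
use std::mem::size_of;

// Illustrative stand-in for an arena-allocated AST node, not fathom's core::Term.
#[allow(dead_code)]
enum Term<'arena> {
    Const(u32),
    // Tuple of references: two pointers stored inline in the variant.
    AppInline(&'arena Term<'arena>, &'arena Term<'arena>),
    // Reference to a tuple: a single pointer to a pair allocated together.
    AppPaired(&'arena (Term<'arena>, Term<'arena>)),
}

fn main() {
    // On a 64-bit target the inline pair of references is two pointers wide,
    // while a reference to a pair is a single pointer, so the enum's largest
    // variant (and therefore the enum itself) can shrink.
    println!("{}", size_of::<(&Term<'static>, &Term<'static>)>()); // 16
    println!("{}", size_of::<&(Term<'static>, Term<'static>)>());  //  8
}
```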

Size changes:

brendanzab commented 1 year ago

I could be wrong, but I'm not really sure how much optimisation in this direction we should be doing at this stage, especially without performance and memory measurements? For AST node size, I think it's better to look at how we track spans in ArcValues? The main thing we need to worry about is the memory usage when parsing real world font files with opentype.fathom. I think @wezm had tried this with Arial in the past and measured a number of GB of memory usage?

Kmeakin commented 1 year ago

> I could be wrong, but I'm not really sure how much optimisation in this direction we should be doing at this stage, especially without performance and memory measurements? For AST node size, I think it's better to look at how we track spans in ArcValues? The main thing we need to worry about is the memory usage when parsing real world font files with opentype.fathom. I think @wezm had tried this with Arial in the past and measured a number of GB of memory usage?

Do we even need to carry any location information around with Value? Could we run the binary semantics directly on core::Term (similar to how we already do evaluation on core::Term)?

Kmeakin commented 1 year ago

We could use interning of values (e.g. https://docs.rs/arc-interner/latest/arc_interner/) to reduce memory allocations, but at the cost of slower evaluation.
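To illustrate the trade-off, here is a minimal hand-rolled sketch of the idea (using a plain HashMap rather than the arc-interner API, and a hypothetical Value type): identical values share one allocation, but every intern call pays a hash lookup and comparison during evaluation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical value type, not fathom's actual core::Value.
#[allow(dead_code)]
#[derive(PartialEq, Eq, Hash, Clone, Debug)]
enum Value {
    Const(u64),
    Pair(Arc<Value>, Arc<Value>),
}

#[derive(Default)]
struct Interner {
    table: HashMap<Value, Arc<Value>>,
}

impl Interner {
    fn intern(&mut self, value: Value) -> Arc<Value> {
        // Reuse the existing allocation if we have seen this value before.
        if let Some(existing) = self.table.get(&value) {
            return existing.clone();
        }
        let arc = Arc::new(value.clone());
        self.table.insert(value, arc.clone());
        arc
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern(Value::Const(42));
    let b = interner.intern(Value::Const(42));
    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&a, &b));
}
```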

Kmeakin commented 1 year ago

Also, performing some optimization passes over core::Term (e.g. partial evaluation, constant folding) should remove intermediate evaluation steps (and therefore reduce the number of intermediate Values allocated on the heap).
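For example, a small constant-folding pass over a hypothetical simplified term type (not fathom's actual core::Term) could collapse literal arithmetic before evaluation ever runs:

```rust
// Sketch of a constant-folding pass: literal arithmetic is collapsed ahead
// of time, so fewer intermediate values need to be allocated at evaluation.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    Int(i64),
    Add(Box<Term>, Box<Term>),
}

fn const_fold(term: &Term) -> Term {
    match term {
        Term::Int(n) => Term::Int(*n),
        Term::Add(lhs, rhs) => match (const_fold(lhs), const_fold(rhs)) {
            // Both operands reduced to literals: fold them immediately.
            (Term::Int(a), Term::Int(b)) => Term::Int(a + b),
            (lhs, rhs) => Term::Add(Box::new(lhs), Box::new(rhs)),
        },
    }
}

fn main() {
    let term = Term::Add(
        Box::new(Term::Int(1)),
        Box::new(Term::Add(Box::new(Term::Int(2)), Box::new(Term::Int(3)))),
    );
    assert_eq!(const_fold(&term), Term::Int(6));
}
```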

brendanzab commented 1 year ago

I'm thinking let’s hold off on this until we manage to reduce memory usage in other places, sorry! Thanks though for the food for thought!

Kmeakin commented 1 year ago

Sure. I'm working on a replacement for the pretty printer that should be faster and use less memory. From my experiments with heaptrack, pretty printing is responsible for the majority of peak memory usage.