Open rljacobson opened 1 year ago
Fathom is still at an early stage, so in terms of practical application Kaitai can do more things right now. However what we are aiming for is to see if basing a data description language on dependent type theory gives us the power to handle structures that Kaitai cannot describe, such as the interdependencies between tables in OpenType, without being overwhelmed with complexity.
That answer is much more interesting than I could have hoped for! Super cool. I'm staring so I can check back in. I see the repo is ~3 years old. Is it a personal research project with high ambitions, or a "slow and steady wins the race" situation? How has progress been?
There has been a lot to do! To begin we needed to understand the fundamentals of how to implement a dependent type system with bidirectional type checking and normalisation by evaluation, and explore the different options for name binding and how best to implement them in Rust, along with some detours like providing really nice error messages :sweat_smile:
Our focus for the project has been supporting the OpenType font format since we use this in Prince (our PDF formatter product) via Allsorts (our Rust font library) and it's a large and complex format that could benefit from better tools. (The code to process fonts is time consuming to write, error prone, and often a target for security exploits). The Fathom description of OpenType is not fully complete, but includes a significant slice of the specification:
https://github.com/yeslogic/fathom/blob/main/formats/opentype.fathom
In particular it can handle some of the tricky aspects like the way the hmtx
table can only be parsed after reading values taken from the hhea
and maxp
tables, and some other table interdependencies that I believe may go beyond what Kaitai can do without awkward workarounds. It can also express the way that the GSUB
and GPOS
layout tables are both trees with the same branching structure at the upper levels but different leaves, using parameterised definitions.
As Fathom has grown in complexity it has been convenient to experiment with new features in separate prototypes, like doodle, which has no type checking or syntax or fancy error messages at all but can parse JPEG, PNG, and GIF images, at least syntactically (it doesn't render the images, just traverses the image structure). Hopefully we can fold these new features back into Fathom as we shake out which ones work and release some demos as we go.
Ultimately I'd like to see this language capable of comfortably describing most of the binary formats we work with day to day, with tools to visualise and explore and validate them as needed. It's been a long project -- the Mercury prototype dates from 2014 -- and this is still an active area of research, but it seems promising!
Do you have a general interest in parsing binary file formats, or are there any specific data formats that you work with? It's always interesting to hear what other people are doing in this space.
There has been a lot to do!
Of course, that seems obvious now.
I assume your language will necessarily take a different approach to Kaitai and other similar efforts. But I wonder if the impedance mismatch is small enough to make it possible to translate from Kaitai's language to Fathom's, which would give you a nice library of formats with presumably very little effort. I've felt like other projects in various problem domains have missed out on some low hanging fruit by not doing something similar, and I wonder if it's because machine translation is perceived as intimidating by a lot of programmers. You don't strike me as such a person, though.
I had forgotten I stumbled on Fathom by way of Allsorts during a late-night fall down a rabbit hole. I think Documentation of OpenType shaping behavior is a brilliant idea. I love projects like that. No, I don't have a particular binary format in mind that motivates my interest. The problem you are solving, though, simultaneously involves several of my intellectual interests: languages and parsing, the economies of specialization, really good abstractions—especially ones that compose really well...
If you are not already familiar with Raph Levien’s blog, you should check it out. I love his two articles on Simplifying Bézier paths and Parallel curves of cubic Béziers. He also has implementations in Rust on GitHub. There are more great things in his mental pipeline that I look forward to.
Can you tell me how Fathom compares with Kaitai? They seem to be solving the same problem in a similar way. What are reasons one might choose Fathom over Kaitai or vice versa?
(Sorry to open an issue, but this repo doesn't have the Discussion section enabled.)