stil4m / elm-syntax

Elm syntax in Elm
MIT License

Encode the AST as Bytes #60

Open: jfmengels opened this issue 4 years ago

jfmengels commented 4 years ago

As I mentioned in https://github.com/stil4m/elm-syntax/pull/55#discussion_r440613435, elm-review parses every file in a project and then caches the resulting AST by storing it on disk. When it restarts, it reads those cached files instead of parsing the source files again.

In my project at work, we have 160k LoC across 600+ modules. Once all of it is cached, the ASTs take up about 39MB of disk space combined, which is a lot! (For reference, the raw source code is about 5.7MB.)

I think encoding and decoding the same data with elm/bytes instead would reduce the space it takes. And since reading from disk is (relatively) slow, this would also speed up startup, and probably reduce the time spent writing the data to disk as well.
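To make the idea concrete, here is a minimal sketch of encoding one small piece of the AST, a source range, with elm/bytes. It assumes local `Location`/`Range` records that mirror elm-syntax's `{ row, column }` and `{ start, end }` shapes (redefined here so the snippet compiles on its own); `encodeRange` and `decodeRange` are hypothetical helper names, not part of elm-syntax or elm-review.

```elm
module RangeBytes exposing (Location, Range, decodeRange, encodeRange)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


-- Local mirrors of elm-syntax's Location/Range records (assumed shape),
-- so this sketch does not depend on elm-syntax itself.
type alias Location =
    { row : Int, column : Int }


type alias Range =
    { start : Location, end : Location }


-- Each coordinate is written as a fixed-width 32-bit unsigned integer,
-- so every Range takes exactly 16 bytes.
rangeEncoder : Range -> Encoder
rangeEncoder range =
    Encode.sequence
        [ Encode.unsignedInt32 BE range.start.row
        , Encode.unsignedInt32 BE range.start.column
        , Encode.unsignedInt32 BE range.end.row
        , Encode.unsignedInt32 BE range.end.column
        ]


-- Reads the four integers back in the same order they were written.
rangeDecoder : Decoder Range
rangeDecoder =
    Decode.map4
        (\startRow startColumn endRow endColumn ->
            { start = { row = startRow, column = startColumn }
            , end = { row = endRow, column = endColumn }
            }
        )
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)


encodeRange : Range -> Bytes
encodeRange range =
    Encode.encode (rangeEncoder range)


decodeRange : Bytes -> Maybe Range
decodeRange bytes =
    Decode.decode rangeDecoder bytes
```

For example, `Bytes.width (encodeRange { start = { row = 1, column = 1 }, end = { row = 3, column = 12 } })` is 16, and decoding those bytes gives back the same record. Fixed-width integers keep the sketch simple; a smaller width or a variable-length integer scheme could shrink things further, at the cost of extra decoding logic.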

I don't have hard data on how much space and time this would save, but I imagine the cache would be several times smaller, since we would be able to store the data much more compactly than with JSON.

Since elm-syntax's AST is not opaque at all, we can try this out in a separate package (or directly in elm-review, for that matter), and potentially keep it there for good if we would rather not make elm/bytes a dependency of elm-syntax. One problem for elm-review, though, is that this data needs to be sent over a port, and ports don't support Bytes. A workaround I heard of is to

Anyway, I wanted to share this need/want of mine. I'll likely tackle it at some point unless someone beats me to it (I have other things to work on for a while :sweat_smile:).

MartinSStewart commented 4 years ago

I think this is a good use case for https://package.elm-lang.org/packages/MartinSStewart/elm-codec-bytes/latest/
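The appeal of a codec library here is that each encoder and its matching decoder are defined together, so the two cannot silently drift apart as the AST types evolve. The sketch below hand-rolls that pattern on top of elm/bytes to illustrate the idea; the `Codec` type and the `int`, `string`, `list`, `encode` and `decode` names are hypothetical and are not elm-codec-bytes' actual API, which offers richer combinators.

```elm
module MiniCodec exposing (Codec, decode, encode, int, list, string)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


-- Hypothetical minimal codec: an encoder and a decoder kept side by side.
type alias Codec a =
    { encoder : a -> Encoder
    , decoder : Decoder a
    }


int : Codec Int
int =
    { encoder = Encode.signedInt32 BE
    , decoder = Decode.signedInt32 BE
    }


-- Strings are stored as a 32-bit byte length followed by their UTF-8 bytes.
string : Codec String
string =
    { encoder =
        \s ->
            Encode.sequence
                [ Encode.unsignedInt32 BE (Encode.getStringWidth s)
                , Encode.string s
                ]
    , decoder = Decode.unsignedInt32 BE |> Decode.andThen Decode.string
    }


-- Lists are stored as a 32-bit element count followed by each element.
list : Codec a -> Codec (List a)
list item =
    { encoder =
        \xs ->
            Encode.sequence
                (Encode.unsignedInt32 BE (List.length xs) :: List.map item.encoder xs)
    , decoder =
        Decode.unsignedInt32 BE
            |> Decode.andThen (\count -> Decode.loop ( count, [] ) (listStep item.decoder))
    }


-- Decodes one element per step until the expected count is reached.
listStep : Decoder a -> ( Int, List a ) -> Decoder (Decode.Step ( Int, List a ) (List a))
listStep itemDecoder ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Decode.Done (List.reverse acc))

    else
        Decode.map (\x -> Decode.Loop ( remaining - 1, x :: acc )) itemDecoder


encode : Codec a -> a -> Bytes
encode codec value =
    Encode.encode (codec.encoder value)


decode : Codec a -> Bytes -> Maybe a
decode codec bytes =
    Decode.decode codec.decoder bytes
```

For example, `encode (list string) [ "Elm", "Syntax" ]` produces 21 bytes (a 4-byte count, then each string as a 4-byte length plus its UTF-8 bytes), and `decode (list string)` turns them back into `Just [ "Elm", "Syntax" ]`.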

MartinSStewart commented 4 years ago

I plan on releasing a package called elm-serialize, which improves upon elm-codec-bytes. I can write encoders/decoders for the AST with it and release them as elm-syntax-serialize (I'm already writing an elm-geometry-serialize, and this is the naming scheme I've settled on).