haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

RFC: new intermediate cabal file representation. #3614

Open phadej opened 8 years ago

phadej commented 8 years ago

Problem: Ultimately we want formatting-preserving programmatic refactorings of .cabal files.

Current situation:

Solution:

This RFC doesn't propose what CabalAst would look like specifically, as some experimentation with the implementation is needed. Prior art: ghc-exactprint.

ping @alanz @hvr @dcoutts

23Skidoo commented 8 years ago

Would be nice to have something like that for config files as well.

phadej commented 8 years ago

@23Skidoo, aren't config files flat, i.e. without sections? We could experiment on them first, as they are simpler.

phadej commented 8 years ago

From IRC discussion: starting with ~/.cabal/config would also be easier, as the field types are much simpler. I have no good idea yet how to represent build-depends in an editable yet formatting-preserving way.

Also: @dcoutts' experiment: http://code.haskell.org/~duncan/cabal-ast-experiment/

phadej commented 8 years ago

@dcoutts also proposed changing GenericPackageDescription into CabalAst, i.e. into something that supports refactoring.

23Skidoo commented 8 years ago

@phadej No, they have sections: repository, install-dirs, program-locations, etc.

phadej commented 8 years ago

@23Skidoo: yes, but the sections are predefined and only used to group fields, i.e. the file could just as well be flat.
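A minimal Haskell sketch of that observation, assuming simplified hypothetical types (Config, Section and FlatField are illustrative only, not cabal-install's actual config representation): because the section names are fixed, each field can be keyed by the name of the section it sits in, and the nested form can be rebuilt from the flat list.

```haskell
import Data.List (nub)

-- Hypothetical, simplified config model; not cabal-install's real types.

-- | A predefined section (e.g. an install-dirs section) that only groups fields.
data Section = Section
  { sectionName   :: String
  , sectionFields :: [(String, String)]
  } deriving (Eq, Show)

-- | Nested view: top-level fields plus predefined sections.
data Config = Config
  { topFields :: [(String, String)]
  , sections  :: [Section]
  } deriving (Eq, Show)

-- | Flat view: the enclosing section name (if any) becomes part of the key.
data FlatField = FlatField
  { ffSection :: Maybe String
  , ffName    :: String
  , ffValue   :: String
  } deriving (Eq, Show)

-- | Nested to flat: top-level fields first, then each section's fields.
flatten :: Config -> [FlatField]
flatten (Config tops secs) =
  [FlatField Nothing n v | (n, v) <- tops] ++
  [FlatField (Just s) n v | Section s fs <- secs, (n, v) <- fs]

-- | Flat to nested: regroup fields by section, in order of first appearance.
unflatten :: [FlatField] -> Config
unflatten ffs = Config tops [Section s (fieldsOf s) | s <- names]
  where
    tops       = [(n, v) | FlatField Nothing n v <- ffs]
    names      = nub [s | FlatField (Just s) _ _ <- ffs]
    fieldsOf s = [(n, v) | FlatField (Just s') n v <- ffs, s' == s]
```

The round trip unflatten (flatten c) == c holds only if each section name occurs at most once and no section is empty; layout and comments are ignored entirely in this sketch.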

dcoutts commented 8 years ago

@phadej so when I thought about it some time ago, I concluded that there were only a few kinds of fields. Let's see if I can remember what they are:

The point is, it may be possible to pre-split the list fields and then keep position info only at that level, not within the AST of the individual field entries.
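A minimal Haskell sketch of pre-splitting, assuming hypothetical Pos and ListEntry types (not Cabal's API) and a plain comma-separated list field (the pkg:{a, b} sublibrary syntax, which has commas inside braces, is not handled). Each entry records where it starts; the entry text itself stays unparsed, so a later parser can turn it into a dependency without carrying any positions.

```haskell
import Data.Char (isSpace)

-- Hypothetical types for illustration; not Cabal's API.

-- | 1-based line and column in the .cabal file.
data Pos = Pos { posLine :: Int, posCol :: Int } deriving (Eq, Show)

-- | One list entry: its start position plus its raw, unparsed text.
data ListEntry = ListEntry { entryPos :: Pos, entryText :: String }
  deriving (Eq, Show)

-- | Move a position past some text, counting newlines.
advance :: Pos -> String -> Pos
advance = foldl step
  where
    step (Pos l _) '\n' = Pos (l + 1) 1
    step (Pos l c) _    = Pos l (c + 1)

-- | Split a (possibly multi-line) field value that starts at the given
-- position. Positions are kept per entry only, never inside an entry;
-- empty entries (e.g. from a trailing comma) are dropped.
splitListField :: Pos -> String -> [ListEntry]
splitListField p s
  | null s1   = []
  | otherwise = [ListEntry p1 entry | not (null entry)] ++ more
  where
    (ws, s1)      = span isSpace s
    p1            = advance p ws
    (chunk, rest) = break (== ',') s1
    entry         = trimEnd chunk
    more = case rest of
      ',' : r -> splitListField (advance p1 (chunk ++ ",")) r
      _       -> []
    trimEnd = reverse . dropWhile isSpace . reverse
```

For example, a value "base >=4 && <5,\n    containers" starting at line 3, column 18 yields ListEntry (Pos 3 18) "base >=4 && <5" and ListEntry (Pos 4 5) "containers"; the version constraint inside the first entry carries no position information of its own.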

phadej commented 8 years ago

@dcoutts makes sense. Compound expressions are still tricky.

phadej commented 8 years ago

Also to remember: https://github.com/alanz/cabal-ast-play

alanz commented 8 years ago

Here is a brain dump from me:

Based on my GHC / ghc-exactprint experience, I would suggest using an initial AST (from the parser) that maps clearly onto the underlying cabal file and generally keeps things in order. So resist grouping all like things together, especially if they can be interleaved in the file.

Then add an ann (annotation) parameter, which can hold location info initially, and delta info later for pretty-printing.

If the initial AST gets processed for use, it makes sense to track clearly how each part is derived from the initial one, even if that is done via an annotation at that level too.

So the annotations are never part of the main line of processing; they are only ever used for round-tripping (with possible modification of the cabal file).
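A minimal Haskell sketch of this suggestion, assuming hypothetical Pos, Delta, Item and File types (not Cabal's or ghc-exactprint's API). Only one-line fields and section headers are modelled, and a parser is assumed to have already produced the Pos annotations. Items stay in file order, the annotation is a type parameter, positions are turned into deltas for printing, and the semantic view simply discards annotations.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Traversable (mapAccumL)

-- | Absolute source position recorded by the (assumed) parser, 1-based.
data Pos = Pos { posLine :: Int, posCol :: Int } deriving (Eq, Show)

-- | Relative layout used by the printer: blank lines before the item, indentation.
data Delta = Delta { blankLines :: Int, indentation :: Int } deriving (Eq, Show)

-- | Items stay in file order, exactly as parsed; nothing is regrouped.
data Item ann
  = Field ann String String        -- annotation, field name, raw text after ':'
  | Section ann String [Item ann]  -- annotation, header line, body items
  deriving (Eq, Show, Functor, Foldable, Traversable)

newtype File ann = File [Item ann]
  deriving (Eq, Show, Functor, Foldable, Traversable)

-- | Replace absolute positions by deltas, visiting annotations in file order
-- (the derived Traversable visits a section's annotation before its body).
-- Assumes every item occupies exactly one line.
toDeltas :: File Pos -> File Delta
toDeltas = snd . mapAccumL step 0
  where
    step prevLine (Pos l c) = (l, Delta (l - prevLine - 1) (c - 1))

-- | Exact printer driven purely by the deltas.
render :: File Delta -> String
render (File items) = concatMap renderItem items
  where
    lead (Delta b i) = replicate b '\n' ++ replicate i ' '
    renderItem (Field d name raw)   = lead d ++ name ++ ":" ++ raw ++ "\n"
    renderItem (Section d hdr body) = lead d ++ hdr ++ "\n" ++ concatMap renderItem body

-- | A refactoring: replace a field's raw value, keeping its annotation (layout).
setField :: String -> String -> File ann -> File ann
setField name new (File items) = File (map go items)
  where
    go (Field a n _) | n == name = Field a n new
    go (Section a h body)        = Section a h (map go body)
    go item                      = item

-- | The semantic view used by the rest of the pipeline ignores annotations.
stripAnn :: File ann -> File ()
stripAnn = fmap (const ())
```

With this design, printing an unmodified File Delta reproduces the original text exactly, an edit such as setField touches only the changed value, and code that does not care about layout works on File () and never sees the annotations.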

phadej commented 5 years ago

I updated the description. Also, to complement @dcoutts' comment: the current parsec approach parses the ByteString directly into whatever the field type is. There is no intermediate step, e.g. parsing into a list of tokens first. Maybe there should be (I even think it could help performance, or at least not hurt it).
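To make the two-stage idea concrete, here is a hedged Haskell sketch, not how Cabal's parser works today: a hypothetical tokenize function first turns a simplified build-depends value into tokens tagged with their offsets, and a hypothetical dependencies function then works on tokens instead of raw bytes. It uses String rather than ByteString to stay within base, and it covers only a simplified syntax (no version wildcards such as ==4.*, no pkg:{a, b} sublibraries, no package names starting with a digit).

```haskell
import Data.Char (isAlphaNum, isDigit, isSpace)

-- Hypothetical token type for a simplified build-depends syntax.
data Token
  = TIdent String      -- package name, e.g. "ghc-prim"
  | TVersion [Int]     -- version number, e.g. 4.14
  | TOp String         -- operator, e.g. ">=", "<", "&&", "^>="
  | TComma
  deriving (Eq, Show)

-- | Stage 1: raw text to tokens, each tagged with its offset so that
-- layout could be recovered later.
tokenize :: String -> Either String [(Int, Token)]
tokenize = go 0
  where
    go _ [] = Right []
    go i s@(c : cs)
      | isSpace c = go (i + 1) cs
      | c == ','  = ((i, TComma) :) <$> go (i + 1) cs
      | isDigit c =
          let (v, rest) = span (\x -> isDigit x || x == '.') s
          in ((i, TVersion (splitVersion v)) :) <$> go (i + length v) rest
      | isAlphaNum c =
          let (w, rest) = span (\x -> isAlphaNum x || x == '-') s
          in ((i, TIdent w) :) <$> go (i + length w) rest
      | c `elem` "<>=&|^" =
          let (o, rest) = span (`elem` "<>=&|^") s
          in ((i, TOp o) :) <$> go (i + length o) rest
      | otherwise = Left ("unexpected " ++ show c ++ " at offset " ++ show i)

-- | "4.14" -> [4, 14]; empty components (e.g. from "4.") are dropped.
splitVersion :: String -> [Int]
splitVersion = map read . filter (not . null) . chunks
  where
    chunks s = case break (== '.') s of
      (d, [])       -> [d]
      (d, _ : more) -> d : chunks more

-- | Stage 2: tokens to (package name, constraint tokens), split on commas.
dependencies :: [(Int, Token)] -> Either String [(String, [Token])]
dependencies = mapM dep . filter (not . null) . splitOnComma . map snd
  where
    splitOnComma ts = case break (== TComma) ts of
      (d, [])       -> [d]
      (d, _ : more) -> d : splitOnComma more
    dep (TIdent n : rest) = Right (n, rest)
    dep other             = Left ("expected a package name, got " ++ show other)
```

For "base >=4.14 && <5, containers", stage 1 yields eight tokens (the first, TIdent "base", at offset 0), and stage 2 yields ("base", [TOp ">=", TVersion [4, 14], TOp "&&", TOp "<", TVersion [5]]) and ("containers", []). Keeping the offsets in stage 1 is what would let a formatting-preserving printer map edits back onto the original text.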