haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

Meta: Exact-printer Mega-issue #7544

Open emilypi opened 3 years ago

emilypi commented 3 years ago

What is this issue?

A central place to discuss the issue of a Cabal exact-printer so that we can push forward the work all in one place. Currently, the discussion stretches back across many issues over the past 6 years.

What is it?

An exact-printer, as inspired by ghc-exactprint, is a byte-for-byte bidirectional parser and pretty printer, which guarantees the following constructs exist and are principled:

Why do we need it?

This work is tied very closely with the format, init, gen-bounds and other work as shown in historical discussions:

This would free us up for several important quality of life improvements to the cabal install ecosystem, as well as general consumers of the Cabal library API. We would like to have this in by Cabal-3.8.0.0. The important bits this enables include:

Who's in charge of this?

I am currently overseeing @ptkato who has been tasked with taking this on. Please consider him and myself a point of contact for this work, and join us in libera.chat#hackage to discuss.

@ptkato @gbaz @davean

gbaz commented 3 years ago

We have an intermediate "lexed tree" structure in https://github.com/haskell/cabal/blob/1be581d5463adebf8dc8cc8a5375a9ca3051970a/Cabal/src/Distribution/Fields/Field.hs

One idea I would throw out is turning that structure into the main structure we get exactprinting working with (making it comment and whitespace preserving). Then transforms on cabal files involve parsing to that, then running the FieldGrammar parser to get the nice fully typed structure, modifying that and emitting it back to Fields, and then calculating the delta to apply the changes thus made to the original exact Fields.

emilypi commented 3 years ago

There's good prior art here: https://github.com/phadej/cabal-fmt

tonymorris commented 3 years ago

I've written a few of these for other formats. If you need help @ptkato then @emilypi can hook us up.

Mikolaj commented 3 years ago

See also the old github project about that: https://github.com/haskell/cabal/projects/9

I've updated it with related issues I've found, but then also marked all the issues with a label, so the indexing is available in both forms now. If the project is not going to be used, please remove it. The labeling will remain, preserving the data.

ptkato commented 2 years ago

One idea I would throw out is turning that structure into the main structure we get exactprinting working with (making it comment and whitespace preserving).

I combed through cabal-fmt and here, and I must say, I'm not a fan of how the comments would need to be dealt with if we use that approach. Yes, for solving it works great, it cuts down a considerable amount of time, but for an exact-printer it seems... odd (and potentially more complicated than it should be). I really do believe that the right way would be to parse it all at once, as opposed to appending the comments afterwards.

phadej commented 2 years ago

Consider that the Field is not the most refined representation of cabal files. It's like parsing JSON file into Value with aeson. There's still the step of taking that structure and (also) parsing it further into GenericPackageDescription.

The Cabal library completely lacks an AST representation of .cabal files: we don't really need that intermediate step, though it would help report warnings more precisely in Check. That, however, is not the most important piece of functionality.


GHC / ghc-exactprint doesn't put comments into AST either. There is very good reason: literally everything would need to be intercalated with comments (as {- ... -} comments can be literally anywhere in between).

The cabal format is special in that line comments are always on their own line (!!!), so we could intercalate them in the Field representation. Fine.

BUT, if we want to represent field contents as a data structure too, then we face the same problem as GHC:

build-depends:
   base
      -- minimum is GHC-7.0
      >= 4.3
      -- and maximum ..
      && <= 4.17

I expect that cabal-exactprint would allow me to change the 4.17 into 4.18 preserving all the formatting, and with representation close to what is written (i.e. AST node for &&, >= etc.).

I don't see how comments can be attached to that. (I suspect that ghc-exactprint has some logic so it can fit comments between tokens it prints).

jneira commented 2 years ago

Sure, @alanz could help us compare ghc-exactprint with the requirements for cabal.

ptkato commented 2 years ago

BUT, if we want to represent field contents as a data structure too, then we face the same problem as GHC:

...

I expect that cabal-exactprint would allow me to change the 4.17 into 4.18 preserving all the formatting, and with representation close to what is written (i.e. AST node for &&, >= etc.).

Indeed, a tad bit more complex AST would be needed to take care of that... shape, but I believe that's about it. And yes, input from @alanz would certainly be appreciated.

alanz commented 2 years ago

Sorry all, I have been ignoring this issue for a while, my attention being elsewhere. I will take a decent look in the next few days.

davean commented 2 years ago

I don't see how the multiple comments thing is an issue. Your annotation functor would of course contain a list of (maybe) comments, in most cases it would be empty, in others it would be a singleton, and in the most complicated case it would be comments, and skips. Of course you can do the same sort of thing in GHC's case, but it becomes complicated enough there to remove the point I think.

Even with such information it'll be complicated to get correct - does a comment associate to the line above or below isn't apparent. For example if you delete data, does a comment go with that data or not?

My annotation would be something like [Maybe (WhiteSpace, Comment)] (illustratory only of course).
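As a rough sketch of how an annotation of that shape could carry the comments in the build-depends example from earlier in the thread: the types and function names below (`Tok`, `renderField`, `retext`) are hypothetical and not part of Cabal, and the sketch assumes one token per line, which sidesteps the harder question of which neighbour a comment belongs to.

```haskell
type Whitespace = Int     -- indentation, in columns
type Comment    = String  -- text following the "--"

-- Nothing is a skipped (blank) line; Just is a comment line with its indentation.
type Ann = [Maybe (Whitespace, Comment)]

-- One token on its own line, plus the trivia lines that precede it (hypothetical type).
data Tok = Tok
  { tokAnn    :: Ann
  , tokIndent :: Whitespace
  , tokText   :: String
  }

renderAnn :: Ann -> [String]
renderAnn = map line
  where
    line Nothing        = ""
    line (Just (ws, c)) = replicate ws ' ' ++ "--" ++ c

renderTok :: Tok -> [String]
renderTok (Tok ann ind txt) = renderAnn ann ++ [replicate ind ' ' ++ txt]

renderField :: String -> [Tok] -> String
renderField name toks = unlines ((name ++ ":") : concatMap renderTok toks)

-- Replace the text of matching tokens; annotations are left alone.
retext :: String -> String -> [Tok] -> [Tok]
retext old new = map step
  where
    step t
      | tokText t == old = t { tokText = new }
      | otherwise        = t

buildDepends :: [Tok]
buildDepends =
  [ Tok [] 3 "base"
  , Tok [Just (6, " minimum is GHC-7.0")] 6 ">= 4.3"
  , Tok [Just (6, " and maximum ..")] 6 "&& <= 4.17"
  ]
```

With this, `renderField "build-depends" (retext "&& <= 4.17" "&& <= 4.18" buildDepends)` changes only the upper bound and leaves both comments and all indentation in place.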

alanz commented 2 years ago

First quick comment

GHC / ghc-exactprint doesn't put comments into AST either.

Prior to GHC 9.2 this is true, from GHC 9.2 they are in the AST, as part of the exact print annotations. But that is just mechanics of data capture.

In ghc-exactprint we work with an item having some form of "anchor", which represents the top-left point from which it gets rendered, much like in a normal pretty printer. This means that whatever leading indentation exists at that point gets preserved as we print items contained in the one being printed.

We capture comments as being contained within the span of the AST item being printed, or coming after it for top level items.

This is good enough for exact-printing an unchanged AST. If the AST needs to be modified prior to printing, we do a comment re-balancing process, where we try to keep comments associated with the AST item they logically belong to, which means a preceding document comment will be kept with the item following it, and comments between things associate to the thing without a gap, so a trailing comment on the same line, or without a line break will associate with the prior one. This is all very heuristic driven, and up for optimization.

To be able to handle that case, as the exact-printer finds comments it throws them into a queue, and then prints any occurring between the current print head position and the next output item. (Logically, the actual mechanics are a bit more complex because everything is deltas).
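The queue idea can be sketched in a few lines. This is only an illustration with hypothetical types (`Item`, `Comment`, `exactPrint`), and it uses absolute line numbers for simplicity, whereas ghc-exactprint is described above as working with deltas.

```haskell
import Data.List (sortOn)

type Line = Int

-- Hypothetical stand-ins: an output item and a comment, each with its source line.
data Item    = Item    { itemLine    :: Line, itemText    :: String }
data Comment = Comment { commentLine :: Line, commentText :: String }

-- Print items in order. Before each item, flush every queued comment that
-- sits before that item's line. Comments left over are printed at the end.
-- Items are assumed to arrive already ordered by line.
exactPrint :: [Comment] -> [Item] -> [String]
exactPrint comments = go (sortOn commentLine comments)
  where
    go queue []       = map commentText queue
    go queue (i : is) =
      let (before, rest) = span ((< itemLine i) . commentLine) queue
      in  map commentText before ++ itemText i : go rest is
```

For example, `exactPrint [Comment 2 "-- c"] [Item 1 "a", Item 3 "b"]` places the comment between the two items.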

And some day I need to capture all of this in a proper document. Perhaps someone should offer to co-author something with me, to hold my feet to the fire of actually documenting it.

phadej commented 2 years ago

Prior to GHC 9.2 this is true, from GHC 9.2 they are in the AST, as part of the exact print annotations. But that is just mechanics of data capture.

That's the idea why Field type has ann parameter, and that's where cabal-fmt puts attached comments. I.e. there isn't separate Comment AST node, as there isn't in GHC.

ptkato commented 2 years ago

Oki, I think we should keep things as simple as possible, as long as that covers our needs. Taking into account the input from this issue, I think one acceptable shape for our AST would be something along the lines of:

data ExactPrint ann
  = Field   ann String [ExactPrint ann]         -- your field (or section) with its children
  | Value   ann String (Maybe (ExactPrint ann)) -- the fields' values
  | Cont    ann String (ExactPrint ann)         -- this would mold a single value across multiple lines
  | Comment ann String                          -- comment
  | Empty   ann                                 -- empty lines, and also to stop the recursion in Cont

It would allow enough flexibility to depict a cabal-like format in its entirety while being quite expressive and not overly complicated. Thoughts?

phadej commented 2 years ago

I really really would like to have a build-depends as an AST so I can modify it, without thinking about how it's represented as Strings, and glued together to make the contents of that field.

As I said, nodes for &&, <package-name>, constraints (>= 1.2.3) etc.

Because I think that current Field representation already contains enough information to exact-print it. (The position of : after field is not recorded, but that's a minor problem). We need to do some work to attach comments, but cabal-fmt demonstrates that it's possible. I don't think it's very useful though. I wouldn't call that "cabal exactprint".

ptkato commented 2 years ago

Do we have any other case where a specific treatment would be desired or is it something exclusive to build-depends?

gbaz commented 2 years ago

I think we've let perfect be the enemy of the good far too long. A nice special representation for build-depends (or any other multiline field) can be built on top of any of the proposed structures, and that should be fine. Even without that, we have 90% of the use cases that we really really want this feature for unblocked. ptkato: go for it!

ptkato commented 2 years ago

This looks a little overboard, but I think it shall suffice:

data ExactRoot ann = ExactRoot [ExactSection ann] [ExactField ann]

--                         value            ":"|","      children
data ExactSection    ann
  = ExactSection           !ann !ByteString              [ExactField ann]
  | ExactSectionComment    !ann !ByteString

data ExactField      ann
  = ExactField             !ann !ByteString !ann         [ExactValue ann]
  | ExactFieldIf           !ann !ByteString !ann         [ExactField ann] !(Maybe ann) (Maybe [ExactField ann])
  | ExactFieldComment      !ann !ByteString              

data ExactValue      ann
  = ExactValue             !ann !ByteString
  | ExactValueDependencies                               [ExactDependency ann]
  | ExactValueComment      !ann !ByteString

data ExactDependency ann 
  = ExactDependency        !ann !ByteString !(Maybe ann) (ExactConstraint ann)
  | ExactDependencyComment !ann !ByteString             

data ExactConstraint ann
  = ExactConstraint        !ann !ByteString !ann !ByteString
  | ExactConstraintAnd     (ExactConstraint ann) !ann (ExactConstraint ann)
  | ExactConstraintOr      (ExactConstraint ann) !ann (ExactConstraint ann)
  | ExactConstraintParens  !ann (ExactConstraint ann) !ann
  | ExactConstraintComment !ann !ByteString

That second !ann in ExactField will take care of the : position problem that phadej mentioned.

gbaz commented 2 years ago

If you think that looks good, then I guess we should proceed. Thoughts @davean ?

ptkato commented 2 years ago

I actually changed it quite a bit, let me update it.

Edit: Here we go, updated.

gbaz commented 2 years ago

@ptkato Is this work stalled out at the moment, or do you have a gameplan to proceed?

ptkato commented 2 years ago

Both? Instead of writing yet another parser from scratch, it was suggested that I modify one of the existing parsers, so I thought about yoinking Field for the AST defined above, and adapting everything from there, but got a bit stuck.

gbaz commented 2 years ago

@davean said he might be able to slice off some time to pair with you and work to unstick this project. Hopefully you two can schedule something.

ptkato commented 2 years ago

That would be most appreciated.

santiweight commented 2 years ago

Does anyone want to work on this for a bit this weekend @ptkato? I'm not sure how much I can help, but perhaps I can!

ptkato commented 2 years ago

Hello, this issue has been here for quite a while, and I would like to have some thingies clarified, so I can progress with this work, after all the next release is right around the corner. Like I mentioned to @Kleidukos in a brief conversation we had, my goal is clear as day, however what is not clear is the path I should be taking to achieve such goal.

Throughout this whole thingy, I gathered a bit of info on how I should proceed, and that actually made the lines blurrier. First came the performance concerns: if anything, I wouldn't want to mess with something that could have an impact on the overall performance of Cabal, as it would be quite disastrous. Secondly, it was suggested that I avoid making a whole parser from scratch, allegedly because Cabal already has too many of them, which is a reasonable request, but messing with something that is already established seems like asking for trouble, especially since it is working mostly fine, apart from the exact-printing stuff.

Now, I had a meeting with Davean, and he mentioned that the exact-printer would only be used on demand, on a case-to-case basis, and he recommended (I think?) that I split it all from the other stuff, instead of trying to integrate with existing parsers; given the circumstances, that is actually great and would make thingies easier, since I wouldn't need to worry about my previous concerns all that much. However, I was left with the impression that I would still need to cling to some things, like alex. That made me wonder once again, wouldn't that affect the current parsers all the same? As in, I am not sure if the structure I posted a few comments above (https://github.com/haskell/cabal/issues/7544#issuecomment-934009792) can fit the lexer we already have, and if changes in the lexer would be needed (probably?).

This all put me on the fence about whether I should blaze my own path or keep to the beaten path. So I hope that people who are more knowledgeable about Cabal's inner workings can reach a consensus and help give me a unified direction, a path to take that can lead me to finishing up this stuff proper. I would really like to get those thingies clarified and sorted proper, so this work can progress further.

Mikolaj commented 2 years ago

@ptkato: if you don't get enough answers here, please join the cabal devs call and ask and/or link this on Matrix incessantly until noticed. The call may be the easier of the two.

gbaz commented 2 years ago

I wish I could help. It seems like you've been given conflicting advice. I think the idea would be that you don't need to have the exact-printer code be used everywhere. However, you should attempt to not duplicate parser code, and instead simply extend it so it can optionally pick up exact-printer details.

All I can say is we definitely don't want another whole parser from scratch.

My advice would be to see if the structure does fit the existing lexer, and if not, detail the obstacles clearly so people can take a look and provide feedback.

gbaz commented 1 year ago

data ExactRoot ann = ExactRoot [ExactSection ann] [ExactField ann]

Artem notes that top-level sections and fields can interleave, so it should probably be [Either (ExactSection ann) (ExactField ann)] or the like.

Edit: this actually needs testing. It may be that bare fields after sections simply get dropped. In which case we don't need this structure but we should emit a warning on such files (and perhaps, eventually, at a high enough cabal file version, an error)
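To make the interleaving point concrete, here is a small sketch using simplified stand-ins for the proposed types (the real proposal has more constructors and fields; `topLevelNames` and `splitRoot` are hypothetical helpers). It assumes interleaved top-level items are in fact meaningful, which the edit above says still needs testing.

```haskell
import Data.Either (partitionEithers)

-- Simplified stand-ins for the types proposed earlier in the thread.
data ExactSection ann = ExactSection ann String deriving (Eq, Show)
data ExactField   ann = ExactField   ann String deriving (Eq, Show)

-- One list of either kind keeps the original top-level order.
newtype ExactRoot ann = ExactRoot [Either (ExactSection ann) (ExactField ann)]
  deriving (Eq, Show)

-- Names of top-level items, in source order.
topLevelNames :: ExactRoot ann -> [String]
topLevelNames (ExactRoot xs) = map (either sectionName fieldName) xs
  where
    sectionName (ExactSection _ n) = n
    fieldName   (ExactField   _ n) = n

-- Forgetting the interleaving recovers the two-list shape.
splitRoot :: ExactRoot ann -> ([ExactSection ann], [ExactField ann])
splitRoot (ExactRoot xs) = partitionEithers xs
```

The two-list form is recoverable from the interleaved one via `splitRoot`, but not the other way around, which is why the interleaved form is the safer choice if order turns out to matter.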

BurningWitness commented 11 months ago

I know I'm years late to this discussion, but... why not have a separate parser for this?

From my understanding any Cabal file is simply a recursive tree where each level is a list of newlines, comment lines, sections and fields. Each field is a leaf and has its own custom format. As such to edit base bounds on a library named foo all you really need to do is a basic traversal:

first section "library foo" -> first field "build-depends" -> (first dependency "base" -> set bounds)

Ideally this traversal is stateful, you don't want to touch any fields you don't need to. As such the format of this tree has an Unparsed | Parsed branch on every point of recursion and each field value is always stored as plain text.

It's dirt simple and fulfills all the issue expectations:


For the record when I say "separate parser" I don't mean code cannot be reused, I just don't know if it can be reused in the remarkably complicated set of steps Cabal-syntax uses.
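The traversal described above (first section, then first field, then edit) might look roughly like this. It is a sketch under assumptions: `Node`, `editField` and the other names are hypothetical, the tree is built by hand rather than parsed, and `Raw` stands in for the "Unparsed" branch holding text that was never inspected.

```haskell
data Node
  = Raw String             -- source text nobody needed to look at, kept verbatim
  | Section String [Node]  -- heading line as written, then children
  | Field String String    -- name as written (up to ':'), then raw value text
  deriving (Eq, Show)

-- Printing is plain concatenation, so untouched parts round-trip byte for byte.
render :: [Node] -> String
render = concatMap go
  where
    go (Raw s)        = s
    go (Section h ns) = h ++ render ns
    go (Field n v)    = n ++ v

-- Normalise whitespace for comparisons only; the stored text is never changed.
trim :: String -> String
trim = unwords . words

-- Apply g to the first element satisfying p, leaving everything else alone.
onFirst :: (a -> Bool) -> (a -> a) -> [a] -> [a]
onFirst p g xs = case break p xs of
  (pre, x : post) -> pre ++ g x : post
  (pre, [])       -> pre

-- First section with the given heading, then first field with the given name.
editField :: String -> String -> (String -> String) -> [Node] -> [Node]
editField heading name f = onFirst isSection inSection
  where
    isSection (Section h _) = trim h == heading
    isSection _             = False
    inSection (Section h ns) = Section h (onFirst isField inField ns)
    inSection n              = n
    isField (Field n _) = trim n == name ++ ":"
    isField _           = False
    inField (Field n v) = Field n (f v)
    inField n           = n
```

Only the value text of the targeted field passes through `f`; everything else, including comments stored as `Raw`, is printed back exactly as it was read.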

liamzee commented 11 months ago

Unless you want to do it first, I'm planning on setting up a flatparse-cabal-exact library that, ummm, uses flatparse to provide an exact-parser for Cabal.

Half the benefit is that flatparse is at least 50% faster than Attoparsec, the other benefit is that fce would be an external library; consequently it can be simply a "Just do it" then ask the cabal team for feedback on modifications later in the event that they want to use it.

BurningWitness commented 11 months ago

Cabal-syntax currently uses parsec, so unless the team wants to refactor the entire package I don't see a good enough reason to move off of it. Also please don't assume Cabal files are all small, some people like to have fun with things (see vulkan-raw.cabal).

As on prototyping, I'm not keen on writing code unannounced. The issue asks for a spiritual clone of ghc-exactprint, while the solution I'm proposing is just dissecting the file.

emilypi commented 11 months ago

Caveat: i have a cold so my brain is not all there and apologies if i missed any points in the discussion:

I wouldn't have any problem with cabal-exactprint being a separate library in the same sense as Cabal-syntax. Inserting an upstream dependency of cabal-install is easy; the hard thing we want to avoid is baking it directly into Cabal or Cabal-syntax, which poses a burden on the GHC team for minimal benefit. I would prefer if we stuck with parsec (or whatever parser is shipped with GHC in the future) because it makes the dependency effectively cost-free for tooling authors. That said, flatparse is small itself, and cabal-install is a binary, so it wouldn't affect us that much. But still - performance here isn't going to be a huge issue i wouldn't think. There are large cabal files, but we're dumping to stdout at the end of the day and that's going to dominate everything.

liamzee commented 11 months ago

In my case, I have an independent use case (trying to build a GUI tool that calls cabal as an external process), where part of the goal is to have an editor for cabal files (i.e., a GUI to make the package format easier to learn) and cabal.project files.

If it's reusable or can otherwise be adapted, it'd be great, if not, serves my use case.

BurningWitness commented 11 months ago

Now that I think about it it indeed should be a separate library because it's the interface for writing Cabal files, hence all the types inside it merely define the format and prove nothing else.

However I'm not proposing anything similar to ghc-exactprinter, so how about calling it cabal-layout? Don't want to use cabal-format because that already has its own meaning.

BurningWitness commented 11 months ago

I've played around with Distribution.Fields.Parser a bit and there are a few inconsistencies I've spotted:

tonyday567 commented 11 months ago

However I'm not proposing anything similar to ghc-exactprinter

I'm not so sure about that. What you're suggesting, deferring parsing into concrete types and instead leaving the tree as Comments or Newlines or NotComments, might be the breakthrough needed.

Firstly, somewhere in the bowels of the library, something somewhere parses the file text, and immediately throws out the comments (and maybe the whitespace?). It's possible that the early parsing you suggest can be inserted before this, in general, and then parse the NotComment text along to the next stage. This might mean only a very light touch is needed to fix gen-bounds, init and format.

Secondly, busting the problem up into two - Comment/NotComment + late-bound parsing - might be exactly what is needed to make exact-printing tractable within the cabal syntax context.

BurningWitness commented 11 months ago

Dumped a few more hours into this, here are the findings:

liamzee commented 11 months ago

@BurningWitness

Are these discoveries based on testing, or an understanding of the parser semantics?

Likewise, I'm stalled; I'm more focused on understanding Cabal right now.

Are you making an attempt on the project, as HLS needs an exact parser (contact Fendor if interested), and there's interest in retrofitting cabal add functionality into cabal-install?

ffaf1 commented 11 months ago

Hello Oleksii,

I don't want to encode global positions into the tree, as tree modifications can break those references.

Is this valid even for relative (to an anchor) positions?

BurningWitness commented 11 months ago

@ffaf1 No, the tree requires relative positions to be encodable. I'm bringing the global position issue up because that's something that both Cabal-syntax's lexer and GHC's parser support, so there might be expectations concerning this.

> Distribution.Fields.Parser.readFields "foo\n  bar: baz"
Right [Section (Name (Position 1 1) "foo") [] [Field (Name (Position 2 3) "bar") [FieldLine (Position 2 8) "baz"]]]

@liamzee The discoveries are gained from reverse engineering, I have no interest in rewriting Cabal-syntax's lexer. Also note that when I make points I'm not really hitting a dead end, there's simply more than one way to go about solving each one and I'm choosing the path that looks sound to me.

I am indeed making an attempt, both because this looks like a week of novel work and because I have a good mental model for it, however whether anyone actually agrees with said model is still unknown to me.

liamzee commented 11 months ago

@BurningWitness At the very least, the community seems to be missing a good exact-parser for .cabal files, and this seems to be an urgent need given the desire for cabal add and better HLS cabal file capabilities.

The best-known attempt I'm familiar with is the following:

https://github.com/VeryMilkyJoe/haskell-language-server

I am eagerly poring over your notes, so even if you lose interest, your work remains useful. Thanks for the updates!

LemonjamesD commented 10 months ago

Is this supposed to be a separate parser or a replacement for the existing parser in cabal now?

fendor commented 10 months ago

Separate parser with a different goal compared to Cabal's internal one. However, we are currently in the process of evaluating how much work it would be to fix Cabal's parser to give HLS what it needs.

BurningWitness commented 10 months ago

It's a separate parser because Cabal-syntax is already huge and it already does its things in a way that's fundamentally incompatible with this approach.

As the resulting parser simply parses anything that has a format of .cabal, any library downstream should be free to make a parser from the resulting AST to whatever data it needs. If the parser delivered is good enough, then later on perhaps there will be talks of splitting Cabal-syntax further, however right now it's too early to discuss that.

gbaz commented 10 months ago

Note that there is I think a perfectly cromulent and not too difficult proposal above that would retrofit cabal's existing parser with sufficient exact-parsing capabilities. See this message and the discussion prior:

https://github.com/haskell/cabal/issues/7544#issuecomment-934009792

I am aware people are interested in pursuing different independent projects and we welcome that. However, this is a good proposal and I have not seen any objection to it other than that it requires some work to get up to speed enough with the cabal parser, which is complex, to be capable of doing this. So I would encourage people to seriously consider it.

BurningWitness commented 10 months ago

I don't know how well that representation matches reality, but the annotationless one I have isn't all that long either.

My variant

```haskell
-- | Context-dependent whitespace.
newtype Offset = Offset Int
  deriving newtype Show

-- | Context-independent whitespace.
newtype Whitespace = Whitespace Int
  deriving newtype Show

-- | Anything that follows two consecutive hyphens. Lasts until the end of the line.
data Comment = Comment
                 Whitespace -- ^ Before double hyphens
                 Whitespace -- ^ Between double hyphens and text
                 Text
  deriving Show

-- | Any Unicode characters, excluding C0 control codes, '{', '}', ':'.
newtype Heading = Heading Text
  deriving newtype Show

-- | Any Unicode characters, excluding C0 control codes, '{', '}', ':' and spaces.
newtype Name = Name Text
  deriving newtype Show

-- | Field contents at the same line as the declaration.
data Inline = Inline Text -- ^ Includes preceding whitespace
            | NewlineI
  deriving Show

-- | Field contents at the lines following the declaration.
data Line = Line Text -- ^ Includes preceding whitespace
          | CommentL Comment
          | NewlineL
  deriving Show

-- | Curly bracket syntax, together with the preceding empty space
data Curlies a = CommentC Comment (Curlies a)
               | NewlineC (Curlies a)
               | Curlies
                   Whitespace -- ^ Before left bracket
                   a
                   Whitespace -- ^ Before right bracket
  deriving Show

-- | List with an alternative unparsed representation.
data Gradual u a = Entry a (Gradual u a)
                 | More u
                 | End
  deriving Show

-- | Section contents with the curly bracket alternative.
data Section = CurlS (Curlies (Gradual [Line] Node))
             | NormalS
                 (Maybe Comment) -- ^ Inline comment
                 (Gradual [Line] Node)
  deriving Show

-- | Field contents.
data Contents = Contents Inline [Line]
  deriving Show

-- | Field contents with the curly bracket alternative.
data Field = CurlF (Curlies Contents)
           | NormalF Contents
  deriving Show

data Node = Section Offset Heading Section
          | Field
              Offset
              Name
              Whitespace -- ^ Between field name and colon
              Field
          | CommentT Comment
          | NewlineT
  deriving Show

newtype Layout = Layout (Gradual Lazy.Text Node)
  deriving newtype Show
```

The big problem in my case isn't the definition, it's parsing:

All three are solved with lookahead, so the parser gets quite convoluted.

Bodigrim commented 10 months ago

(random thoughts and snippets)

One can parse and annotate each field with its source like this:

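{-# LANGUAGE LambdaCase #-}

import Data.ByteString (ByteString)
import qualified Data.List as L
import Distribution.Fields (Field (..), FieldLine (..), Name (..), SectionArg (..), readFields)
import Distribution.Parsec.Position (Position (..))

-- Note: splitAtPosition is not defined in this snippet. From its use below it is
-- assumed to have type Position -> ByteString -> (ByteString, ByteString).

readFieldsAnnotatedWithSource :: ByteString -> Maybe [Field ByteString]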
readFieldsAnnotatedWithSource bs =
  either (const Nothing) (Just . snd . L.mapAccumR annotateField maxBoundPos) (readFields bs)
  where
    annotateField :: Position -> Field Position -> (Position, Field ByteString)
    annotateField finishPos = \case
      Field (Name pos name) fls -> (pos, Field (Name (getSrcBetween pos finishPos') name) fls')
        where
          (finishPos', fls') = L.mapAccumR annotateFieldLine finishPos fls
      Section (Name pos name) args fs -> (pos, Section (Name (getSrcBetween pos finishPos'') name) args' fs')
        where
          (finishPos', fs') = L.mapAccumR annotateField finishPos fs
          (finishPos'', args') = L.mapAccumR annotateSectionArg finishPos' args

    annotateFieldLine :: Position -> FieldLine Position -> (Position, FieldLine ByteString)
    annotateFieldLine finishPos (FieldLine pos xs) = (pos, FieldLine (getSrcBetween pos finishPos) xs)

    annotateSectionArg :: Position -> SectionArg Position -> (Position, SectionArg ByteString)
    annotateSectionArg finishPos = \case
      SecArgName pos xs -> (pos, SecArgName (getSrcBetween pos finishPos) xs)
      SecArgStr pos xs -> (pos, SecArgStr (getSrcBetween pos finishPos) xs)
      SecArgOther pos xs -> (pos, SecArgOther (getSrcBetween pos finishPos) xs)

    getSrcBetween :: Position -> Position -> ByteString
    getSrcBetween from to = snd $ splitAtPosition from $ fst $ splitAtPosition to bs

    maxBoundPos :: Position
    maxBoundPos = Position maxBound maxBound

Then manipulate [Field ByteString], then restore the source file with foldMap fold :: [Field ByteString] -> ByteString. This to a certain limit just works:
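As a toy illustration of the restore step just described, the sketch below uses simplified stand-in types with `String` annotations rather than the real Cabal-syntax `Field` with `ByteString`. The derived `Foldable` instances visit annotations in source order, so `foldMap fold` concatenates the stored slices back into the text they were cut from. The hand-built `example` and the helper `fieldName` are assumptions for illustration only.

```haskell
{-# LANGUAGE DeriveFoldable #-}

import Data.Foldable (fold)

-- Simplified stand-ins: each annotation holds the source slice that
-- starts at this node and runs up to the next annotated node.
data Name ann      = Name ann String      deriving (Foldable)
data FieldLine ann = FieldLine ann String deriving (Foldable)

data Field ann
  = Field (Name ann) [FieldLine ann]
  | Section (Name ann) [Field ann]
  deriving (Foldable)

-- Concatenate every annotation, in order, to get the source back.
restore :: [Field String] -> String
restore = foldMap fold

fieldName :: Field ann -> String
fieldName (Field (Name _ n) _)   = n
fieldName (Section (Name _ n) _) = n

example :: [Field String]
example =
  [ Field (Name "name: " "name") [FieldLine "foo\n" "foo"]
  , Section (Name "library\n" "library")
      [ Field (Name "  build-depends: " "build-depends")
          [FieldLine "base\n" "base"]
      ]
  ]
```

Here `restore example` gives back the original text, and dropping an element first, as in `restore (filter ((/= "name") . fieldName) example)`, removes exactly that field's bytes.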

gbaz commented 10 months ago

I don't know how well that representation matches reality, but the annotationless one I have isn't all that long either.

The point of the proposed representation I linked is not brevity but rather that it "forgets down" to the existing "Field" representation, suitable for use downstream in the existing parser pipeline. This means that it can be added as an extension to the existing parser, while a whole new AST would likely not be so simple. The other point of that proposed representation is that it parses efficiently as an extension to the existing efficient parser.
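One way to read "forgets down" is as a function from the richer tree to the existing one that drops everything the existing pipeline does not need. The sketch below uses heavily simplified, hypothetical stand-ins for both sides (the real `Field` type and the proposed `ExactField` have more structure), so it only shows the shape of the idea.

```haskell
import Data.Maybe (mapMaybe)

-- Simplified stand-in for the existing lexed representation.
data Field ann = Field ann String [String]  -- position, name, value lines
  deriving (Eq, Show)

-- Simplified stand-in for a richer exact-print tree, loosely after the
-- shape proposed earlier in the thread.
data ExactField ann
  = ExactField ann String ann [ExactValue ann]  -- name, ':' position, values
  | ExactFieldComment ann String

data ExactValue ann
  = ExactValue ann String
  | ExactValueComment ann String

-- Comments have no counterpart downstream, so they are dropped.
forgetValue :: ExactValue ann -> Maybe String
forgetValue (ExactValue _ v)        = Just v
forgetValue (ExactValueComment _ _) = Nothing

-- The ':' position is exact-print detail only, so it is dropped too.
forgetField :: ExactField ann -> Maybe (Field ann)
forgetField (ExactField ann name _colon vals) =
  Just (Field ann name (mapMaybe forgetValue vals))
forgetField (ExactFieldComment _ _) = Nothing

forget :: [ExactField ann] -> [Field ann]
forget = mapMaybe forgetField
```

Because `forget` is total and discards only comments and layout details, the rest of the pipeline could keep consuming the plain `Field` list unchanged.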

BurningWitness commented 10 months ago

Then you (the maintainers) will have to agree on what the scope of this task is and how you see Cabal's parsing working in the future.

In my view the current bidirectional all-versions-in-one double-specialization parser is extremely unwieldy (e.g. buildInfoFieldGrammar). I suspect this is, at least in part, the reason for why format changes over the versions have been quite anemic, only adding and deprecating specific fields, never cleaning up.

If everyone is satisfied with the status quo and a mere extension of the lexer is deemed the easiest approach, then I'm not the person to solve this as I have no interest in tag soup parsers.

gbaz commented 10 months ago

I think an independent library that can parse, modify, and exactprint cabal files would be quite nice. The concern would be that it might not keep pace with the spec, while doing so in an integrated way would. If your intent remains, as stated above, to write an independent library that does not replace what currently exists, then please go for it! All sorts of independent tools can be written on top of it and integrated with cabal using the external command system, and this could well be a good approach.

That said, I do not see a path to replacing the entire existing cabal parser, with its needs for efficiency, back-compat, etc. At this point what we can do is only evolve it.