s-expressionists / Eclector

A portable Common Lisp reader that is highly customizable, can recover from errors and can return concrete syntax trees
https://s-expressionists.github.io/Eclector/
BSD 2-Clause "Simplified" License
109 stars 9 forks source link

Ensure every CST element has a SOURCE, suggest how to capture whitespace and comments #28

Open eschulte opened 6 years ago

eschulte commented 6 years ago

I'd like to be able to parse a source file into a tree structure which includes all information required to build the source (including whitespace and comments). This would be easier if every parsed CST element has a valid associated source range. Currently that isn't the case for (at least) QUOTE elements. See:

SEL/LISP-DIFF> (labels ((maptree (fn tree)
                          (if (concrete-syntax-tree:consp tree)
                              (cons
                               (maptree fn (concrete-syntax-tree:first tree))
                               (maptree fn (concrete-syntax-tree:rest tree)))
                              (funcall fn tree))))
                 (maptree #'concrete-syntax-tree:source
                          (with-input-from-string (in "'(1 2 3 4)")
                            (cst-read in))))
(NIL ((2 . 3) (4 . 5) (6 . 7) (8 . 9)))

I've taken a shot at building this using cst-read and record-skipped-input with some custom code to fold everything together and add whitespace. See https://github.com/GrammaTech/sel/blob/master/ast-diff/lisp-diff.lisp#L71, is there a better way?

Thanks!

scymtym commented 6 years ago

There are two separate questions here:

  1. Can't we ensure that every CST node has an associate source range?

    For eclector.concrete-syntax-tree:cst-read we take the position that not all CST nodes have a corresponding source range.

    • One reason are reader macros which can fabricate arbitrary expressions not related to the source at all. The symbol quote in (quote 1 2 3 4) is a simple example of this.

    • Another reason are parts of the list structure that have no representation in the input string:

      (labels ((maptree (fn tree)
              (if (concrete-syntax-tree:consp tree)
                  (list
                   (funcall fn tree) ; CHANGED
                   (maptree fn (concrete-syntax-tree:first tree))
                   (maptree fn (concrete-syntax-tree:rest tree)))
                  (funcall fn tree))))
      (maptree #'concrete-syntax-tree:source
              (with-input-from-string (in "'(1 2 3 4)")
                (eclector.concrete-syntax-tree:cst-read in))))
      ((0 . 10) NIL (NIL ((1 . 10) (2 . 3) (NIL (4 . 5) (NIL (6 . 7) (NIL (8 . 9) NIL)))) NIL))

    A custom client can of course implement a different behavior. To this end, consider the wip-parse-result-protocol-2 branch and in particular https://github.com/robert-strandh/Eclector/blob/wip-parse-result-protocol-2/code/parse-result/second-climacs-test.lisp .

  2. Is there a better way to integrate skipped input into the parse result?

    Again, consider the wip-parse-result-protocol-2 branch and in particular https://github.com/robert-strandh/Eclector/blob/wip-parse-result-protocol-2/code/parse-result/second-climacs-test.lisp .

    There is a good chance we will do something very similar in the master branch to better support clients other than cst-read.

eschulte commented 6 years ago

Thank you for the pointer to that branch. I hope this functionality makes its way into master. Using the branch and adopting the example you cited, I was able to get exactly what I wanted (a tree structure with sufficient information to fully reproduce the full text of the source file) with a much simpler implementation. See https://github.com/GrammaTech/sel/blob/wip-lisp-diff-uses-eclector-parse-result-protocol-2/ast-diff/lisp-diff.lisp

eschulte commented 5 years ago

To followup, over a year later we're still using this branch. Any chance this functionality will (or maybe has already) make its way to master?

scymtym commented 5 years ago

Any chance this functionality will (or maybe has already) make its way to master?

This has not happened yet, but several steps towards that goal are underway.

scymtym commented 5 years ago

To clarify, the parse-result protocol has made it into master, but I'm not sure it fully supports this use-case.

scymtym commented 5 years ago

I made https://github.com/GrammaTech/sel/pull/11. Let me know if that works for you.