Open pbiggar opened 8 years ago
This seems difficult to preserve. Comments could appear anywhere, even in the middle of commands. For example, should there be a difference between the following two examples?
if foo; then # comment
echo hi
fi
if foo # comment
then
echo hi
fi
Typically, comments are stored as metadata on the AST node to which they are associated (you can't get that perfect, but typically you can get it pretty good). As an example, here it is in phc (a PHP compiler I worked on a lot): http://phpcompiler.org/doc/latest/representingphp.html#comments
Often, parsers will include line and column numbers in the metadata too.
I'm not sure how to do metadata idiomatically in Haskell, so I don't have a recommendation for how to do this is language-bash per se, this is just how it's typically done.
My first thought would be to have a general data type for storing meta-data for some node. Perhaps something like this:
data Metadata a = Metadata { comment::Maybe String, line:: Int, column::Int, value :: a }
Here is an example of a slightly different approach (https://hackage.haskell.org/package/language-javascript-0.6.0.8/docs/Language-JavaScript-Parser.html):
data JSAnnot
= JSAnnot !TokenPosn ![CommentAnnotation] -- ^Annotation: position and comment/whitespace information
| JSNoAnnot -- ^No annotation
deriving (Data, Eq, Show, Typeable)
which is then used like this:
data JSBlock = JSBlock !JSAnnot ![JSStatement] !JSAnnot
Another user contacted me requesting source positions in the AST as well.
Concerning this issue I' d like to explore the design space of GADTs
. We could move to a single datatype representing the AST:
data SimpleCommandT
data CommandT
data AST a t where
SimpleCommand :: ... -> AST a SimpleCommandT
Command :: AST a SimpleCommandT -> ... -> AST a CommandT
Annotated :: a -> AST a t -> AST a t
Comment :: String -> AST a t -> AST a t
-- Backwards compability and convenience type aliases
type SimpleCommand = AST () SimpleCommandT
type Command = AST () CommandT
Ok, GADTs don't work because of the limitiations of the deriving mechanism. I'll dig deeper into this.
I don't know if it is overkill, but I can't help but think of Trees that Grow when it comes to the problem of decorating an AST with metadata.
I had a look at the trees-that-grow solution and it looks quite promissing. (thanks @anka-213)
Proposal: Move all types in Language.Bash.Syntax
into one BashSyn
type.
Since this is a breaking change I'd like to hear @knrafto s opinion about that.
Could you say more about what this BashSyn
type would look like?
@knrafto Here is an example:
-- The different contexts of out AST nodes.
-- Those resemble the type constructors we have in Language.Bash.Syntax.
data Command
data ShellCommand
data Redir
...
-- This decides if a data constructor is applicable in the actual context.
-- If so, we annotate the constructor with ann.
-- If not, the annotation is Void and such a value cannot be constructed.
type family Ann requested actual ann where
Ann a a ann = ann
Ann a b ann = Void
-- The data type for all the syntactic elements of Bash. This type has approximately 50 (!) constructors.
data BashSyn actual ann
= Command (Ann Command actual ann) (BashSyn ShellCommand ann) [BashSyn Redir ann]
| SimpleCommand (Ann ShellCommand actual ann) [BashSyn Assign ann] [Word.Word]
...
Although this approach is not difficult to implement it requires a fair amount of boilerplate. I have a (somewhat) working implementation. The Ord, Bounded and Enum instances are missing but otherwise it seems to work. I don't have much time right now but I hope to publish it soon. Then we can still decide if we want that or not, but at least we have some prototype we can talk about.
Hmm, that does look a bit cumbersome. What's the advantage of having a single datatype to represent all AST structures, rather than a separate datatype for each concept?
It would be useful for me if comments were put in the AST in a "Comment" construct (I plan to translate them into Comments in my new language).