julianthome / inmemantlr

ANTLR as a libray for JVM based languages
MIT License
108 stars 24 forks source link

Question regarding parse tree generation #35

Closed julianthome closed 5 years ago

sjiya commented 5 years ago

I want to generate Abstract Syntax Tree (AST) of a mathematical equation from my MathGrammar.g4 antlr file. I want a AST without grammar tags in the tree only I want the equation components in the tree. suppose, I have an equation x^2+y^2=z^2, so I want tree in this style

= ---(+)---(^) -------------(x) -------------(2) ---------(^) -------------(y) -------------(2) ---(^) ------(z) ------(2) (=(+(^(x)(2))(^(y)(2)))(^(z)(2))) How can I do it in inmemantlr system?

julianthome commented 5 years ago

Hi @sjiya, I have renamed this issue ;). Please always feel free to create new issues that match your problem description. The inmemantlr tree is pretty generic and contains all the information you are looking for regarding the grammar rules and the matched tokens. However, if you want to obtain the reduced parse tree you are asking for, I would execute the following steps:

  1. Generate the inmemantlr parse tree as described in the documentation
  2. Implement a parse tree processor that translates the inmemantlr parse tree to your reduced representation. You can find an example here. I provided a more detailed explanation about parse tree processors in the documentation. You might also want to checkout the parse tree processor test-cases.
sjiya commented 5 years ago

Thanks for prompt reply. plz check attached code and generated tree. I am unable to print AST from this code. I would appreciate if you suggest changes in my code. TreeTest.zip

julianthome commented 5 years ago

Thanks @sjiya, would it be also possible for you to share the grammar you have used in the example?

sjiya commented 5 years ago

I am attaching grammar file here. LG.zip

sjiya commented 5 years ago

I am attaching Grammar file here.

Regards, Sharaf Hussain

On Fri, Feb 8, 2019 at 12:00 PM Julian Thomé notifications@github.com wrote:

Thanks @sjiya https://github.com/sjiya, would it be also possible for you to share the grammar you have used in the example?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/julianthome/inmemantlr/issues/35#issuecomment-461710042, or mute the thread https://github.com/notifications/unsubscribe-auth/ATUL2oduP7trGhXcEMwEwkOKSmLC55VYks5vLSCBgaJpZM4XTxF2 .

julianthome commented 5 years ago

Hi @sjiya, your code seems works on my machine ... this is the output:

Tree Node Size: 64
graph {
    node [fontname=Helvetica,fontsize=11];
    edge [fontname=Helvetica,fontsize=10];
    n1 [label="root",shape=ellipse];
    n2 [label="math",shape=ellipse];
    n3 [label="relation",shape=ellipse];
    n4 [label="relation",shape=ellipse];
    n5 [label="expr",shape=ellipse];
    n6 [label="additive",shape=ellipse];
    n7 [label="mp",shape=ellipse];
    n8 [label="unary",shape=ellipse];
    n9 [label="postfix",shape=ellipse];
    n10 [label="exp",shape=ellipse];
    n11 [label="comp",shape=ellipse];
    n12 [label="atom",shape=ellipse];
    n11 -- n12;
    n10 -- n11;
    n9 -- n10;
    n8 -- n9;
    n7 -- n8;
    n6 -- n7;
    n5 -- n6;
    n4 -- n5;
    n3 -- n4;
    n13 [label="relation",shape=ellipse];
    n14 [label="expr",shape=ellipse];
    n15 [label="additive",shape=ellipse];
    n16 [label="mp",shape=ellipse];
    n17 [label="unary",shape=ellipse];
    n18 [label="postfix",shape=ellipse];
    n19 [label="exp",shape=ellipse];
    n20 [label="comp",shape=ellipse];
    n21 [label="frac",shape=ellipse];
    n22 [label="expr",shape=ellipse];
    n23 [label="additive",shape=ellipse];
    n24 [label="mp",shape=ellipse];
    n25 [label="unary",shape=ellipse];
    n26 [label="unary",shape=ellipse];
    n27 [label="postfix",shape=ellipse];
    n28 [label="exp",shape=ellipse];
    n29 [label="comp",shape=ellipse];
    n30 [label="atom",shape=ellipse];
    n29 -- n30;
    n28 -- n29;
    n27 -- n28;
    n26 -- n27;
    n31 [label="postfix",shape=ellipse];
    n32 [label="exp",shape=ellipse];
    n33 [label="comp",shape=ellipse];
    n34 [label="atom",shape=ellipse];
    n33 -- n34;
    n32 -- n33;
    n31 -- n32;
    n26 -- n31;
    n35 [label="postfix",shape=ellipse];
    n36 [label="exp",shape=ellipse];
    n37 [label="comp",shape=ellipse];
    n38 [label="func",shape=ellipse];
    n39 [label="expr",shape=ellipse];
    n40 [label="additive",shape=ellipse];
    n41 [label="additive",shape=ellipse];
    n42 [label="mp",shape=ellipse];
    n43 [label="unary",shape=ellipse];
    n44 [label="postfix",shape=ellipse];
    n45 [label="exp",shape=ellipse];
    n46 [label="exp",shape=ellipse];
    n47 [label="comp",shape=ellipse];
    n48 [label="atom",shape=ellipse];
    n47 -- n48;
    n46 -- n47;
    n45 -- n46;
    n49 [label="atom",shape=ellipse];
    n45 -- n49;
    n44 -- n45;
    n43 -- n44;
    n42 -- n43;
    n41 -- n42;
    n40 -- n41;
    n50 [label="additive",shape=ellipse];
    n51 [label="mp",shape=ellipse];
    n52 [label="unary",shape=ellipse];
    n53 [label="postfix",shape=ellipse];
    n54 [label="exp",shape=ellipse];
    n55 [label="comp",shape=ellipse];
    n56 [label="atom",shape=ellipse];
    n55 -- n56;
    n54 -- n55;
    n53 -- n54;
    n52 -- n53;
    n51 -- n52;
    n50 -- n51;
    n40 -- n50;
    n39 -- n40;
    n38 -- n39;
    n37 -- n38;
    n36 -- n37;
    n35 -- n36;
    n26 -- n35;
    n25 -- n26;
    n24 -- n25;
    n23 -- n24;
    n22 -- n23;
    n21 -- n22;
    n57 [label="expr",shape=ellipse];
    n58 [label="additive",shape=ellipse];
    n59 [label="mp",shape=ellipse];
    n60 [label="unary",shape=ellipse];
    n61 [label="postfix",shape=ellipse];
    n62 [label="exp",shape=ellipse];
    n63 [label="comp",shape=ellipse];
    n64 [label="atom",shape=ellipse];
    n63 -- n64;
    n62 -- n63;
    n61 -- n62;
    n60 -- n61;
    n59 -- n60;
    n58 -- n59;
    n57 -- n58;
    n21 -- n57;
    n20 -- n21;
    n19 -- n20;
    n18 -- n19;
    n17 -- n18;
    n16 -- n17;
    n15 -- n16;
    n14 -- n15;
    n13 -- n14;
    n3 -- n13;
    n2 -- n3;
    n1 -- n2;
}

However, I noticed that you are using this fragment in your code:

        try {
            sgrammar = new FileInputStream("/home/sharaf/antlr4/LatexParser/src/LG.g4");
            sgrammarcontent = FileUtils.getStringFromStream(sgrammar);
        }finally {}

I would suggest to catch the exception and (at least) print its in the catch block. Maybe it couldn't find the .g4. file so that sgrammarcontent is empty but since you don't catch the exception here, the program continues using an empty grammar.

sjiya commented 5 years ago

I am getting the same output like yours. but you can observe in your output that equation elements are not appears in the tree. For example, x, =, -, b, \pm, .. are not in the output.

Plz find the attached code that is generating AST. AST.zip

julianthome commented 5 years ago

Hi @sjiya, oh I see. Yes, per default, Terminals are not included in the parse tree. You basically have two possiblities: 1) Pass true to the constructor of the default tree listener DefaultTreeListener t = new DefaultTreeListener(true); so that terminals are getting included or 2) Use the getEidx, getSidx members of the parse tree nodes (leafs) in order to retrieve the terminal from the string: https://github.com/julianthome/inmemantlr/blob/23d1cb322a3500b286d9a2639efb2c044f2ea48f/inmemantlr-api/src/main/java/org/snt/inmemantlr/tree/ParseTreeNode.java#L240

When you go for option 1, you should get the following output `

graph {
    node [fontname=Helvetica,fontsize=11];
    edge [fontname=Helvetica,fontsize=10];
    n1 [label="root",shape=ellipse];
    n2 [label="math",shape=ellipse];
    n3 [label="relation",shape=ellipse];
    n4 [label="relation",shape=ellipse];
    n5 [label="expr",shape=ellipse];
    n6 [label="additive",shape=ellipse];
    n7 [label="mp",shape=ellipse];
    n8 [label="unary",shape=ellipse];
    n9 [label="postfix",shape=ellipse];
    n10 [label="exp",shape=ellipse];
    n11 [label="comp",shape=ellipse];
    n12 [label="atom",shape=ellipse];
    n13 [label="x",shape=box];
    n12 -- n13;
    n11 -- n12;
    n10 -- n11;
    n9 -- n10;
    n8 -- n9;
    n7 -- n8;
    n6 -- n7;
    n5 -- n6;
    n4 -- n5;
    n3 -- n4;
    n14 [label="=",shape=box];
    n3 -- n14;
    n15 [label="relation",shape=ellipse];
    n16 [label="expr",shape=ellipse];
    n17 [label="additive",shape=ellipse];
    n18 [label="mp",shape=ellipse];
    n19 [label="unary",shape=ellipse];
    n20 [label="postfix",shape=ellipse];
    n21 [label="exp",shape=ellipse];
    n22 [label="comp",shape=ellipse];
    n23 [label="frac",shape=ellipse];
    n24 [label="\frac",shape=box];
    n23 -- n24;
    n25 [label="\{",shape=box];
    n23 -- n25;
    n26 [label="expr",shape=ellipse];
    n27 [label="additive",shape=ellipse];
    n28 [label="mp",shape=ellipse];
    n29 [label="unary",shape=ellipse];
    n30 [label="\-",shape=box];
    n29 -- n30;
    n31 [label="unary",shape=ellipse];
    n32 [label="postfix",shape=ellipse];
    n33 [label="exp",shape=ellipse];
    n34 [label="comp",shape=ellipse];
    n35 [label="atom",shape=ellipse];
    n36 [label="b ",shape=box];
    n35 -- n36;
    n34 -- n35;
    n33 -- n34;
    n32 -- n33;
    n31 -- n32;
    n37 [label="postfix",shape=ellipse];
    n38 [label="exp",shape=ellipse];
    n39 [label="comp",shape=ellipse];
    n40 [label="atom",shape=ellipse];
    n41 [label="\pm",shape=box];
    n40 -- n41;
    n39 -- n40;
    n38 -- n39;
    n37 -- n38;
    n31 -- n37;
    n42 [label="postfix",shape=ellipse];
    n43 [label="exp",shape=ellipse];
    n44 [label="comp",shape=ellipse];
    n45 [label="func",shape=ellipse];
    n46 [label="\sqrt",shape=box];
    n45 -- n46;
    n47 [label="\{",shape=box];
    n45 -- n47;
    n48 [label="expr",shape=ellipse];
    n49 [label="additive",shape=ellipse];
    n50 [label="additive",shape=ellipse];
    n51 [label="mp",shape=ellipse];
    n52 [label="unary",shape=ellipse];
    n53 [label="postfix",shape=ellipse];
    n54 [label="exp",shape=ellipse];
    n55 [label="exp",shape=ellipse];
    n56 [label="comp",shape=ellipse];
    n57 [label="atom",shape=ellipse];
    n58 [label="b",shape=box];
    n57 -- n58;
    n56 -- n57;
    n55 -- n56;
    n54 -- n55;
    n59 [label="\^",shape=box];
    n54 -- n59;
    n60 [label="atom",shape=ellipse];
    n61 [label="2",shape=box];
    n60 -- n61;
    n54 -- n60;
    n53 -- n54;
    n52 -- n53;
    n51 -- n52;
    n50 -- n51;
    n49 -- n50;
    n62 [label="\-",shape=box];
    n49 -- n62;
    n63 [label="additive",shape=ellipse];
    n64 [label="mp",shape=ellipse];
    n65 [label="unary",shape=ellipse];
    n66 [label="postfix",shape=ellipse];
    n67 [label="exp",shape=ellipse];
    n68 [label="comp",shape=ellipse];
    n69 [label="atom",shape=ellipse];
    n70 [label="4ac",shape=box];
    n69 -- n70;
    n68 -- n69;
    n67 -- n68;
    n66 -- n67;
    n65 -- n66;
    n64 -- n65;
    n63 -- n64;
    n49 -- n63;
    n48 -- n49;
    n45 -- n48;
    n71 [label="\}",shape=box];
    n45 -- n71;
    n44 -- n45;
    n43 -- n44;
    n42 -- n43;
    n31 -- n42;
    n29 -- n31;
    n28 -- n29;
    n27 -- n28;
    n26 -- n27;
    n23 -- n26;
    n72 [label="\}",shape=box];
    n23 -- n72;
    n73 [label="\{",shape=box];
    n23 -- n73;
    n74 [label="expr",shape=ellipse];
    n75 [label="additive",shape=ellipse];
    n76 [label="mp",shape=ellipse];
    n77 [label="unary",shape=ellipse];
    n78 [label="postfix",shape=ellipse];
    n79 [label="exp",shape=ellipse];
    n80 [label="comp",shape=ellipse];
    n81 [label="atom",shape=ellipse];
    n82 [label="2a",shape=box];
    n81 -- n82;
    n80 -- n81;
    n79 -- n80;
    n78 -- n79;
    n77 -- n78;
    n76 -- n77;
    n75 -- n76;
    n74 -- n75;
    n23 -- n74;
    n83 [label="\}",shape=box];
    n23 -- n83;
    n22 -- n23;
    n21 -- n22;
    n20 -- n21;
    n19 -- n20;
    n18 -- n19;
    n17 -- n18;
    n16 -- n17;
    n15 -- n16;
    n3 -- n15;
    n2 -- n3;
    n1 -- n2;
}
sjiya commented 5 years ago

Thanks julian, option 1 is working fine and its getting result with equation elements. but I am bit confuse about option 2. how do I get equation element without chain of rules?? for example, I don't want following rules from n1-n12 in tree because the are use less in my case.

n1 [label="root",shape=ellipse]; n2 [label="math",shape=ellipse]; n3 [label="relation",shape=ellipse]; n4 [label="relation",shape=ellipse]; n5 [label="expr",shape=ellipse]; n6 [label="additive",shape=ellipse]; n7 [label="mp",shape=ellipse]; n8 [label="unary",shape=ellipse]; n9 [label="postfix",shape=ellipse]; n10 [label="exp",shape=ellipse]; n11 [label="comp",shape=ellipse]; n12 [label="atom",shape=ellipse]; n13 [label="x",shape=box];

julianthome commented 5 years ago

The second option is interesting if you just want to extract the terminals from the input string. As an example you could do this with:

String quad = "x=\\frac{-b \\pm \\sqrt{b^2-4ac}}{2a}";
// ...
DefaultTreeListener t = new DefaultTreeListener();
// ...
cast.getLeafs().stream().forEach(
  l -> System.out.println("\"" + quad.substring(l.getSidx(), l.getEidx()+1) + "\"")
);

This works because the leafs are always directly linked to a matching token. In case to simplify your parse tree a bit by filtering certain rule, you can actually pass a filter to the constructor of the DefaultTreeListener (https://github.com/julianthome/inmemantlr/blob/23d1cb322a3500b286d9a2639efb2c044f2ea48f/inmemantlr-api/src/main/java/org/snt/inmemantlr/listener/DefaultTreeListener.java#L81). Only the nodes that match the filter rule you provide, will be added to the parse tree. I think you could pass a filter that excludes the rules you've mentioned in your last post. This is described in the section Parse Tree Pruning in the documentation.

sjiya commented 5 years ago

Dear Julian, Thank you for solving various issues. now everything is working great.😃

On Mon, Feb 11, 2019 at 9:59 PM Julian Thomé notifications@github.com wrote:

The second option is interesting if you just want to extract the terminals from the string. As an example you could do this with:

String quad = "x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}";// ...DefaultTreeListener t = new DefaultTreeListener();// ... cast.getLeafs().stream().forEach( l -> System.out.println("\"" + quad.substring(l.getSidx(), l.getEidx()+1) + "\"") );

This works because the leafs are always directly linked to a matching token. In case to simplify your parse tree a bit by filtering certain rule, you can actually pass a filter to the constructor of the DefaultTreeListener ( https://github.com/julianthome/inmemantlr/blob/23d1cb322a3500b286d9a2639efb2c044f2ea48f/inmemantlr-api/src/main/java/org/snt/inmemantlr/listener/DefaultTreeListener.java#L81). Only the nodes that match the filter rule you provide, will be added to the parse tree. I think you could pass a filter that excludes the rules you've mentioned in your last post.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/julianthome/inmemantlr/issues/35#issuecomment-462406210, or mute the thread https://github.com/notifications/unsubscribe-auth/ATUL2jRnDqAMS1lbwfGY793dZhGVmp_Tks5vMaF5gaJpZM4XTxF2 .