zysszy / TreeGen

A Tree-Based Transformer Architecture for Code Generation. (AAAI'20)

How is the target vector (program rules sequence) in TreeGen created during training? #14

brando90 opened this issue 3 years ago

brando90 commented 3 years ago

Hi Authors,

My understanding is that TreeGen learns by predicting the grammar rules of the target program and computing a cross-entropy loss against the ground-truth rule sequence. So I assume you parse the target program into an AST, and during that parsing you record a (padded) sequence indicating which rule was applied at each step. To do this you need to fix an ordering over the rule applications. Did you use DFS, BFS, or something else to create the actual target rule sequence the model learns from? Since there is no unique way to create this label, my assumption is that the model is "biased" to output, say, BFS-generated programs. Is this correct? Where is the code that does this?

Thanks for your time!

zysszy commented 3 years ago

Did you use DFS, BFS, or something else to create the actual target rule sequence the model learns from?

We use the preorder traversal sequence as the target rule sequence. During inference, we always generate the leftmost node.

Zeyu
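
For readers landing here: below is a minimal sketch (not the repository's actual code) of how a target rule sequence could be built by a preorder, left-to-right traversal of the target program's AST, which matches the ordering described above. The `Node` class, `rule_id` helper, and padding scheme are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    symbol: str                                  # grammar symbol at this node
    children: List["Node"] = field(default_factory=list)

RuleKey = Tuple[str, Tuple[str, ...]]

def rule_id(node: Node, rule_vocab: Dict[RuleKey, int]) -> int:
    """Map the expansion at `node` (parent symbol -> child symbols) to an index."""
    rule = (node.symbol, tuple(c.symbol for c in node.children))
    return rule_vocab.setdefault(rule, len(rule_vocab))

def preorder_rule_sequence(root: Node, rule_vocab: Dict[RuleKey, int]) -> List[int]:
    """Collect rule indices in preorder: visit a node, then its children left to right."""
    seq: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:                        # only expansions of non-terminals emit rules
            seq.append(rule_id(node, rule_vocab))
            stack.extend(reversed(node.children))  # reversed so the leftmost child is visited first
    return seq

def pad(seq: List[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Right-pad (or truncate) the rule sequence to a fixed length for batching."""
    return seq[:max_len] + [pad_id] * max(0, max_len - len(seq))
```

Under this ordering, the rule predicted at step t always corresponds to expanding the leftmost unexpanded non-terminal of the partial tree, which is consistent with the answer that inference always generates the leftmost node.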