Design what to do in DSL to WDL stage

uchicago-cs / chiventure

A text adventure game engine developed in UChicago's CMSC 22000 - Introduction to Software Development

40 stars 13 forks source link

Design what to do in DSL to WDL stage #940

Closed Mich0608 closed 3 years ago

Mich0608 commented 3 years ago

We will design the stage of extracting information from DSL and loading that information into an internal Python representation before being converted to WDL. The design process will involve discussing how to represent DSL information with Python data structures.

Mich0608 commented 3 years ago

We had a team meeting and we decided that this task would involve designing skeleton functions that will interpret the output of lark and load them into python data structures that we can, in later stages, manipulate into WDL.

After reading through this tutorial on how to write and interpret a DSL with Lark, I have learned the following:

Lark takes a EBNF grammar file and outputs a syntax tree that represents the structures of the rules and terminals(tokens)
We can access the parent and children nodes of the tree "t" with t.data and t.children

The above information should be useful when determining how we are going to access the parsed data and load them into corresponding data structures.

Mich0608 commented 3 years ago

We had a meeting where we discussed Jack and Tessie's implementation (commit 3e9aca0) that parses a DSL file into something that looks like the WDL format. In this sandbox implementation of the parser, methods corresponding to items, rooms, games, connections, etc. are contained inside a TreeToDict class. These functions will be called correspondingly when lark transforms the syntax tree and loads the information in the tree into Python data structures. Then, json.dumps() is called to transform the Python data structures into a JSON format.

This design is modeled after this tutorial on the Lark documentation. We decided that it was better to use this design rather than to recursively access data in the tree via t.data and t.children as mentioned above, because this design is more friendly for the object-heavy format of JSON/WDL. Since we're trying to format rather than manipulate the objects in the tree, a design that simply transforms corresponding branches in the tree rather than recursively first retrieve then transform the values in the branches is better. Plus, recursive functions would not fit our objective, since we do not necessarily want to represent WDL objects in the order that they are written in with DSL.

However, there are still major differences between the sandbox output and the desired WDL format. In the next stage of the sprint, John and I will work on designing modifications to the sandbox functions so that they may output the correct WDL format.

Mich0608 commented 3 years ago

Right now, a major difference between the output of the sandbox implementation (mentioned above) and the desired WDL format is that the sandbox output nests items under rooms as such:

"ROOMS": {
    "room A": {
      "short desc": "A dungeon room.",
      "long desc": "You shudder to think of the unspeakable horrors that have taken place in these dungeons. You wouldn't want to be fly on the wall here, but mostly because of how damp and moldy the walls are.",
      "connections": {
        "SOUTH": "room B"
      },
      "items": {
        "sconce": {
          "short desc": "A sconce holding a candle",
          "long desc": "It looks a bit loose."
        }
      }
    }

Whereas in the desired WDL JSON format, items are not nested under rooms (which would be its own object with nested rooms), but nested within an array under a larger "ITEMS" object as such:

{
  "ITEMS": [
    {
      "state": "still", 
      "long_desc": "This is a chair long", 
      "value": "no", 
      "in": "room A", 
      "short_desc": "This is a chair", 
      "id": "CHAIR", 
      "actions": [
        ...
      ]
    }, 
    {
      "state": "solid", 
      "long_desc": "This is a table long", 
      "value": "no", 
      "in": "room B", 
      "short_desc": "This is a table", 
      "id": "TABLE", 
      "actions": [
        ...
      ]
    }
  ],

I talked to Borja after class today and we decided that while the class-method way

This design is modeled after this tutorial on the Lark documentation. We decided that it was better to use this design rather than to recursively access data in the tree via t.data and t.children as mentioned above, because this design is more friendly for the object-heavy format of JSON/WDL. Since we're trying to format rather than manipulate the objects in the tree, a design that simply transforms corresponding branches in the tree rather than recursively first retrieve then transform the values in the branches is better.

of manipulating the syntax tree is relatively straightforward, recursing through the syntax tree may be a better and more flexible way of loading data from the tree into python data structures that can later be transformed into valid WDL. That way, our output will not be confined by the order of branches in the tree.

Mich0608 commented 3 years ago

In the above commit 5ec274b6d4ca84d9f1a4dcbb775242de6cb49dd8, I have wrote some basic code that uses the Visitor pattern in Lark. The draft is not finished, but rather provide a possible/experimental way of implementing the parser as I try to familiarize myself with Lark features. The Visitor pattern, as described here, is a built-in Lark feature to traverse the syntax tree. Unlike the Transformer pattern, it does not automatically replace branches as the methods are called.

The current design has a TreeToDict class that uses the Visitor pattern to visit the branches of the syntax tree. Depending on which branch it encounters, the visitor will call on methods that may or may not utilize helper functions to store the data in the syntax tree in python dictionaries and lists. These dictionaries and lists may represent rooms, the game, connections, etc. They are then nested within each other and put into a larger game-wide dictionary.

Basically, the design traverses the tree and stores the information in intermediate data structures, which json.dumps() transforms into the appropriate JSON syntax for WDL.

Mich0608 commented 3 years ago

After our team meeting today, we decided that it's better to use the Transformer pattern rather than the Visitor pattern (documentation for both can be found here). The transformer can do everything the visitor can do, but it replaces the branches of the tree with the return values of each method.

The decision to use Transformer is to make the code more intuitive and readable. Currently, the implementation of visitor (commit 85bdacb0a4858626815e22718da8c8f804cf0ffb) only has two methods, game and room, and accesses connections, items, etc. under the room method via if statements and helper functions. This design is to keep track of the properties, connections, and items in a given room. However, the design can can become cumbersome as chiventure advances to include more and more things in rooms. Therefore, we decided that using Transformer is a better option because it reconstructs the tree with intermediate Python data structures but still retains the order of the branches. Since we know the order of the branches, we can later go through the data structures again and extract the items that are sub-branches to certain rooms and load them into a separate item dictionary. The idea is to have a design as such:

feed the syntax tree to the transformer
the transformer replaces the branches of the tree with intermediate Python data structures

Next, we have functions that extract information from the data structures in the reconstructed tree:

function to group all free-floating property:value pairs (#981) into "free-floating" data structures
function to extract items from rooms and put items in separate dictionary
function to place all free-floating properties in their respective place

Mich0608 commented 3 years ago

I wrote some skeleton functions based on the design outlined above and the current implementation of the sandbox Transformer parser (commit 137e486b76b00269ef2ba919b7d8a2d2b764f4d3):

the transformer class:

Class TreeToDict(Transformer):

we have several objects of the form ('ROOM', (<room id>, <room properties>)) and we want to group all rooms into their own dict of the form {<room properties>}:

    def game(self, s)

replace room branch with dictionary of properties we have several objects of the form ('ITEM', item dict) and we want to group all items into their own list:

    def room(self, s)

replace connections with a dictionary that would automatically contain connection tuples

    def connection(self, s)

replace item branch with dictionary:

    def item(self, s)

input is of the form [('id':value),<properties>] output is of the form ("actions”, <properties>):

    def action(self, s)

replace escaped characters with unicode characters:

    def ESCAPED_STRING(self, s)

replace single property with tuple if property is specified to belong to a specific item or room, load into dictionary of the form {<item/room id>: (property, value)}:

    def property(self, s)

replace start branch with tuple (start, <room id>):

    def start_g(self, s)

replace end branch with tuple (end, <room id>):

    def end_g(self, s)

replace id branch with tuple (id, <room id>):

    def id(self, s)

replace location branch with tuple (location, <room id>):

    def location(self, s)

replace phrase branch with the phrase as a joined string:

    def phrase(self, s)

replace single connection with tuple:

    connection = tuple

reform the tree to fit the WDL syntax, calls helper functions below input: syntax tree transformed by TreeToDict Transformer:

def reform_tree(transformed_tree)

load free-floating item/room property-value pair into an intermediate free-floating properties dictionary input: single property-value dictionary of the form {<item/room id>: (property, value)}:

def load_free_property(property)

extract item from room and load item in separate ITEMS dictionary input: a list of items:

def extract_items(transformed_tree)

place free-floating properties into their appropriate place input:

a dictionary of free-floating property-value pairs

syntax tree transformed by TreeToDict Transformer:

def place_free_properties(free_properties, transformed_tree)

john-hahn commented 3 years ago

Using the skeleton design that Michelle wrote about above, it can be more easily visualized with all the functions next to each other, and I updated the function headers to a more refined design by referencing the current implementation of the Transformer sandbox.

import sys
from lark import Lark, Transformer
from lark.lexer import Token
import json
from vars_parser import evalVars

grammar_f = open("dsl_grammar.lark")
dsl_grammar = grammar_f.read()
grammar_f.close()

Class TreeToDict(Transformer):
    def game(self, s):
        return NULL

    def room(self, s):
        return NULL

    def connection(self, s: list[tuple[str, str]]) -> tuple[str, dict]:
        return ("connections", dict(s))

    def item(self, s: list) -> tuple[str, tuple[str, dict]]:
        return ('ITEM', dict(s))

    def action(self, s):
        return NULL

    def ESCAPED_STRING(self, s):
        return NULL

    def property(self, s):
        return NULL

    def start_g(self, s: list[str]) -> tuple[str, str]:
        return ("start", s[0])

    def end_g(self, s: list[str]) -> tuple[str, str]:
        return ("end", s[0])

    def id(self, s: list[str]) -> tuple[str, str]:
        return ("id", s[0])

    def location(self, s: list[str]) -> tuple[str, str]:
        return ("location", s[0])

    def phrase(self, s: list[Token]) -> str:
        return ' '.join(s)

    # replaces both a single connection and a property with a tuple
    connection = tuple
    property = tuple

# reform the tree to fit the WDL syntax, calls helper functions below
def reform_tree(transformed_tree):
    return NULL

# load free-floating item/room property-value pair into an intermediate free-floating properties dictionary
def load_free_property(property):
    return NULL

# extract item from room and load item in separate ITEMS dictionary
def extract_items(transformed_tree):
    return NULL

# place free-floating properties into their appropriate place
def place_free_properties(free_properties, transformed_tree):
    return NULL

The simple functions have content in their function, while the more complex ones are just the skeleton with a generic 'return NULL' statement. This design is very similar to the current implementation with the exception of a few functions specific to the actions using the transformed tree of basic features. A more detailed description of the functions is found in Michelle's explanation above.

john-hahn commented 3 years ago

Michelle and I met for the last time today and confirmed the design to be that of the comment above. The headers of the functions were slightly altered to accommodate the input and output specification. For example, def start_g(self, s) to def start_g(self, s: list[str]) -> tuple[str, str]:. Apart from that, we both agreed on the design to have those functions mentioned above in the TreetoDict() class and to have four functions that manipulate the transformed tree.

The implementation will come after this, building around the skeleton design. Documentation of these functions will be done as it is implemented, in case there are any slight changes to the design throughout the process. Lastly, a function that loads up the JSON file to the directory should be included once the intermediate stage is fully implemented.

ecoble commented 3 years ago

Issue Score: ✔️++

Comments: Great work on this issue! The documentation created is excellent, as well as the status updates! Be sure to associate issues with their proper milestone so they can be picked up by the grading script.