Added feature to utilize Python syntax for field comments in schema

rohankalbag commented 2 months ago

Attempt at resolving #242. This adds a feature that lets schemas use Python comment syntax instead of the currently supported, more verbose Annotated[Doc()]-based comment style. The feature is currently implemented for examples/math, but it can be extended to other examples because the new _convert_pythonic_comments_to_annotated_docs method is added to typechat.TypeChatJsonTranslator.

As suggested, the feature scans the source code using regular expressions and inserts Annotated[Doc()] for each commented schema field (the _convert_pythonic_comments_to_annotated_docs method). The original pythonic comments are kept since they are harmless, but they can be removed if needed.
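A minimal sketch of the regex-based idea, for illustration only. It is not the code from this PR: the pattern, `annotate_line`, and `annotate_source` are hypothetical names, and the sketch assumes each field sits on a single line with a trailing comment. It does not cope with multi-line annotations or with a `#` inside a string literal.

```python
import json
import re

# Hypothetical pattern: an indented "name: type  # comment" line.
FIELD_WITH_COMMENT = re.compile(
    r"^(?P<indent>\s+)(?P<name>\w+)\s*:\s*(?P<type>[^#\n]+?)\s*#\s*(?P<comment>.+?)\s*$"
)


def annotate_line(line: str) -> str:
    """Rewrite one commented field line; return other lines unchanged."""
    match = FIELD_WITH_COMMENT.match(line)
    if match is None:
        return line
    indent, name, type_, comment = match.group("indent", "name", "type", "comment")
    if type_.startswith("Annotated["):
        # Already annotated by hand; leave it alone.
        return line
    # json.dumps yields a double-quoted, escaped string literal.
    doc = json.dumps(comment)
    # The original comment is kept at the end of the line.
    return f"{indent}{name}: Annotated[{type_}, Doc({doc})] # {comment}"


def annotate_source(source: str) -> str:
    """Apply annotate_line to every line of a schema source string."""
    return "\n".join(annotate_line(line) for line in source.splitlines())
```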

Also added a simple example schema in examples/math (schema_with_comments.py) and a simple script, pythonic_comment_handling.py, showing a POC. If the debug flag is set to True, the schema file is shown before and after processing.

For example, for schema_with_comments.py, the following debug output can be obtained by running pythonic_comment_handling.py:

File contents before modification:
----------------------------------------------------------------------------------------------------
from typing_extensions import TypedDict, Annotated, Callable, Doc

class MathAPI(TypedDict):
    """
    This is API for a simple calculator
    """

    # this is a comment

    add: Callable[[float, float], float] # Add two numbers
    sub: Callable[[float, float], float] # Subtract two numbers
    mul: Callable[[float, float], float] # Multiply two numbers
    div: Callable[[float, float], float] # Divide two numbers
    neg: Callable[[float], float] # Negate a number
    id: Callable[[float], float] # Identity function
    unknown: Callable[[str], float] # Unknown request
----------------------------------------------------------------------------------------------------
File contents after modification:
----------------------------------------------------------------------------------------------------
from typing_extensions import TypedDict, Annotated, Callable, Doc

class MathAPI(TypedDict):
    """
    This is API for a simple calculator
    """

    # this is a comment

    add: Annotated[Callable[[float, float], float], Doc("Add two numbers")] # Add two numbers
    sub: Annotated[Callable[[float, float], float], Doc("Subtract two numbers")] # Subtract two numbers
    mul: Annotated[Callable[[float, float], float], Doc("Multiply two numbers")] # Multiply two numbers
    div: Annotated[Callable[[float, float], float], Doc("Divide two numbers")] # Divide two numbers
    neg: Annotated[Callable[[float], float], Doc("Negate a number")] # Negate a number
    id: Annotated[Callable[[float], float], Doc("Identity function")] # Identity function
    unknown: Annotated[Callable[[str], float], Doc("Unknown request")] # Unknown request
----------------------------------------------------------------------------------------------------

@gvanrossum @DanielRosenwasser

rohankalbag commented 2 months ago

@microsoft-github-policy-service agree

DanielRosenwasser commented 2 months ago

Thanks for the PR on this - I'm not so sure that regular expressions are the right approach on this one, since they can make unrelated modifications in source code. Nor do I feel entirely comfortable exec-ing the source text produced by that kind of text replacement.

The idea of replacing the source text and re-exec-ing is one I hadn't thought of - I don't know if it's the right one in general though.

When I was originally thinking about how this could work, the idea I had was to use the ast module to accurately identify the appropriate attribute, and the tokenize module to grab the subsequent comment (though tokenize might be overkill). If you're still open to tackling the problem, I would start there. @gvanrossum might have thoughts here too.

rohankalbag commented 2 months ago

Thanks for your valuable suggestions @DanielRosenwasser. I have implemented a more robust _convert_pythonic_comments_to_annotated_docs to tackle the problem.

This implementation now uses ast to parse the schema source, then traverses the AST to identify the ast.AnnAssign nodes in each class and extract their start and end lines. The comment for each field is then extracted using tokenize, and the AST is modified to incorporate Annotated[Doc()] wherever needed. Finally, the transformed source is obtained and re-exec'd as before. The implementation also handles the case where Doc and Annotated have not been imported in the original schema.
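The pipeline described above might look roughly like the following. This is an illustrative sketch rather than the PR's implementation: `DocAnnotator`, `ensure_imports`, and `transform` are hypothetical names, it assumes Python 3.9+ (for `ast.unparse` and the simplified `Subscript.slice`), and it assumes the schema imports from `typing_extensions`. It stops at producing the new source and leaves out the re-exec step.

```python
import ast
import io
import tokenize


class DocAnnotator(ast.NodeTransformer):
    """Wrap commented class fields in Annotated[..., Doc(...)]."""

    def __init__(self, comments_by_line: dict[int, str]) -> None:
        self.comments_by_line = comments_by_line

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        for stmt in node.body:
            if not isinstance(stmt, ast.AnnAssign):
                continue
            comment = self.comments_by_line.get(stmt.end_lineno)
            if comment is None:
                continue
            doc_call = ast.Call(
                func=ast.Name(id="Doc", ctx=ast.Load()),
                args=[ast.Constant(value=comment)],
                keywords=[],
            )
            stmt.annotation = ast.Subscript(
                value=ast.Name(id="Annotated", ctx=ast.Load()),
                slice=ast.Tuple(elts=[stmt.annotation, doc_call], ctx=ast.Load()),
                ctx=ast.Load(),
            )
        return node


def ensure_imports(tree: ast.Module, names=("Annotated", "Doc")) -> None:
    """Add the names to an existing typing_extensions import, or create one."""
    for stmt in tree.body:
        if isinstance(stmt, ast.ImportFrom) and stmt.module == "typing_extensions":
            present = {alias.name for alias in stmt.names}
            stmt.names += [ast.alias(name=n, asname=None) for n in names if n not in present]
            return
    # Simplification: the new import goes first, ahead of any module docstring.
    new_import = ast.ImportFrom(
        module="typing_extensions",
        names=[ast.alias(name=n, asname=None) for n in names],
        level=0,
    )
    tree.body.insert(0, new_import)


def transform(source: str) -> str:
    """Return the schema source with field comments turned into Doc()."""
    comments = {
        tok.start[0]: tok.string.lstrip("#").strip()
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    }
    tree = DocAnnotator(comments).visit(ast.parse(source))
    ensure_imports(tree)
    ast.fix_missing_locations(tree)
    # ast.unparse regenerates the code, so comments and blank lines are lost.
    return ast.unparse(tree)
```

Since the output is regenerated from the AST, the original comments, blank lines, and quote style are not preserved, which matches what the debug output below shows.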

I have tested it on the example used earlier through another script, ast_comment_handling.py, showing a POC. If the debug flag is set to True, the schema file is shown before and after processing.

Again, for the schema_with_comments.py example, the following debug output can be obtained by running ast_comment_handling.py:

Source code before transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import TypedDict, Callable

class MathAPI(TypedDict):
    """
    This is API for a simple calculator
    """

    # this is a comment

    add: Callable[[float, float], float] # Add two numbers
    sub: Callable[[float, float], float] # Subtract two numbers
    mul: Callable[[float, float], float] # Multiply two numbers
    div: Callable[[float, float], float] # Divide two numbers
    neg: Callable[[float], float] # Negate a number
    id: Callable[[float], float] # Identity function
    unknown: Callable[
        [str], float
    ] # Unknown request
----------------------------------------------------------------------------------------------------
Source code after transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import TypedDict, Callable, Annotated, Doc

class MathAPI(TypedDict):
    """
    This is API for a simple calculator
    """
    add: Annotated[Callable[[float, float], float], Doc('Add two numbers')]
    sub: Annotated[Callable[[float, float], float], Doc('Subtract two numbers')]
    mul: Annotated[Callable[[float, float], float], Doc('Multiply two numbers')]
    div: Annotated[Callable[[float, float], float], Doc('Divide two numbers')]
    neg: Annotated[Callable[[float], float], Doc('Negate a number')]
    id: Annotated[Callable[[float], float], Doc('Identity function')]
    unknown: Annotated[Callable[[str], float], Doc('Unknown request')]
----------------------------------------------------------------------------------------------------

@gvanrossum

gvanrossum commented 2 months ago

I'm sorry, but in the end I feel that this approach is too brittle to build into TypeChat itself. Maybe it could be turned into a CLI tool that does a one-time conversion of a schema with comments to a schema with Annotated...Doc...

rohankalbag commented 2 months ago

Thanks @gvanrossum for your input. As you suggested, I have separated the source code transformation from the TypeChat src. Instead, a utility for the one-time conversion, exposing a user-friendly CLI, has been created in python/utils/python_comment_handler. It is a script, python_comment_handler.py, whose usage is described below:

usage: python_comment_handler.py [-h] --in_path IN_PATH --out_path OUT_PATH [--debug]

options:
  -h, --help            show this help message and exit
  --in_path IN_PATH, -i IN_PATH
                        Path to the schema file containing pythonic comments
  --out_path OUT_PATH, -o OUT_PATH
                        Path to the output file containing the transformed schema
  --debug, -d           Print debug information
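The usage text above is consistent with a standard argparse setup. A minimal sketch that would expose an equivalent interface, assuming the script uses argparse (the implementation is not shown in this thread); `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the flags listed in the usage text."""
    parser = argparse.ArgumentParser(prog="python_comment_handler.py")
    parser.add_argument(
        "--in_path", "-i", required=True,
        help="Path to the schema file containing pythonic comments",
    )
    parser.add_argument(
        "--out_path", "-o", required=True,
        help="Path to the output file containing the transformed schema",
    )
    parser.add_argument(
        "--debug", "-d", action="store_true",
        help="Print debug information",
    )
    return parser
```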

In addition, I noticed that I previously hadn't handled fields enclosed by Required/NotRequired: the transformation produced Annotated[Required[...], Doc(...)] instead of the expected Required[Annotated[..., Doc(...)]]. This has been addressed in the current implementation by adding a conditional check that determines whether the ast.Subscript node is Required or NotRequired; in that case, the transformation is applied to the child of the ast.Subscript node.
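The Required/NotRequired handling described above can be sketched as a small recursive helper. This is an illustration under assumptions, not the utility's code: `wrap_with_doc` is a hypothetical name, the qualifiers are assumed to be referenced by bare name (not as `typing.Required`), and Python 3.9+ is assumed for `ast.unparse` and the simplified `Subscript.slice`.

```python
import ast

# Qualifiers that must stay outermost in a TypedDict field annotation.
QUALIFIERS = {"Required", "NotRequired"}


def wrap_with_doc(annotation: ast.expr, comment: str) -> ast.expr:
    """Return the annotation wrapped as Annotated[..., Doc(comment)]."""
    if (
        isinstance(annotation, ast.Subscript)
        and isinstance(annotation.value, ast.Name)
        and annotation.value.id in QUALIFIERS
    ):
        # Descend into Required[...]/NotRequired[...] and wrap the child.
        annotation.slice = wrap_with_doc(annotation.slice, comment)
        return annotation
    doc_call = ast.Call(
        func=ast.Name(id="Doc", ctx=ast.Load()),
        args=[ast.Constant(value=comment)],
        keywords=[],
    )
    return ast.Subscript(
        value=ast.Name(id="Annotated", ctx=ast.Load()),
        slice=ast.Tuple(elts=[annotation, doc_call], ctx=ast.Load()),
        ctx=ast.Load(),
    )
```

For instance, parsing `NotRequired[int]` with `ast.parse(..., mode="eval")`, passing its `.body` through `wrap_with_doc`, and calling `ast.unparse` on the result keeps `NotRequired` on the outside.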

A few schema examples with Python field comments have been added in python/utils/python_comment_handler/examples. An example usage of the utility is shown below, along with its debug output (obtained by passing the -d flag):

python3 python_comment_handler.py -i examples/commented_restaurant_schema.py -o transformed_schema.py -d

Source code before transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import Literal, Required, NotRequired, TypedDict

class UnknownText(TypedDict):
    """
    Use this type for order items that match nothing else
    """

    itemType: Literal["Unknown"]
    text: str # The text that wasn't understood

class Pizza(TypedDict, total=False):
    itemType: Required[Literal["Pizza"]]
    size: Literal["small", "medium", "large", "extra large"] # default: large
    addedToppings: list[str] # toppings requested (examples: pepperoni, arugula
    removedToppings: list[str] # toppings requested to be removed (examples: fresh garlic, anchovies
    quantity: int # default: 1
    name: Literal["Hawaiian", "Yeti", "Pig In a Forest", "Cherry Bomb"] # used if the requester references a pizza by name

class Beer(TypedDict):
    itemType: Literal["Beer"]
    kind: str # examples: Mack and Jacks, Sierra Nevada Pale Ale, Miller Lite
    quantity: NotRequired[int] # default: 1

SaladSize = Literal["half", "whole"]

SaladStyle = Literal["Garden", "Greek"]

class Salad(TypedDict, total=False):
    itemType: Required[Literal["Salad"]]
    portion: str # default: half
    style: str # default: Garden
    addedIngredients: list[str] # ingredients requested (examples: parmesan, croutons)
    removedIngredients: list[str] # ingredients requested to be removed (example: red onions)
    quantity: int # default: 1

OrderItem = Pizza | Beer | Salad

class Order(TypedDict):
    items: list[OrderItem | UnknownText]

----------------------------------------------------------------------------------------------------
Source code after transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import Literal, Required, NotRequired, TypedDict, Annotated, Doc

class UnknownText(TypedDict):
    """
    Use this type for order items that match nothing else
    """
    itemType: Literal['Unknown']
    text: Annotated[str, Doc("The text that wasn't understood")]

class Pizza(TypedDict, total=False):
    itemType: Required[Literal['Pizza']]
    size: Annotated[Literal['small', 'medium', 'large', 'extra large'], Doc('default: large')]
    addedToppings: Annotated[list[str], Doc('toppings requested (examples: pepperoni, arugula')]
    removedToppings: Annotated[list[str], Doc('toppings requested to be removed (examples: fresh garlic, anchovies')]
    quantity: Annotated[int, Doc('default: 1')]
    name: Annotated[Literal['Hawaiian', 'Yeti', 'Pig In a Forest', 'Cherry Bomb'], Doc('used if the requester references a pizza by name')]

class Beer(TypedDict):
    itemType: Literal['Beer']
    kind: Annotated[str, Doc('examples: Mack and Jacks, Sierra Nevada Pale Ale, Miller Lite')]
    quantity: NotRequired[Annotated[int, Doc('default: 1')]]
SaladSize = Literal['half', 'whole']
SaladStyle = Literal['Garden', 'Greek']

class Salad(TypedDict, total=False):
    itemType: Required[Literal['Salad']]
    portion: Annotated[str, Doc('default: half')]
    style: Annotated[str, Doc('default: Garden')]
    addedIngredients: Annotated[list[str], Doc('ingredients requested (examples: parmesan, croutons)')]
    removedIngredients: Annotated[list[str], Doc('ingredients requested to be removed (example: red onions)')]
    quantity: Annotated[int, Doc('default: 1')]
OrderItem = Pizza | Beer | Salad

class Order(TypedDict):
    items: list[OrderItem | UnknownText]
----------------------------------------------------------------------------------------------------

@DanielRosenwasser

microsoft / TypeChat

Added feature to utilize Python syntax for field comments in schema #245