rohankalbag opened 2 months ago
@microsoft-github-policy-service agree
Thanks for the PR on this - I'm not so sure that regular expressions are the right approach here, since they can make unrelated modifications to the source code. Nor do I feel entirely comfortable exec-ing the source text produced by that kind of text replacement.
The idea of replacing the source text and re-exec-ing is one I hadn't thought of - I don't know if it's the right one in general, though.
When I was originally thinking about how this could work, my idea was to use the ast module to accurately identify the appropriate attribute, and the tokenize module to grab the subsequent comment (though tokenize might be overkill). If you're still open to tackling the problem, I would start there. @gvanrossum might have thoughts here too.
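Here is a minimal sketch of the ast + tokenize idea. It assumes Python 3.9+ and that a field's comment sits on the line where its annotation ends. The helper names trailing_comments and field_comments and the SAMPLE schema are hypothetical and are not part of TypeChat. ast locates the annotated class attributes, and end_lineno covers annotations split across several lines. tokenize reports only genuine COMMENT tokens, so a # inside a string literal is not mistaken for a comment.

```python
import ast
import io
import tokenize

# Hypothetical sample schema used to demonstrate the helpers below.
SAMPLE = '''from typing_extensions import TypedDict, Callable
class MathAPI(TypedDict):
    # this is a comment
    add: Callable[[float, float], float]  # Add two numbers
    neg: Callable[[float], float]
    unknown: Callable[
        [str], float
    ]  # Unknown request
'''


def trailing_comments(source: str) -> dict[int, str]:
    """Map each line number to the text of the comment token on that line."""
    comments: dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # tok.start is (row, col); drop the leading '#' and surrounding spaces
            comments[tok.start[0]] = tok.string.lstrip("#").strip()
    return comments


def field_comments(source: str) -> dict[str, dict[str, str]]:
    """Return {class_name: {field_name: comment}} for annotated class fields
    whose annotation ends on a line that carries a comment."""
    comments = trailing_comments(source)
    result: dict[str, dict[str, str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        fields: dict[str, str] = {}
        for stmt in node.body:
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                # end_lineno points at the closing line of multi-line annotations
                comment = comments.get(stmt.end_lineno)
                if comment:
                    fields[stmt.target.id] = comment
        result[node.name] = fields
    return result


if __name__ == "__main__":
    print(field_comments(SAMPLE))
```

Running it prints {'MathAPI': {'add': 'Add two numbers', 'unknown': 'Unknown request'}}. The standalone "# this is a comment" line and the uncommented neg field are ignored.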
Thanks for your valuable suggestions @DanielRosenwasser. I have implemented a more robust _convert_pythonic_comments_to_annotated_docs to tackle the problem.
This implementation now uses ast to first parse the source of the schema and then traverse the AST to identify the ast.AnnAssign nodes in each class. It extracts their start and end lines, after which the corresponding comment is extracted using tokenize, and the AST is modified to incorporate Annotated[Doc()] wherever needed. It then obtains the transformed source and re-execs it as before. The implementation also handles cases where Doc and Annotated have not been imported in the original schema.
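The AST rewrite described above could look roughly like the following sketch, which needs Python 3.9+ for ast.unparse. The function names are hypothetical, not the PR's. The comments argument is assumed to be a mapping from line number to comment text, for example gathered with tokenize. Each commented ast.AnnAssign gets its annotation replaced by an Annotated[..., Doc(...)] subscript. The typing_extensions import is then extended, or added if missing, so that the transformed source still imports both names.

```python
import ast


def _annotated_doc(annotation: ast.expr, comment: str) -> ast.expr:
    """Build the AST for Annotated[<annotation>, Doc('<comment>')]."""
    doc_call = ast.Call(
        func=ast.Name(id="Doc", ctx=ast.Load()),
        args=[ast.Constant(value=comment)],
        keywords=[],
    )
    return ast.Subscript(
        value=ast.Name(id="Annotated", ctx=ast.Load()),
        slice=ast.Tuple(elts=[annotation, doc_call], ctx=ast.Load()),
        ctx=ast.Load(),
    )


def _ensure_imported(tree: ast.Module, module: str, names: list[str]) -> None:
    """Extend an existing `from <module> import ...` with missing names, or
    insert a new import after any module docstring and __future__ imports."""
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and node.module == module:
            present = {alias.name for alias in node.names}
            node.names.extend(
                ast.alias(name=n, asname=None) for n in names if n not in present
            )
            return
    position = 0
    for node in tree.body:
        is_docstring = (
            position == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        is_future = isinstance(node, ast.ImportFrom) and node.module == "__future__"
        if not (is_docstring or is_future):
            break
        position += 1
    new_import = ast.ImportFrom(
        module=module, names=[ast.alias(name=n, asname=None) for n in names], level=0
    )
    tree.body.insert(position, new_import)


def add_doc_annotations(source: str, comments: dict[int, str]) -> str:
    """Wrap each commented class field's annotation in Annotated[..., Doc(...)].

    `comments` maps a line number to the comment text found on that line.
    """
    tree = ast.parse(source)
    changed = False
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        for stmt in cls.body:
            if isinstance(stmt, ast.AnnAssign) and stmt.end_lineno in comments:
                stmt.annotation = _annotated_doc(stmt.annotation, comments[stmt.end_lineno])
                changed = True
    if changed:
        _ensure_imported(tree, "typing_extensions", ["Annotated", "Doc"])
    # ast.unparse regenerates source text; original comments are not preserved
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    schema = (
        "from typing_extensions import TypedDict, Callable\n"
        "class MathAPI(TypedDict):\n"
        "    add: Callable[[float, float], float]  # Add two numbers\n"
    )
    print(add_doc_annotations(schema, {3: "Add two numbers"}))
```

Because ast.unparse discards comments, a standalone comment such as "# this is a comment" disappears from the regenerated source. The debug output below shows the same effect.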
I have tested it on the earlier example through another script, ast_comment_handling.py, as a proof of concept. If the debug flag is set to True, the schema file can be seen before and after processing.
Again, for the schema_with_comments.py example, the following debug output can be obtained by running ast_comment_handling.py:
Source code before transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import TypedDict, Callable
class MathAPI(TypedDict):
    """
    This is API for a simple calculator
    """
    # this is a comment
    add: Callable[[float, float], float] # Add two numbers
    sub: Callable[[float, float], float] # Subtract two numbers
    mul: Callable[[float, float], float] # Multiply two numbers
    div: Callable[[float, float], float] # Divide two numbers
    neg: Callable[[float], float] # Negate a number
    id: Callable[[float], float] # Identity function
    unknown: Callable[
        [str], float
    ] # Unknown request
----------------------------------------------------------------------------------------------------
Source code after transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import TypedDict, Callable, Annotated, Doc
class MathAPI(TypedDict):
    """
    This is API for a simple calculator
    """
    add: Annotated[Callable[[float, float], float], Doc('Add two numbers')]
    sub: Annotated[Callable[[float, float], float], Doc('Subtract two numbers')]
    mul: Annotated[Callable[[float, float], float], Doc('Multiply two numbers')]
    div: Annotated[Callable[[float, float], float], Doc('Divide two numbers')]
    neg: Annotated[Callable[[float], float], Doc('Negate a number')]
    id: Annotated[Callable[[float], float], Doc('Identity function')]
    unknown: Annotated[Callable[[str], float], Doc('Unknown request')]
----------------------------------------------------------------------------------------------------
@gvanrossum
I'm sorry, but in the end I feel that this approach is too brittle to build into TypeChat itself. Maybe it could be turned into a CLI tool that does a one-time conversion of a schema with comments to a schema with Annotated...Doc...
Thanks @gvanrossum for your input. As you suggested, I have separated the source code transformation from the TypeChat source. Instead, a utility for one-time conversion with a user-friendly CLI has been created in python/utils/python_comment_handler as the script python_comment_handler.py, whose usage is described below:
usage: python_comment_handler.py [-h] --in_path IN_PATH --out_path OUT_PATH [--debug]

options:
  -h, --help            show this help message and exit
  --in_path IN_PATH, -i IN_PATH
                        Path to the schema file containing pythonic comments
  --out_path OUT_PATH, -o OUT_PATH
                        Path to the output file containing the transformed schema
  --debug, -d           Print debug information
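The option layout above maps directly onto argparse. The sketch below assumes the conversion itself lives in a separate function. convert_comments here is a hypothetical identity placeholder, not the PR's code. Listing each long option before its short form makes argparse derive the in_path/out_path destinations and print "--in_path IN_PATH, -i IN_PATH" as in the help text.

```python
import argparse
import sys
from typing import Optional

SEPARATOR = "-" * 100


def convert_comments(source: str) -> str:
    """Hypothetical placeholder for the AST-based comment-to-Doc rewrite."""
    return source


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python_comment_handler.py")
    parser.add_argument("--in_path", "-i", required=True,
                        help="Path to the schema file containing pythonic comments")
    parser.add_argument("--out_path", "-o", required=True,
                        help="Path to the output file containing the transformed schema")
    parser.add_argument("--debug", "-d", action="store_true",
                        help="Print debug information")
    return parser


def main(argv: Optional[list[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    with open(args.in_path, encoding="utf-8") as f:
        source = f.read()
    transformed = convert_comments(source)
    with open(args.out_path, "w", encoding="utf-8") as f:
        f.write(transformed)
    if args.debug:
        # Print a before/after dump in the same layout as the debug output
        print("Source code before transformation:", SEPARATOR, source, SEPARATOR, sep="\n")
        print("Source code after transformation:", SEPARATOR, transformed, SEPARATOR, sep="\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

argparse wraps the usage line to the terminal width, so on a narrow terminal it may span two lines rather than one.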
In addition, I noticed that I previously hadn't handled fields enclosed by Required/NotRequired: the output was Annotated[Required[...], Doc(...)] instead of the expected Required[Annotated[..., Doc(...)]]. This has been addressed in the current implementation by adding a conditional check on whether the ast.Subscript node is Required/NotRequired; in that case, the transformation is applied to the child of the ast.Subscript node.
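The Required/NotRequired check can be sketched as a small recursive helper. It needs Python 3.9+, where Subscript.slice holds the expression directly. wrap_with_doc and QUALIFIERS are hypothetical names. The sketch only recognizes the bare names Required and NotRequired, not attribute forms such as typing_extensions.Required.

```python
import ast

# Qualifiers that are kept outermost, e.g. Required[Annotated[..., Doc(...)]]
QUALIFIERS = {"Required", "NotRequired"}


def wrap_with_doc(annotation: ast.expr, comment: str) -> ast.expr:
    """Return the AST for Annotated[annotation, Doc(comment)], placing it
    inside a Required[...] / NotRequired[...] qualifier when one is present."""
    if (
        isinstance(annotation, ast.Subscript)
        and isinstance(annotation.value, ast.Name)
        and annotation.value.id in QUALIFIERS
    ):
        # Transform the qualifier's child instead of wrapping the qualifier
        annotation.slice = wrap_with_doc(annotation.slice, comment)
        return annotation
    doc_call = ast.Call(
        func=ast.Name(id="Doc", ctx=ast.Load()),
        args=[ast.Constant(value=comment)],
        keywords=[],
    )
    return ast.Subscript(
        value=ast.Name(id="Annotated", ctx=ast.Load()),
        slice=ast.Tuple(elts=[annotation, doc_call], ctx=ast.Load()),
        ctx=ast.Load(),
    )


if __name__ == "__main__":
    field = ast.parse("quantity: NotRequired[int]").body[0]
    field.annotation = wrap_with_doc(field.annotation, "default: 1")
    print(ast.unparse(field))
```

For the Beer.quantity field this prints quantity: NotRequired[Annotated[int, Doc('default: 1')]], the same form as the transformed schema output.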
A few example schemas with Python field comments have been added in python/utils/python_comment_handler/examples. An example invocation of the utility is shown below, followed by its debug output (obtained by passing the -d flag):
python3 python_comment_handler.py -i examples/commented_restaurant_schema.py -o transformed_schema.py -d
Source code before transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import Literal, Required, NotRequired, TypedDict
class UnknownText(TypedDict):
    """
    Use this type for order items that match nothing else
    """
    itemType: Literal["Unknown"]
    text: str # The text that wasn't understood
class Pizza(TypedDict, total=False):
    itemType: Required[Literal["Pizza"]]
    size: Literal["small", "medium", "large", "extra large"] # default: large
    addedToppings: list[str] # toppings requested (examples: pepperoni, arugula
    removedToppings: list[str] # toppings requested to be removed (examples: fresh garlic, anchovies
    quantity: int # default: 1
    name: Literal["Hawaiian", "Yeti", "Pig In a Forest", "Cherry Bomb"] # used if the requester references a pizza by name
class Beer(TypedDict):
    itemType: Literal["Beer"]
    kind: str # examples: Mack and Jacks, Sierra Nevada Pale Ale, Miller Lite
    quantity: NotRequired[int] # default: 1
SaladSize = Literal["half", "whole"]
SaladStyle = Literal["Garden", "Greek"]
class Salad(TypedDict, total=False):
    itemType: Required[Literal["Salad"]]
    portion: str # default: half
    style: str # default: Garden
    addedIngredients: list[str] # ingredients requested (examples: parmesan, croutons)
    removedIngredients: list[str] # ingredients requested to be removed (example: red onions)
    quantity: int # default: 1
OrderItem = Pizza | Beer | Salad
class Order(TypedDict):
    items: list[OrderItem | UnknownText]
----------------------------------------------------------------------------------------------------
Source code after transformation:
----------------------------------------------------------------------------------------------------
from typing_extensions import Literal, Required, NotRequired, TypedDict, Annotated, Doc
class UnknownText(TypedDict):
    """
    Use this type for order items that match nothing else
    """
    itemType: Literal['Unknown']
    text: Annotated[str, Doc("The text that wasn't understood")]
class Pizza(TypedDict, total=False):
    itemType: Required[Literal['Pizza']]
    size: Annotated[Literal['small', 'medium', 'large', 'extra large'], Doc('default: large')]
    addedToppings: Annotated[list[str], Doc('toppings requested (examples: pepperoni, arugula')]
    removedToppings: Annotated[list[str], Doc('toppings requested to be removed (examples: fresh garlic, anchovies')]
    quantity: Annotated[int, Doc('default: 1')]
    name: Annotated[Literal['Hawaiian', 'Yeti', 'Pig In a Forest', 'Cherry Bomb'], Doc('used if the requester references a pizza by name')]
class Beer(TypedDict):
    itemType: Literal['Beer']
    kind: Annotated[str, Doc('examples: Mack and Jacks, Sierra Nevada Pale Ale, Miller Lite')]
    quantity: NotRequired[Annotated[int, Doc('default: 1')]]
SaladSize = Literal['half', 'whole']
SaladStyle = Literal['Garden', 'Greek']
class Salad(TypedDict, total=False):
    itemType: Required[Literal['Salad']]
    portion: Annotated[str, Doc('default: half')]
    style: Annotated[str, Doc('default: Garden')]
    addedIngredients: Annotated[list[str], Doc('ingredients requested (examples: parmesan, croutons)')]
    removedIngredients: Annotated[list[str], Doc('ingredients requested to be removed (example: red onions)')]
    quantity: Annotated[int, Doc('default: 1')]
OrderItem = Pizza | Beer | Salad
class Order(TypedDict):
    items: list[OrderItem | UnknownText]
----------------------------------------------------------------------------------------------------
@DanielRosenwasser
Attempt at resolving #242: added a feature to use Python comment syntax instead of the currently supported, more verbose Annotated[Doc()]-based comment style. The feature is currently implemented for examples/math, but it can be extended to other examples since the new _convert_pythonic_comments_to_annotated_docs method is added to typechat.TypeChatJsonTranslator.
As suggested, the feature attempts to scan the source code using regular expressions and inserts Annotated[Doc()] for each commented schema field (the _convert_pythonic_comments_to_annotated_docs method). The original Python comments are kept since they are harmless, but they can be removed if needed.
I also added a simple example schema in examples/math (schema_with_comments.py) and a simple implementation script, pythonic_comment_handling.py, as a proof of concept. If the debug flag is set to True, the schema file can be seen before and after processing. For example, for the schema_with_comments.py example, the following debug output can be obtained by running pythonic_comment_handling.py.
@gvanrossum @DanielRosenwasser
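For comparison, a line-based regex conversion in the spirit of the original proposal might look like the sketch below. The pattern and regex_convert are hypothetical, because the PR's actual expression is not shown here. The sketch makes the review concern concrete in two ways. First, a multi-line annotation such as the unknown field is silently skipped. Second, a # inside a string literal, as in Literal["#1"], is mistaken for the start of the comment.

```python
import re

# Hypothetical pattern: indentation, field name, annotation and trailing
# comment, all required to sit on a single physical line.
FIELD_WITH_COMMENT = re.compile(r"^(\s*)(\w+):\s*(.+?)\s*#\s*(.*)$")


def regex_convert(source: str) -> str:
    """Rewrite `name: T  # text` lines as `name: Annotated[T, Doc('text')]`."""
    out = []
    for line in source.splitlines():
        match = FIELD_WITH_COMMENT.match(line)
        if match:
            indent, name, annotation, comment = match.groups()
            line = f"{indent}{name}: Annotated[{annotation}, Doc({comment!r})]"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(regex_convert("    add: int  # Add two numbers"))
    # The '#' inside the string literal splits the annotation incorrectly
    print(regex_convert('    kind: Literal["#1"]  # pick one'))
```

The AST/tokenize version avoids both problems because it never treats source text as plain lines.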