Closed arnaud-ma closed 2 weeks ago
Hi @arnaud-ma,
Thank you so much for your PR! We really like your ast
approach to resolving the multiline assignment issue.
After studying this PR, we realized that some of the ideas in this PR could be applied with fewer changes. This is implemented in: https://github.com/swansonk14/typed-argument-parser/pull/148.
Best, Jesse and Kyle
Hi, thanks for the update! The idea behind this PR was to completely replace the use of tokenize
with ast
. This solves all possible bugs with code parsing. I also think (to test) that there should be a speed up in terms of performance. Moreover in the new PR, the source code is read twice (with inspect.getsource
): once for tokenize
and once for ast
. And it is this reading that is (by far) the most expensive in performance. But I understand that what i proposed was too much change at once
Oh, very interesting! I'd be interested in knowing whether AST resolves: https://github.com/swansonk14/typed-argument-parser/issues/45. Either you/(Kyle and I) can compare the performance of code using only ast
to the current code. If it resolves performance issues, then I think that probably justifies a larger code change. Sorry we missed this aspect of your PR. Thanks again for the contribution.
Given this code where NewTap
is from my PR and Tap
is from before #148:
import timeit
from tap import Tap
from tap.tap import NewTap
from tap.utils import _old_get_class_variables, get_class_variables
class SimpleArgumentParser(Tap):
name: str # Your name
language: str = "Python" # Programming language
package: str = "Tap" # Package name
stars: int # Number of stars
max_stars: int = 5 # Maximum stars
class NewSimpleArgumentParser(NewTap):
name: str # Your name
language: str = "Python" # Programming language
package: str = "Tap" # Package name
stars: int # Number of stars
max_stars: int = 5 # Maximum stars
if __name__ == "__main__":
number = 1000
number_cls = 100
t_new_cls = timeit.timeit(
"NewSimpleArgumentParser()",
globals=globals(),
number=number_cls,
)
t_old_cls = timeit.timeit(
"SimpleArgumentParser()",
globals=globals(),
number=number_cls,
)
t_new = timeit.timeit(
"get_class_variables(SimpleArgumentParser)",
globals=globals(),
number=number,
)
t_old = timeit.timeit(
"_old_get_class_variables(SimpleArgumentParser)",
globals=globals(),
number=number,
)
increase = (t_old - t_new) / t_old
increase_cls = (t_old_cls - t_new_cls) / t_old_cls
print(f"time new get_class_variables: {t_new / number * 1000:.4f} ms per call")
print(f"time old get_class_variables: {t_old / number_cls * 1000:.4f} ms per call")
print(f"new is {increase * 100:.2f}% faster")
print(f"time new class init: {t_new_cls / number * 1000:.4f} ms per call")
print(f"time old class init: {t_old_cls / number_cls * 1000:.4f} ms per call")
print(f"new is {increase_cls * 100:.2f}% faster")
I get:
time new get_class_variables: 0.3262 ms per call
time old get_class_variables: 0.5686 ms per call
new is 42.63% faster
time new class init: 17.0593 ms per call
time old class init: 21.7422 ms per call
new is 21.54% faster
It's better, but it won't solve #45 since it is inspect.getsource
that takes the majority of the remaining time
I made a specific PR (#149) related to performance, you can ignore this one
Perfect! Thank you for the update! Kyle and I will discuss it next time we meet. I approved running the tests.
Instead of parsing by hand token by token, ast returns a python object that just needs to be read. We can still make mistakes when reading this object, but no more inevitable edge cases from parsing the source code.
Closes #97 Closes #130