pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BUG] Non-unicode code causes torch.compile to fail #927

Closed: gmmyung closed this 3 months ago

gmmyung commented 3 months ago

There is no minimal example that reliably reproduces the bug, but here is a related issue: https://github.com/pytorch/pytorch/issues/124960

https://github.com/pytorch/tensordict/blob/484a0456fa210a091f8063784f76179d652871db/tensordict/base.py#L532
https://github.com/pytorch/tensordict/blob/484a0456fa210a091f8063784f76179d652871db/tensordict/base.py#L2614

The codebase contains many non-ASCII characters, such as U+2013 (–) and U+201C (“), which cause PyTorch to fail while running torch.compile.

vmoens commented 3 months ago

Are U+2013 and U+201C the only non-ASCII characters, or are there others? I'm not sure how I can reliably find them all in one go.

EDIT: It seems that the vim search /[^\t-~] works OK.

gmmyung commented 3 months ago

– “ ” ’

These are all I have found. I only looked in base.py, so there may be more in other files. I used the /[^\u0000-\u007F] search in vim to find them.
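
The vim search above only covers one open buffer at a time. To check every file in one go, the same idea can be scripted. This is a minimal sketch, not something from the tensordict repository: the helper names `find_non_ascii` and `scan_tree` are hypothetical, and it assumes the sources are UTF-8 encoded `.py` files under a single directory.

```python
import re
from pathlib import Path

# Any character outside the 7-bit ASCII range, mirroring the vim
# pattern /[^\u0000-\u007F] mentioned in the thread.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text):
    """Return (line, column, char, codepoint) for each non-ASCII character.

    Line and column numbers are 1-based. Hypothetical helper for illustration.
    """
    hits = []
    # Split on "\n" only, so Unicode line separators are reported, not swallowed.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            ch = match.group()
            hits.append((lineno, match.start() + 1, ch, f"U+{ord(ch):04X}"))
    return hits


def scan_tree(root):
    """Yield (path, hit) for every non-ASCII character in .py files under root.

    Assumes UTF-8 sources; undecodable bytes become U+FFFD and are reported too.
    """
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for hit in find_non_ascii(text):
            yield path, hit
```

Looping over `scan_tree("tensordict")` and printing each path with its hit would list every offending character together with its code point, which makes it easier to see whether anything beyond the dash and the curly quotes is present.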