huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
132.6k stars 26.42k forks

Undefined variable in: scripts/check_tokenizers.py #33661

Open vignesh1507 opened 3 days ago

vignesh1507 commented 3 days ago

System Info

Python 3.12.4

Who can help?

No response

Reproduction

if check_diff(spm_ids[first : first + i], tok_ids[first : first + j], sp, tok) and check_details(
    line,
    spm_ids[first + i : last],
    tok_ids[first + j : last],
    slow,
    fast,

Expected behavior

Undefined Variables: sp and tok are not defined anywhere within the check_details function or its enclosing scopes. This will result in a NameError when the code attempts to execute this line.
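To illustrate why a bug like this can go unnoticed, here is a minimal sketch with toy stand-ins for check_diff and check_details (these bodies are hypothetical and much simpler than the real script). Python resolves names only when a line executes, so a reference to an undefined name inside a rarely taken branch does not fail until that branch is reached.

```python
# Toy stand-ins, NOT the real functions from scripts/check_tokenizers.py.
def check_diff(a, b, slow, fast):
    """Toy comparison; the real check_diff does more than this."""
    return a == b


def check_details(spm_ids, tok_ids, slow, fast):
    if spm_ids == tok_ids:
        # Common path: the undefined names below are never evaluated.
        return True
    # Rare path: `sp` and `tok` are not defined anywhere, so evaluating
    # the arguments of this call raises NameError at call time.
    return check_diff(spm_ids[:1], tok_ids[:1], sp, tok)


# The common path works even though the function body contains the bug.
print(check_details([1, 2], [1, 2], "slow", "fast"))

# The rare path only fails once it is actually executed.
try:
    check_details([1, 2], [1, 3], "slow", "fast")
except NameError as err:
    print(f"NameError: {err}")
```

A static checker that reports undefined names would probably flag such a line without having to run the script.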

LysandreJik commented 2 days ago

Hey @vignesh1507, would you like to open a PR to fix this?

niqodea commented 1 day ago

I opened a draft PR for this issue (#33702) with what I believe are the correct variables to pass in this case, but I need to verify this further since I couldn't run the script in my environment. I am getting the error:

AttributeError: module transformers has no attribute Phi3Tokenizer

This makes me wonder whether the script is unmaintained. It could also just be that my environment is not set up correctly, though.

I have also had trouble understanding the purpose of the script: there is little documentation available, and I couldn't find much online. With a bit of guidance, I would be happy to document the script better as well. Do you think this could warrant a separate issue, @LysandreJik?

vignesh1507 commented 1 day ago

Yeah, I can assist you with the documentation part. Can you tell me more about the error you are facing on your local machine?

niqodea commented 6 hours ago

@vignesh1507 Sure, this is the complete traceback I am getting:

Traceback (most recent call last):
  File "/home/nicodea/hf/transformers/scripts/check_tokenizers.py", line 12, in <module>
    TOKENIZER_CLASSES = {
  File "/home/nicodea/hf/transformers/scripts/check_tokenizers.py", line 13, in <dictcomp>
    name: (getattr(transformers, name), getattr(transformers, name + "Fast")) for name in SLOW_TO_FAST_CONVERTERS
  File "/home/nicodea/hf/transformers/src/transformers/utils/import_utils.py", line 1757, in __getattr__
    raise AttributeError(f"module {self.__name__} has no attribute {name}")
AttributeError: module transformers has no attribute Phi3Tokenizer
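The traceback shows the failure happening while the TOKENIZER_CLASSES dict comprehension is built, so a single unresolved name aborts the whole script. As a diagnostic sketch only (not the fix proposed in the PR), the lookup could be made tolerant and report which names are missing. Everything below uses stand-ins: fake_transformers, converter_names, and resolve_tokenizer_classes are hypothetical names introduced for illustration, so the example runs without transformers installed.

```python
import types

# Stand-in for the `transformers` module (assumption): it exposes a fast
# Phi3 tokenizer but no slow `Phi3Tokenizer`, mirroring the reported error.
fake_transformers = types.SimpleNamespace(
    BertTokenizer=object,
    BertTokenizerFast=object,
    T5Tokenizer=object,
    T5TokenizerFast=object,
    Phi3TokenizerFast=object,
)

# Stand-in for the keys of SLOW_TO_FAST_CONVERTERS; only the names matter.
converter_names = ["BertTokenizer", "T5Tokenizer", "Phi3Tokenizer"]


def resolve_tokenizer_classes(module, names):
    """Return (resolved, missing): slow/fast class pairs and unresolved names."""
    resolved, missing = {}, []
    for name in names:
        slow = getattr(module, name, None)
        fast = getattr(module, name + "Fast", None)
        if slow is None or fast is None:
            # Record the name instead of letting AttributeError abort the loop.
            missing.append(name)
            continue
        resolved[name] = (slow, fast)
    return resolved, missing


tokenizer_classes, missing = resolve_tokenizer_classes(fake_transformers, converter_names)
print("resolved:", sorted(tokenizer_classes))
print("missing:", missing)
```

Listing the missing names this way may help tell an environment problem (many names missing) apart from a single class that does not exist under that name.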

Details about my environment:

~EDIT: upon reinstalling the environment (at commit f3c1a172b) with pip install -e .[quality] as suggested here I am now no longer getting that error. Now, upon running pip install -e ".[sentencepiece]", I am getting:~

<redacted>

vignesh1507 commented 5 hours ago

Thanks for the response, @niqodea. I'll look into the error and get back to you with a possible solution.

niqodea commented 5 hours ago

Hey @vignesh1507, I updated the comment. Never mind about the second error: I was using a very old version of transformers because I pulled from my very outdated origin/main, my bad. The first error is the one I am still experiencing, and I can't seem to fix it even after installing with .[quality].