pytorch / torchtune

python 3.9 gpu tests occasionally fail with "No module named 'jmespath'" #1790

Open RdoubleA opened 2 weeks ago

RdoubleA commented 2 weeks ago

CI on PRs occasionally fails with the following message:

FAILED tests/recipes/test_eleuther_eval.py::TestEleutherEval::test_torchtune_checkpoint_eval_results[truthfulqa_gen-0.1-1] - RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
No module named 'jmespath'

trace:

<frozen importlib._bootstrap>:1055: in _handle_fromlist
    ???
        fromlist   = ('GenerationMixin',)
        import_    = <built-in function __import__>
        module     = <module 'transformers.generation' from '/home/ec2-user/actions-runner/_work/torchtune/torchtune/3/envs/test/lib/python3.9/site-packages/transformers/generation/__init__.py'>
        recursive  = False
        x          = 'GenerationMixin'
3/envs/test/lib/python3.9/site-packages/transformers/utils/import_utils.py:1754: in __getattr__
    module = self._get_module(self._class_to_module[name])
        name       = 'GenerationMixin'
        self       = <module 'transformers.generation' from '/home/ec2-user/actions-runner/_work/torchtune/torchtune/3/envs/test/lib/python3.9/site-packages/transformers/generation/__init__.py'>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <module 'transformers.generation' from '/home/ec2-user/actions-runner/_work/torchtune/torchtune/3/envs/test/lib/python3.9/site-packages/transformers/generation/__init__.py'>
module_name = 'utils'

    def _get_module(self, module_name: str):
        try:
            return importlib.import_module("." + module_name, self.__name__)
        except Exception as e:
>           raise RuntimeError(
                f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
                f" traceback):\n{e}"
            ) from e
E           RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
E           No module named 'jmespath'

module_name = 'utils'
self       = <module 'transformers.generation' from '/home/ec2-user/actions-runner/_work/torchtune/torchtune/3/envs/test/lib/python3.9/site-packages/transformers/generation/__init__.py'>

ebsmothers commented 2 weeks ago

I've been noticing this too. Is it definitely only on 3.9? It also seems to be related to our recent urllib pin. For one, Eleuther has now released 0.4.5, so we should pin to that release instead of the git commit hash (though I'd be surprised if this fixes the issue). I also think we may wanna pin to a specific version of requests. (Maybe neither of these will fix things, but this is what I'd try first.)
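Before changing any pins, it may help to log what actually got resolved in the failing environment. Here is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_versions` and the list of distribution names are assumptions for illustration (in particular, `lm_eval` is assumed to be the distribution name of the Eleuther harness); adjust them to whatever the CI requirements actually use.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list of packages worth checking; not taken from the repo's requirements.
    suspects = ["lm_eval", "transformers", "urllib3", "requests", "jmespath"]
    for name, version in installed_versions(suspects).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this as a CI step right before the test job would show whether the resolved versions differ between passing and failing runs.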

pbontrager commented 1 week ago

Just saw this again today, so the pinned Eleuther 0.4.5 didn't help.