learnables / learn2learn

A PyTorch Library for Meta-learning Research
http://learn2learn.net
MIT License

MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`. when running PyTorch Lightning example #334

Closed: lolokoko28 closed this issue 2 years ago

lolokoko28 commented 2 years ago

When running the example file: https://github.com/learnables/learn2learn/blob/master/examples/vision/lightning/main.py

I got this error:

  File "C:\mscai-atcs\l2l_pl_example.py", line 96, in <module>
    main()
  File "C:\mscai-atcs\l2l_pl_example.py", line 91, in main
    trainer.fit(model=algorithm, datamodule=episodic_data)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1234, in _run
    results = self._run_stage()
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1343, in _run_train
    self._run_sanity_check()
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1411, in _run_sanity_check
    val_loop.run()
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\loops\base.py", line 211, in run
    output = self.on_run_end()
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 206, in on_run_end
    self._on_evaluation_end()
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 277, in _on_evaluation_end
    self.trainer._call_callback_hooks("on_validation_end", *args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1634, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\learn2learn\utils\lightning.py", line 87, in on_validation_end
    trainer.test(model=module, verbose=False)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 936, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 983, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1158, in _run
    verify_loop_configurations(self)
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py", line 46, in verify_loop_configurations
    __verify_eval_loop_configuration(trainer, model, "test")
  File "C:\ProgramData\Anaconda3\envs\mscai-atcs\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py", line 197, in __verify_eval_loop_configuration
    raise MisconfigurationException(f"No {loader_name}() method defined to run Trainer.{trainer_method}.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No test_dataloader() method defined to run Trainer.test.

Environment: I'm using Anaconda with Python 3.10 on Windows 10. This is the output of pip freeze:

absl-py==1.0.0
aiohttp==3.8.1
aiosignal==1.2.0
argcomplete==2.0.0
async-timeout==4.0.2
attrs==21.4.0
boto==2.49.0
cachetools==5.0.0
certifi==2020.6.20
cffi==1.15.0
charset-normalizer==2.0.12
click==8.1.3
cloudpickle==2.0.0
colorama==0.4.4
crcmod==1.7
cryptography==37.0.2
datasets==2.1.0
dill==0.3.4
fasteners==0.17.3
filelock==3.6.0
frozenlist==1.3.0
fsspec==2022.3.0
gcs-oauth2-boto-plugin==3.0
google-apitools==0.5.32
google-auth==2.6.6
google-auth-oauthlib==0.4.6
google-reauth==0.1.1
grpcio==1.46.0
gsutil==5.10
gym==0.23.1
gym-notices==0.0.6
httplib2==0.20.4
huggingface-hub==0.5.1
idna==3.3
joblib==1.1.0
learn2learn==0.1.7
Markdown==3.3.7
monotonic==1.6
multidict==6.0.2
multiprocess==0.70.12.2
nltk==3.7
numpy==1.22.3
oauth2client==4.1.3
oauthlib==3.2.0
packaging==21.3
pandas==1.4.2
Pillow==9.1.0
protobuf==3.20.1
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pyDeprecate==0.3.2
pyOpenSSL==22.0.0
pyparsing==3.0.8
python-dateutil==2.8.2
pytorch-lightning==1.6.3
pytz==2022.1
pyu2f==0.1.5
PyYAML==6.0
qpth==0.0.15
regex==2022.4.24
requests==2.27.1
requests-oauthlib==1.3.1
responses==0.18.0
retry-decorator==1.1.1
rsa==4.7.2
sacremoses==0.0.53
scipy==1.8.0
six==1.16.0
tensorboard==2.9.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tokenizers==0.12.1
torch==1.11.0
torchmetrics==0.8.2
torchvision==0.12.0
tqdm==4.64.0
transformers==4.18.0
typing_extensions==4.2.0
urllib3==1.26.9
Werkzeug==2.1.2
wincertstore==0.2
xxhash==3.0.0
yarl==1.7.2

Does anyone have an idea what might be causing it?

PS: I also tried another script that uses l2l, MAML, and PyTorch Lightning with a custom dataset, and got the same bug.

stfwn commented 2 years ago

The problem is with TrackTestAccuracyCallback, which calls trainer.test without passing the datamodule.

https://github.com/learnables/learn2learn/blob/f099ddc9ce0c10cff901ecb1acee2838d171272e/learn2learn/utils/lightning.py#L84-L87

Inside the PL trainer, self._data_connector.attach_data then fails to assign any dataloader to the LightningModule's test_dataloader: all the loaders, as well as the datamodule, are None, so the attribute is silently never overridden. The result is a MisconfigurationException that is technically correct but not very informative.
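To make this failure mode concrete, here is a toy model in plain Python. It is not PyTorch Lightning's actual code: `ToyModule`, `ToyDataModule`, `attach_data`, `verify_test_loop`, and `toy_test` are hypothetical stand-ins that only mimic the behaviour described above, where a test loader is attached only if one of the inputs is not None, and a later check raises if nothing was attached.

```python
class MisconfigurationException(Exception):
    """Hypothetical stand-in for PL's exception of the same name."""


class ToyModule:
    """Hypothetical LightningModule stand-in with no test loader of its own."""
    test_dataloader = None


class ToyDataModule:
    """Hypothetical datamodule that knows how to build a test loader."""

    def test_dataloader(self):
        return ["batch-0", "batch-1"]


def attach_data(model, test_dataloaders=None, datamodule=None):
    # A loader is attached only when one is available. If everything is
    # None, the model is left untouched and no warning is emitted.
    if test_dataloaders is not None:
        model.test_dataloader = lambda: test_dataloaders
    elif datamodule is not None:
        model.test_dataloader = datamodule.test_dataloader


def verify_test_loop(model):
    # The later configuration check that surfaces the (unhelpful) error.
    if model.test_dataloader is None:
        raise MisconfigurationException(
            "No test_dataloader() method defined to run Trainer.test."
        )


def toy_test(model, dataloaders=None, datamodule=None):
    attach_data(model, test_dataloaders=dataloaders, datamodule=datamodule)
    verify_test_loop(model)
    return list(model.test_dataloader())


try:
    # Mirrors trainer.test(model=module) being called without a datamodule.
    toy_test(ToyModule())
except MisconfigurationException as err:
    print("error:", err)

# Passing the datamodule explicitly lets the loader be attached.
print(toy_test(ToyModule(), datamodule=ToyDataModule()))
```

The point of the toy is that the real cause (a missing datamodule) is only reported indirectly, by a check that runs later and complains about the symptom.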

This bug was probably introduced when PL decoupled trainer.fit and trainer.test. Previously, you didn't have to pass the datamodule to trainer.test if you had already passed it to an earlier trainer.fit call; now you have to pass it every time. I don't recall which PL version made this change, and I couldn't easily find it.
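Since trainer.test now needs the datamodule on every call, one possible patch is for the callback to forward the datamodule that trainer.fit already received. This is a sketch under assumptions: it assumes the hook looks like the one in the traceback above (on_validation_end calling trainer.test(model=module, verbose=False)) and that the trainer exposes the fitted datamodule as `trainer.datamodule`. The names `PatchedTestCallback` and `FakeTrainer` are hypothetical, and the callback is duck-typed so it runs without PyTorch Lightning installed; in a real project it would subclass `pytorch_lightning.Callback`.

```python
class PatchedTestCallback:
    """Hypothetical sketch of a test-accuracy callback that forwards the datamodule.

    In a real project this would subclass pytorch_lightning.Callback.
    """

    def on_validation_end(self, trainer, module):
        # Assumption: trainer.datamodule holds the datamodule passed to
        # trainer.fit(...). Passing it explicitly is exactly what the
        # original call, trainer.test(model=module, verbose=False), omits.
        trainer.test(model=module, datamodule=trainer.datamodule, verbose=False)


class FakeTrainer:
    """Hypothetical trainer stub that records how test() was called."""

    def __init__(self, datamodule):
        self.datamodule = datamodule
        self.test_calls = []

    def test(self, model=None, dataloaders=None, ckpt_path=None,
             verbose=True, datamodule=None):
        # Record the arguments instead of running a real test loop.
        self.test_calls.append(
            {"model": model, "datamodule": datamodule, "verbose": verbose}
        )


trainer = FakeTrainer(datamodule="episodic_data")
PatchedTestCallback().on_validation_end(trainer, module="algorithm")
print(trainer.test_calls)
```

The reply below reports that one of the suggested fixes still surfaced a different error with the PL version listed above, so this sketch may not be sufficient on its own.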

I see two solutions:

I would personally go with the second one.

lolokoko28 commented 2 years ago

Good catch, thank you! However, after applying your second suggestion I get a different error. Switching to PyTorch Lightning 1.0.2 helps, but that version seems a bit outdated :/