Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Apache License 2.0

Model TypeError while running Lag_Llama_Fine_Tuning_Demo notebook #57

Open ilteralp opened 1 month ago

ilteralp commented 1 month ago

Hi all,

I encountered the error below while running the following line of "Lag_Llama_Fine_Tuning_Demo.ipynb":

predictor = estimator.train(dataset.train, cache_data=True, shuffle_buffer_length=1000)

Error message:

 TypeError: `model` must be a `LightningModule` or `torch._dynamo.OptimizedModule`, got `LagLlamaLightningModule`

Here are the details:

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 predictor = estimator.train(dataset.train, cache_data=True, shuffle_buffer_length=1000)

File ~\anaconda3\envs\lag_llama\lib\site-packages\gluonts\torch\model\estimator.py:237, in PyTorchLightningEstimator.train(self, training_data, validation_data, shuffle_buffer_length, cache_data, ckpt_path, **kwargs)
    228 def train(
    229     self,
    230     training_data: Dataset,
   (...)
    235     **kwargs,
    236 ) -> PyTorchPredictor:
--> 237     return self.train_model(
    238         training_data,
    239         validation_data,
    240         shuffle_buffer_length=shuffle_buffer_length,
    241         cache_data=cache_data,
    242         ckpt_path=ckpt_path,
    243     ).predictor

File ~\anaconda3\envs\lag_llama\lib\site-packages\gluonts\torch\model\estimator.py:205, in PyTorchLightningEstimator.train_model(self, training_data, validation_data, from_predictor, shuffle_buffer_length, cache_data, ckpt_path, **kwargs)
    202 trainer_kwargs = {**self.trainer_kwargs, "callbacks": callbacks}
    203 trainer = pl.Trainer(**trainer_kwargs)
--> 205 trainer.fit(
    206     model=training_network,
    207     train_dataloaders=training_data_loader,
    208     val_dataloaders=validation_data_loader,
    209     ckpt_path=ckpt_path,
    210 )
    212 logger.info(f"Loading best model from {checkpoint.best_model_path}")
    213 best_model = training_network.load_from_checkpoint(
    214     checkpoint.best_model_path
    215 )

File ~\anaconda3\envs\lag_llama\lib\site-packages\pytorch_lightning\trainer\trainer.py:529, in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    504 def fit(
    505     self,
    506     model: "pl.LightningModule",
   (...)
    510     ckpt_path: Optional[_PATH] = None,
    511 ) -> None:
    512     r"""Runs the full optimization routine.
    513 
    514     Args:
    515         model: Model to fit.
    516 
    517         train_dataloaders: An iterable or collection of iterables specifying training samples.
    518             Alternatively, a :class:`~pytorch_lightning.core.datamodule.LightningDataModule` that defines
    519             the :class:`~pytorch_lightning.core.hooks.DataHooks.train_dataloader` hook.
    520 
    521         val_dataloaders: An iterable or collection of iterables specifying validation samples.
    522 
    523         datamodule: A :class:`~pytorch_lightning.core.datamodule.LightningDataModule` that defines
    524             the :class:`~pytorch_lightning.core.hooks.DataHooks.train_dataloader` hook.
    525 
    526         ckpt_path: Path/URL of the checkpoint from which training is resumed. Could also be one of two special
    527             keywords ``"last"`` and ``"hpc"``. If there is no checkpoint file at the path, an exception is raised.
    528 
--> 529     Raises:
    530         TypeError:
    531             If ``model`` is not :class:`~pytorch_lightning.core.LightningModule` for torch version less than
    532             2.0.0 and if ``model`` is not :class:`~pytorch_lightning.core.LightningModule` or
    533             :class:`torch._dynamo.OptimizedModule` for torch versions greater than or equal to 2.0.0 .
    534 
    535     For more information about multiple dataloaders, see this :ref:`section <multiple-dataloaders>`.
    536 
    537     """
    538     model = _maybe_unwrap_optimized(model)
    539     self.strategy._lightning_module = model

File ~\anaconda3\envs\lag_llama\lib\site-packages\pytorch_lightning\utilities\compile.py:125, in _maybe_unwrap_optimized(model)
    123         raise TypeError(f"`model` must be a `LightningModule`, got `{type(model).__qualname__}`")
    124     return model
--> 125 from torch._dynamo import OptimizedModule
    127 if isinstance(model, OptimizedModule):
    128     return from_compiled(model)

TypeError: `model` must be a `LightningModule` or `torch._dynamo.OptimizedModule`, got `LagLlamaLightningModule`
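For what it's worth, the check that raises this is Lightning's isinstance test against its own LightningModule class, so one way to narrow it down might be to verify which base class LagLlamaLightningModule actually inherits from in this environment. A minimal sketch (the lag_llama.gluon.lightning_module import path is my assumption; adjust it if the class lives elsewhere):

import pytorch_lightning as pl

# Assumed import path for the Lag-Llama Lightning module; adjust if it differs.
from lag_llama.gluon.lightning_module import LagLlamaLightningModule

# If this prints False, the class was built against a different
# pytorch_lightning install than the one the Trainer imports, which
# would explain the isinstance check in _maybe_unwrap_optimized failing.
print(issubclass(LagLlamaLightningModule, pl.LightningModule))
print(LagLlamaLightningModule.__mro__)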
ashok-arjun commented 1 month ago

Hi! I just checked and it works fine for me on Google Colab. Which environment did you get this error in? Can you check whether it works on Google Colab?

ilteralp commented 1 month ago

Hi Arjun! Thank you for your reply.

  1. The error occurred on a Windows environment using Conda.
  2. I also tested it on Google Colab, and it worked fine.

Could the issue be related to a package conflict? I'm attaching the output of conda list for reference.

Thanks a lot for your help!

ashok-arjun commented 1 month ago

Hi! I'm not sure where the issue is; we never tested this on Windows, so there might be a package conflict.

Maybe check whether the conda packages on Windows match the ones on Colab? Something like the sketch below should work in both environments.
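A minimal sketch, assuming you run it once on Windows and once on Colab (the env_packages.txt filename is just a placeholder), then diff the two output files:

import importlib.metadata as md

# Dump "name==version" for every installed distribution; run this once in
# each environment, then compare the two files to spot version mismatches.
with open("env_packages.txt", "w") as f:
    for dist in sorted(md.distributions(), key=lambda d: d.metadata["Name"].lower()):
        f.write(f"{dist.metadata['Name']}=={dist.version}\n")

In particular, I'd compare the torch, pytorch_lightning/lightning, and gluonts versions first, since those are the packages involved in the failing isinstance check.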