mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[BUG] Lightning library import error #7285

Open syaffers opened 1 year ago

syaffers commented 1 year ago

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

Describe the problem

Calling mlflow.pytorch.autolog() raises a ModuleNotFoundError in a fresh environment that has mlflow and the new lightning package (from lightning.ai, previously called pytorch-lightning) installed. The cause is the change in the library's package name: mlflow still tries to import pytorch_lightning, but the module is now named lightning.
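One way to confirm which of the two module names is importable in a given environment is to probe both without importing them. This is only a minimal sketch: the helper name `importable_lightning_modules` is hypothetical and is not part of MLflow or Lightning.

```python
import importlib.util


def importable_lightning_modules(candidates=("lightning", "pytorch_lightning")):
    """Return the candidate top-level module names that can be found here.

    Hypothetical helper for illustration; find_spec() locates a top-level
    module without executing it and returns None when it is missing.
    """
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    # In the environment described in this issue, only "lightning" is expected
    # to be listed, which is why importing "pytorch_lightning" fails.
    print(importable_lightning_modules())
```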

Tracking information

MLflow version: 1.30.0
Tracking URI: http://localhost:5000
Artifact URI: data/1/be290c11e1074b3bb7c69537387a5eb2/artifacts

Code to reproduce issue

import mlflow
import pandas as pd
import torch
import lightning as pl
from torch.utils.data import TensorDataset, DataLoader

class ANN(pl.LightningModule):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.net = torch.nn.Linear(3, 1)

    def forward(self, x):
        return self.net(x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def training_step(self, batch, batch_idx):
        x, y = batch
        yp = self(x)
        loss = torch.nn.functional.mse_loss(yp, y)
        self.log("train_loss", loss)
        return loss

class DModule(pl.LightningDataModule):
    def __init__(self) -> None:
        super().__init__()

    def setup(self, stage):
        self.train_set = TensorDataset(torch.randn(10, 3), torch.randn(10, 1))

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=3)

model = ANN()
datamod = DModule()

mlflow.set_tracking_uri('http://localhost:5000')
mlflow.set_experiment(experiment_id='1')

with mlflow.start_run(run_name='pytorch-nn-v1'):
    mlflow.pytorch.autolog()
    trainer = pl.Trainer(enable_progress_bar=False)
    trainer.fit(model, datamod)

Stack trace

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In [6], line 42
     39 mlflow.set_experiment(experiment_id='1')
     41 with mlflow.start_run(run_name='pytorch-nn-v1'):
---> 42     mlflow.pytorch.autolog()
     43     trainer = pl.Trainer(enable_progress_bar=False)
     44     trainer.fit(model, datamod)

File ~/Projects/ds-samples/.venv/lib/python3.8/site-packages/mlflow/utils/autologging_utils/__init__.py:414, in autologging_integration.<locals>.wrapper.<locals>.autolog(*args, **kwargs)
    394 # Reroute non-MLflow warnings encountered during autologging enablement to an
    395 # MLflow event logger, and enforce silent mode if applicable (i.e. if the corresponding
    396 # autologging integration was called with `silent=True`)
    397 with set_mlflow_events_and_warnings_behavior_globally(
    398     # MLflow warnings emitted during autologging setup / enablement are likely
    399     # actionable and relevant to the user, so they should be emitted as normal
   (...)
    412     disable_warnings=is_silent_mode,
    413 ):
--> 414     _check_and_log_warning_for_unsupported_package_versions(name)
    416     return _autolog(*args, **kwargs)

File ~/Projects/ds-samples/.venv/lib/python3.8/site-packages/mlflow/utils/autologging_utils/__init__.py:329, in _check_and_log_warning_for_unsupported_package_versions(integration_name)
    318 def _check_and_log_warning_for_unsupported_package_versions(integration_name):
    319     """
    320     When autologging is enabled and `disable_for_unsupported_versions=False` for the specified
    321     autologging integration, check whether the currently-installed versions of the integration's
    322     associated package versions are supported by the specified integration. If the package versions
    323     are not supported, log a warning message.
    324     """
    325     if (
    326         integration_name in FLAVOR_TO_MODULE_NAME_AND_VERSION_INFO_KEY
    327         and not get_autologging_config(integration_name, "disable", True)
    328         and not get_autologging_config(integration_name, "disable_for_unsupported_versions", False)
--> 329         and not is_flavor_supported_for_associated_package_versions(integration_name)
    330     ):
    331         _logger.warning(
    332             "You are using an unsupported version of %s. If you encounter errors during "
    333             "autologging, try upgrading / downgrading %s to a supported version, or try "
   (...)
    336             integration_name,
    337         )

File ~/Projects/ds-samples/.venv/lib/python3.8/site-packages/mlflow/utils/autologging_utils/versioning.py:83, in is_flavor_supported_for_associated_package_versions(flavor_name)
     78 """
     79 :return: True if the specified flavor is supported for the currently-installed versions of its
     80          associated packages
     81 """
     82 module_name, module_key = FLAVOR_TO_MODULE_NAME_AND_VERSION_INFO_KEY[flavor_name]
---> 83 actual_version = importlib.import_module(module_name).__version__
     85 # In Databricks, treat 'pyspark 3.x.y.dev0' as 'pyspark 3.x.y'
     86 if module_name == "pyspark" and is_in_databricks_runtime():

File ~/.pyenv/versions/3.8.13/lib/python3.8/importlib/__init__.py:127, in import_module(name, package)
    125             break
    126         level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:991, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:973, in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'pytorch_lightning'

Other info / logs

requirements.txt:

torch==1.13.0
torchaudio==0.13.0
torchvision==0.14.0
lightning==1.8.0.post1

harupy commented 1 year ago

@syaffers Can you install pytorch-lightning?

pip install pytorch-lightning

syaffers commented 1 year ago

@harupy Aha, that seems to have done the trick. I went to the Lightning website, and lightning was the package it recommended installing. I guess we can treat this as a feature request so that new users can use lightning without also needing to install pytorch_lightning.
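A fallback of the kind suggested here could look roughly like the sketch below, which tries several module names in order and reports the first one that imports. This is an illustration under assumptions, not MLflow's actual implementation or fix: the function name `first_importable_version` is hypothetical, and the candidate order is an arbitrary choice.

```python
import importlib


def first_importable_version(module_names):
    """Import the first available module and return (name, version).

    Hypothetical helper for illustration. Returns (None, None) when none of
    the modules can be imported. Note that a ModuleNotFoundError raised by a
    missing dependency inside a candidate module is also skipped here.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError:
            continue
        # Not every module defines __version__, so fall back to None.
        return name, getattr(module, "__version__", None)
    return None, None


if __name__ == "__main__":
    # Prefer the new package name, then fall back to the old one.
    print(first_importable_version(["lightning", "pytorch_lightning"]))
```

With a lookup like this, an environment that has only lightning installed would resolve a version instead of raising ModuleNotFoundError for pytorch_lightning.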

harupy commented 1 year ago

Is pytorch_lightning deprecated?

syaffers commented 1 year ago

No, I don't believe so. In fact, I ran into issues simply from installing pytorch_lightning alongside the new lightning library, even when I installed v1.7.7 (which is compatible with mlflow). I had to reset my virtual environment and reinstall the packages (all except lightning), and now I'm able to autolog metrics again.

Edit: changed log to autolog since I didn't test the manual log functions.
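When both packages may be present in the same virtual environment, listing the installed distribution versions can help narrow down a conflict like this one. The sketch below uses the standard library's importlib.metadata (available from Python 3.8); the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(distributions=("lightning", "pytorch-lightning")):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it reads package metadata only and
    does not import the packages themselves.
    """
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    print(installed_versions())
```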

mlflow-automation commented 1 year ago

@BenWilson2 @dbczumar @harupy @WeichenXu123 Please assign a maintainer and start triaging this issue.