ndhers closed this issue 1 year ago.
👋 Hello @ndhers, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users, where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Pip install the `ultralytics` package, including all requirements, in a Python>=3.7 environment with PyTorch>=1.7:

```bash
pip install ultralytics
```
YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

- Notebooks with free GPU
- Google Cloud Deep Learning VM (see the GCP Quickstart Guide)
- Amazon Deep Learning AMI (see the AWS Quickstart Guide)
- Docker Image (see the Docker Quickstart Guide)
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@ndhers hello,
Thank you for reaching out. It seems that you're experiencing an issue when trying to log the YOLOv8 model as a `pyfunc` object for model registry and deployment on Azure Databricks. The error message you're seeing, "Operation not supported", usually indicates a feature or method that is not supported, either in the current environment or with the current configuration.
Regarding the closed issue that you've linked, one possible reason it didn't have any solutions could be that the issue was specific to the user's local configurations or the problem was not adequately investigated.
For your specific issue, there could be a compatibility issue between the methods used in YOLOv8's MLflow integration and the operation you're trying to perform or existing limitations within Azure Databricks that are causing this error.
We would recommend starting by checking the compatibility of YOLOv8's MLflow integration with Azure Databricks to confirm they can work together seamlessly. Moreover, please ensure that your Azure Databricks environment has adequate permissions and configurations to perform the desired operation.
Also, verify that the `pyfunc` object logged by the `log_model()` function is properly defined and compatible with your runtime environment.
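For illustration, here is a minimal sketch of what a properly defined wrapper could look like; the class name `YOLOv8Wrapper`, the artifact key `model_path`, and the file paths are illustrative assumptions, not part of the Ultralytics API:

```python
import mlflow.pyfunc


class YOLOv8Wrapper(mlflow.pyfunc.PythonModel):
    """Illustrative pyfunc wrapper around a trained YOLOv8 checkpoint."""

    def load_context(self, context):
        from ultralytics import YOLO
        # context.artifacts maps the keys passed to log_model(artifacts=...)
        # to local paths that MLflow resolves at load time
        self.model = YOLO(context.artifacts['model_path'])

    def predict(self, context, model_input):
        # model_input could be a list of image paths or numpy arrays
        return self.model(model_input)


# Log the wrapper together with the trained weights (paths are examples)
mlflow.pyfunc.log_model(
    artifact_path='yolov8_model',
    artifacts={'model_path': 'runs/detect/train/weights/best.pt'},
    python_model=YOLOv8Wrapper(),
)
```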
Please let us know if you have any additional information that may aid in narrowing down what could be causing this problem. Thanks for using YOLOv8!
Hi @glenn-jocher
I am running YOLOv8 and I am facing a different error, but it relates to the same code snippet.
My error is:
RestException: INVALID_PARAMETER_VALUE: Invalid value '/Users/user_name/yolo_testing/MLmodel' for parameter: 'path'. Path must be relative.
```python
def on_train_end(trainer):
    """Called at end of train loop to log model artifact info."""
    if mlflow:
        root_dir = Path(__file__).resolve().parents[3]
        run.log_artifact(trainer.last)
        run.log_artifact(trainer.best)
        run.pyfunc.log_model(artifact_path=experiment_name,
                             code_path=[str(root_dir)],
                             artifacts={'model_path': str(trainer.save_dir)},
                             python_model=run.pyfunc.PythonModel())
```
Can you please clarify how to set `trainer.save_dir`?
Or do you have a way to turn off the MLflow logging? A try/except around the import does not really work, since I cannot uninstall mlflow.
Any ideas on how to fix the error, or how to remove the MLflow logging from the callback?
Thanks a lot
@AnastasiaProkaieva `trainer.save_dir` is created from the `project` and `name` arguments, i.e. as `project/name`, or if they are not provided, a `save_dir` is created and incremented automatically for you.
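A quick sketch of how this plays out in practice (directory names are illustrative):

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# save_dir becomes project/name, e.g. 'my_project/exp1'
model.train(data='coco128.yaml', epochs=1, project='my_project', name='exp1')
print(model.trainer.save_dir)

# Without project/name, a directory such as 'runs/detect/train' is created
# and auto-incremented ('train2', 'train3', ...) on subsequent runs.
```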
Thank you for your answer.
Sorry if I missed that information, but would you mind adding a flag to turn this logging off? I could think of a PR for that.
In the meantime, I saw there is a callback for MLflow. What would be the way to remove it, or what would be the approach to not log to MLflow at all (without uninstalling)? Thanks a lot,
@AnastasiaProkaieva hello,
We're glad to hear that you're considering contributing to YOLOv8. Adding a flag to disable the MLflow logging sounds like a potentially useful feature. Please feel free to submit a PR for that.
Regarding your second question, if you don't want to log into MLflow at all, you could modify the callback functions that are associated with MLflow logging. You would look for the functions in the code base that call MLflow (like the "on_train_end" function you mentioned) and remove or comment out the MLflow logging lines. Please note that making changes to the source code should be done cautiously as it can impact other functionalities.
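One way to do that without editing the source, sketched below, is to strip the MLflow hooks from the model's callback registry before training. This assumes internals that may differ between versions: that `model.callbacks` is a dict mapping event names to lists of functions, and that the MLflow hooks live in a module whose name contains 'mlflow'.

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Drop any registered callback that comes from the MLflow integration module
for event, funcs in model.callbacks.items():
    model.callbacks[event] = [
        f for f in funcs if 'mlflow' not in (getattr(f, '__module__', '') or '')
    ]

model.train(data='coco128.yaml', epochs=1)
```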
As always, thank you for your contribution to YOLOv8 and we look forward to your PR.
Best regards.
I figured out a workaround for this @AnastasiaProkaieva and @glenn-jocher
```python
import mlflow
from ultralytics import YOLO, settings

# Disable Databricks' MLflow autologging so logging can be controlled manually
spark.conf.set("spark.databricks.mlflow.autologging.enabled", False)

with mlflow.start_run(experiment_id="1519679226727373"):
    active_run = mlflow.active_run()

    # Update a setting: disable the built-in Ultralytics MLflow callback
    settings.update({'mlflow': False})

    # Load a pretrained YOLO model (recommended for training)
    model = YOLO('yolov8n.pt')

    # Train the model using the 'coco128.yaml' dataset for 1 epoch
    model.train(data='coco128.yaml', epochs=1)

    # Log the last and best checkpoints as artifacts
    mlflow.log_artifact(model.trainer.last)
    mlflow.log_artifact(model.trainer.best)

    # Log the PyTorch model to the artifact location specified by 'artifact_path'
    mlflow.pyfunc.log_model(artifact_path="model",
                            artifacts={'model_path': str(model.trainer.save_dir)},
                            python_model=mlflow.pyfunc.PythonModel())
```
The above is working as expected.
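One caveat worth noting, assuming Ultralytics settings behave here as they do elsewhere (they are persisted to a settings file): `settings.update({'mlflow': False})` will keep the built-in MLflow callback disabled in later sessions too, so you may want to re-enable it afterwards:

```python
from ultralytics import settings

# Re-enable the built-in MLflow callback once the workaround is no longer needed
settings.update({'mlflow': True})
```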
@mohammadsubhani hi there,
I'm glad to hear that you've found a workaround to manage MLflow logging in YOLOv8 and have successfully trained the model as you intended. Using the `settings` object to toggle the MLflow flag and configuring the Spark session are indeed feasible strategies for managing the autologging feature.
Thank you for sharing your solution with the community. This could certainly provide a reference for others who may encounter similar situations. I appreciate your input and your interest in making YOLOv8 even better!
Best Regards.
@mohammadsubhani @glenn-jocher I am facing an issue logging artifacts and custom-trained YOLOv8 model weights to the MLflow server. When I write mlflow.pytorch.log_model(model, "model"), it fails because torch logging is not supported for YOLOv8. When I write mlflow.log_artifacts("/content/runs/detect/train8/weights", artifact_path="states"), I am still facing the issue.
How can I save all generated weights and generated graphs to the MLflow server?
Please provide sample working code if possible. Thanks.
Hi @Hir98,
It appears you're encountering challenges when trying to log artifacts and custom trained weights for the YOLOv8 model in the MLflow server.
When you attempt to use `mlflow.pytorch.log_model(model, "model")`, it does not function as expected because the YOLOv8 model object is not designed to be logged natively as a plain PyTorch model. As YOLOv8 has its own way of handling and processing model weights, you might need to adjust your approach slightly.
Regarding the second issue with `mlflow.log_artifacts("/content/runs/detect/train8/weights", artifact_path="states")`, ensure the path to your weights is correct and that the weights have been appropriately generated and stored in the specified directory.
Saving all generated weights and graphs on the MLflow server involves making sure that everything is placed in the right directories and with the correct formats. You can investigate saving the graphs not as PyTorch models, but as generic file artifacts, as this might help with storing them on the MLflow server.
Unfortunately, we can't provide sample code within this discussion thread. However, adjusting your approach according to the advice above might help you resolve these issues. I hope this guidance assists you in your work with YOLOv8 and MLflow.
@glenn-jocher
I am trying to save all generated weights to the MLflow server through the statement below: mlflow.log_artifacts("/content/runs/detect/train8/weights", artifact_path="states")
The custom YOLOv8 weights are stored in /content/runs/detect/train8/weights, and I try to store the whole train8 folder on the MLflow server with mlflow.log_artifacts, but it is not able to store all of this on the MLflow server.
However, it actually stores all the weights in Colab itself under the project name.
Code:

```python
import yaml
import mlflow
from ultralytics import YOLO

# load the configuration file
with open(r"/content/drive/MyDrive/YOLOv8/params.yaml") as f:
    params = yaml.safe_load(f)

mlflow.end_run()

# set the tracking uri (MLFLOW_TRACKING_URI is defined elsewhere)
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

# start mlflow experiment
with mlflow.start_run(run_name=params['name']):
    # load a pre-trained model
    model = YOLO(params['model_type'])
    PROJECT = "my_proj"

    # train
    model.train(
        data='/content/drive/MyDrive/YOLOv8/dataset.yaml',
        imgsz=params['imgsz'],
        batch=params['batch'],
        epochs=params['epochs'],
        optimizer=params['optimizer'],
        lr0=params['lr0'],
        seed=params['seed'],
        pretrained=params['pretrained'],
        project=PROJECT,
    )

    print("storing data...")
    mlflow.log_artifacts("/content/my_proj/train", artifact_path="states")
    print("stop storing data...")
```
@glenn-jocher I want to save the model weights and all the generated graphs to the MLflow server, so which approach is supported by YOLOv8 for integrating it with an MLflow server?
Hi @Hir98,
Thank you for bringing up this issue. To log the model weights and generated graphs from a YOLOv8 training to an MLflow server, you'll need to ensure that the directory and file paths as well as the file formats are correct and compatible with what MLflow expects.
Here's some general guidance:

- Artifact logging in MLflow: using the `log_artifact()` or `log_artifacts()` functions, you can log individual files or whole directories as artifacts. You need to provide the local path to the file or directory.
- Saving YOLOv8 weights: YOLOv8 by default saves weights in its own format, not as PyTorch models, so saving weights directly as a PyTorch type using `mlflow.pytorch.log_model()` might fail. Instead, think of saving weights as generic file artifacts.
- Saving generated graphs: if these graphs are images or in a format that can be logged as an artifact, consider doing so using the artifact logging functions (see the sketch below).
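A minimal sketch of that artifact-based approach; the run directory and file names are examples from a typical YOLOv8 training, not fixed paths:

```python
import mlflow

# Log YOLOv8 outputs as generic file artifacts rather than as a PyTorch model
with mlflow.start_run():
    # a single weight file
    mlflow.log_artifact('runs/detect/train/weights/best.pt', artifact_path='weights')
    # the whole run directory: results.csv, plots such as results.png, etc.
    mlflow.log_artifacts('runs/detect/train', artifact_path='run')
```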
Remember you'll need to manage all file and directory paths appropriately. If you're running experiments on a remote server or cloud, make sure you're accessing the right directories.
Let us know if you have more specific questions or require further help!
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
Search before asking
YOLOv8 Component
Integrations
Bug
Hey,
I am trying to leverage YOLOv8's MLflow integration within Azure Databricks. I am able to save training artifacts as well as model weights as intended. However, I would like to log the model as a pyfunc object to allow for model registry and deployment. Whenever I try running the pyfunc.log_model() function, I get the following error:
"[Errno 95] Operation not supported"
Someone seems to have raised the question in one closed issue here with no solutions: https://github.com/ultralytics/ultralytics/issues/1525
Environment
No response
Minimal Reproducible Example
```python
from ultralytics import YOLO

# Load a model ('yolov8n.pt' assumed; the specific checkpoint was elided in the original post)
model = YOLO('yolov8n.pt')

# Use the model
model.train(data="coco128.yaml", epochs=3)  # train the model
```
Source code in ultralytics/yolo/utils/callbacks/mlflow.py
Additional
No response
Are you willing to submit a PR?