ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

DVCLive Folder not created in CI YOLOv10 #785

Open coucjh opened 3 months ago

coucjh commented 3 months ago

Search before asking

Question

I am using DVCLive to track experiments and post the .tsv files it creates to a PR when running models in CI. However, no logging metrics are created in CI, even though they are created fine locally.

I am stuck at this point. For now I am capturing the results from the logging folder and will use those, but the .tsv files and DVC functionality are much more useful for model monitoring. I have attached full logs and settings screenshots.

yolo_ci_log.txt

yolo_ci_settings

yolo_local_log.txt

yolo_local_settings

Additional

No response

github-actions[bot] commented 3 months ago

👋 Hello @coucjh, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.

If this is a πŸ› Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

pderrenger commented 3 months ago

Hi @coucjh,

Thank you for providing detailed information about your issue. It’s great to see your thorough approach in troubleshooting this!

To address the problem with the DVCLive folder not being created in your CI environment, here are a few steps you can take:

  1. Verify Environment Variables: Ensure that the environment variable ULTRALYTICS_DVC_DISABLED is correctly set in your CI environment. Sometimes, environment variables might not be set as expected in CI pipelines.

    import os
    print(os.environ.get("ULTRALYTICS_DVC_DISABLED"))
  2. Check Write Permissions: Ensure that your CI runner has the necessary permissions to write to the directory where the DVCLive folder should be created. Sometimes, CI environments have restricted permissions.

  3. Logging Configuration: Double-check your logging configuration in the CI environment. Ensure that the logging paths and configurations are correctly set up.

  4. Reproducibility: Make sure the issue is reproducible with the latest versions of the packages. Sometimes, updating to the latest versions can resolve unexpected issues.

    pip install --upgrade ultralytics
    pip install --upgrade dvclive
  5. Debugging: Add additional logging to your CI script to capture more details about the environment and the steps being executed. This can help identify where the process might be failing.

    import logging
    logging.basicConfig(level=logging.DEBUG)
  6. CI Configuration: Ensure that your CI configuration (e.g., GitHub Actions, GitLab CI, etc.) is correctly set up to handle the dependencies and environment settings.
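The environment-variable and write-permission checks can be run together as one small diagnostic at the start of the CI job. This is a minimal sketch using only the standard library; the helper name `ci_diagnostics` and its `log_dir` argument are hypothetical, and `ULTRALYTICS_DVC_DISABLED` is simply the variable discussed above.

```python
import importlib.util
import os
import tempfile


def ci_diagnostics(log_dir=".", env=None):
    """Hypothetical helper: report the CI settings discussed above."""
    env = os.environ if env is None else env
    # Raw value of the DVC opt-out variable (None if it is not set at all)
    dvc_disabled = env.get("ULTRALYTICS_DVC_DISABLED")
    # Is the dvclive package importable in this environment?
    dvclive_installed = importlib.util.find_spec("dvclive") is not None
    # Actually create (and auto-delete) a file: more reliable than os.access()
    try:
        with tempfile.NamedTemporaryFile(dir=log_dir):
            writable = True
    except OSError:
        writable = False
    return {
        "ULTRALYTICS_DVC_DISABLED": dvc_disabled,
        "dvclive_installed": dvclive_installed,
        "log_dir_writable": writable,
    }


if __name__ == "__main__":
    for key, value in ci_diagnostics().items():
        print(f"{key}: {value}")
```

Running it both locally and in CI and comparing the two outputs quickly shows whether any of these basics differ.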

If you have verified all the above and the issue persists, please provide more details about your CI configuration and any additional logs that might help in diagnosing the problem further.

Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are here to help! 😊

coucjh commented 3 months ago

Thanks @pderrenger! I ran through all the steps: I added the upgrades and the debug logging, confirmed we have write permissions, and confirmed that ULTRALYTICS_DVC_DISABLED=False. Here is the full log: a DVC lock is clearly being created, but there is still no dvclive folder. Also note that I have tried `yolo settings dvc=True`, which reports dvc=true.

yolo_full_log_ci.txt

pderrenger commented 3 months ago

Hi @coucjh,

Thank you for your prompt follow-up and for providing the full log. It's great to see your proactive approach in troubleshooting this issue!

Given that the DVC lock is being created but the DVCLive folder is not, let's delve a bit deeper:

  1. DVC Configuration: Ensure that your DVC configuration is correctly set up in the CI environment. Sometimes, DVC might require explicit initialization or configuration in CI.

    dvc init
  2. Explicit DVCLive Initialization: Try explicitly initializing DVCLive in your training script to ensure it starts logging as expected.

    from dvclive import Live
    
    live = Live()
  3. Check for Errors: Look for any subtle errors or warnings in the log that might indicate why the DVCLive folder isn't being created. Sometimes, these can be easy to overlook.

  4. Environment Consistency: Double-check that the environment in CI matches your local environment as closely as possible. Differences in dependencies or configurations can sometimes lead to unexpected behavior.

  5. Minimal Reproducible Example: If possible, create a minimal reproducible example that isolates the issue. This can help in identifying if there’s a specific part of the code or configuration causing the problem.

  6. Community and Documentation: Sometimes, other users might have faced similar issues. Checking the Ultralytics HUB discussions or the DVC documentation might provide additional insights.
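For the environment-consistency check (step 4), one low-effort option is to print the interpreter and package versions in both environments and diff the output. A minimal sketch using only the standard library; the helper name `environment_snapshot` and the default package list are illustrative choices, not part of Ultralytics or DVC.

```python
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_snapshot(packages=("ultralytics", "dvclive", "dvc")):
    """Collect interpreter and package versions so local and CI runs can be compared."""
    snapshot = {"python": platform.python_version(), "platform": sys.platform}
    for name in packages:
        try:
            snapshot[name] = version(name)
        except PackageNotFoundError:
            snapshot[name] = None  # not installed in this environment
    return snapshot


if __name__ == "__main__":
    # Sorted, indented JSON so the local and CI outputs can be compared with a plain diff
    print(json.dumps(environment_snapshot(), indent=2, sort_keys=True))
```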

If the issue persists after these steps, it might be helpful to share a minimal reproducible example or any additional context that could shed light on the problem.

Thank you for your patience and diligence. The YOLO community and the Ultralytics team are here to support you! 😊

coucjh commented 3 months ago

Someone from the dvc community managed to find the issue!

dberenbaum, yesterday at 16:49:
The issue might be here: https://github.com/ultralytics/ultralytics/blob/8648572809fa2e58967862f9a4748abddd0f60a7/ultralytics/utils/callbacks/dvc.py#L6
TESTS_RUNNING is defined in https://github.com/ultralytics/ultralytics/blob/8648572809fa2e58967862f9a4748abddd0f60a7/ultralytics/utils/__init__.py#L1072. It looks like if GH actions are running, yolo will assume tests are running and skip the integration.

I confirmed with a print statement that this is the problem. As a temporary fix I wrote a command that replaces the function's return value, but how can this be changed in the ultralytics package itself? I believe mine is a valid non-test CI use of YOLO + DVCLive. Is this something that can be reported, or should I fork the repository and contribute a fix myself? The sed command is:

    sed -i 's/return "GITHUB_ACTIONS" in os.environ and "GITHUB_WORKFLOW" in os.environ and "RUNNER_OS" in os.environ/return False/' /opt/tensorflow/lib/python3.10/site-packages/ultralytics/utils/__init__.py
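To make the failure mode concrete: the expression targeted by the sed command treats any process that sees all three GitHub Actions variables as a test run, and the DVC integration is then skipped. Below is a simplified, standalone reconstruction based only on that expression and the description above; it is not a copy of the Ultralytics source, and both function names are hypothetical.

```python
import os


def looks_like_github_actions(env=None):
    """Mirror of the expression replaced by the sed command above."""
    env = os.environ if env is None else env
    return "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env and "RUNNER_OS" in env


def dvc_integration_enabled(dvc_setting, env=None):
    """Hypothetical summary of the gating described in this thread:
    the integration needs dvc=True in the settings AND no detected test run."""
    tests_running = looks_like_github_actions(env)
    return dvc_setting is True and not tests_running


# Local machine: none of the GitHub Actions variables are set
print(dvc_integration_enabled(True, env={}))  # True
# GitHub Actions runner: all three variables are present, so logging is skipped
ci_env = {"GITHUB_ACTIONS": "true", "GITHUB_WORKFLOW": "train", "RUNNER_OS": "Linux"}
print(dvc_integration_enabled(True, env=ci_env))  # False
```

This is why `yolo settings dvc=True` alone cannot help on a GitHub Actions runner: the setting is satisfied, but the environment check still fails.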

pderrenger commented 3 months ago

Hi @coucjh,

Thank you for sharing the insights from the DVC community! It's fantastic to see collaborative efforts in troubleshooting this issue.

The problem you've identified with the TESTS_RUNNING check (a value defined in ultralytics/utils/__init__.py from the GitHub Actions environment variables) is indeed a valid concern for non-test CI usage of YOLO and DVCLive. Here's how we can address this:

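  1. Temporary Workaround: Your sed command is a practical temporary fix. Note that the site-packages path is specific to your environment, and the patch is lost whenever the package is reinstalled or upgraded. For others who might face a similar issue, here's the command again for reference: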

    sed -i 's/return "GITHUB_ACTIONS" in os.environ and "GITHUB_WORKFLOW" in os.environ and "RUNNER_OS" in os.environ/return False/' /opt/tensorflow/lib/python3.10/site-packages/ultralytics/utils/__init__.py
  2. Reporting the Issue: This is a valid use case and should be reported. You can open an issue in the ultralytics/ultralytics GitHub repository, where the affected code lives, to bring this to the attention of the development team. Please include the details you've shared here, as well as any additional context that might help in understanding the impact.

  3. Contributing a Fix: If you're comfortable with it, contributing a fix would be highly appreciated! Fork the repository, make the necessary changes, and submit a pull request. The Ultralytics team and community members can then review and merge your contribution.

  4. Proposed Change: A potential change could involve adding a configuration option to bypass the TESTS_RUNNING check or making the check more specific to actual test scenarios. This would allow CI environments to use YOLO and DVCLive without being mistakenly identified as test runs.

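    import os
    def tests_running():
        opt_out = os.getenv("ULTRALYTICS_CI", "false").lower() in {"1", "true"}  # hypothetical flag; env values are strings, so "false" must not count as set
        return all(k in os.environ for k in ("GITHUB_ACTIONS", "GITHUB_WORKFLOW", "RUNNER_OS")) and not opt_out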
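A less invasive alternative to the sed patch, assuming the detection still requires all three variables as in the pattern above, is to start training in a child process whose environment lacks one of them. Site-packages stay untouched and the workaround survives reinstalls. This is a sketch: `train.py` is a placeholder for your own entry point, and removing `RUNNER_OS` is an arbitrary choice, so confirm nothing else in your job relies on it.

```python
import os
import subprocess
import sys


def env_without_ci_marker(env=None, marker="RUNNER_OS"):
    """Return a copy of the environment with one GitHub Actions marker variable removed."""
    child_env = dict(os.environ if env is None else env)  # copy: never mutate the caller's env
    child_env.pop(marker, None)  # a missing key is fine, there is simply nothing to remove
    return child_env


def run_training(script="train.py"):
    """Run the (placeholder) training script in a child process with the modified environment."""
    return subprocess.run([sys.executable, script], env=env_without_ci_marker(), check=True)
```

Call `run_training()` from the CI step instead of invoking the script directly. In a shell step on a Linux runner, `env -u RUNNER_OS python train.py` achieves the same effect.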

Thank you for your proactive approach and for bringing this to our attention. Your contributions and feedback are invaluable to the YOLO community and the Ultralytics team. If you have any further questions or need assistance with the contribution process, feel free to ask. 😊