udacity / cd0581-building-a-reproducible-model-workflow-exercises

Exercise Starters and solutions for cd0581-building-a-reproducible-model-workflow by Giacomo Vianello

`conda.yml` of exercise 2 solution does not define a working conda environment. #14

Open EdwinWenink opened 1 year ago

EdwinWenink commented 1 year ago

Calling mlflow with the provided solution conda.yml results in an error. I use Python version 3.8.16 (following the instructions at the beginning of the course).

The call in the solution folder: `mlflow run . -P file_url=https://raw.githubusercontent.com/scikit-learn/scikit-learn/4dfdfb4e1bb3719628753a4ece995a1b2fa5312a/sklearn/datasets/data/iris.csv -P artifact_name=iris -P artifact_description="This data sets consists of 3 different types of irises' (Setosa, Versicolour, and Virginica) petal and sepal length"`

Output:

Traceback (most recent call last):
  File "XXX\Udacity\cd0581-building-a-reproducible-model-workflow-exercises\lesson-1-machine-learning-pipelines\exercises\exercise_2\solution\download_data.py", line 5, in <module>
    import wandb
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packag    from wandb import sdk as wandb_sdk
  File "XXX\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\wandb\sdk\__init__.py", line 12, in <module>
    from .wandb_init import init  # noqa: F401
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\wandb\sdk\wandb_init.py", line 29, in <module>
    from .backend.backend import Backend
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\wandb\sdk\backend\backend.py", line 17, in <module>
    from ..interface import interface
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\wandb\sdk\interface\interface.py", line 18, in <module>
    from wandb.proto import wandb_internal_pb2 as pb
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\wandb\proto\wandb_internal_pb2.py", line 15, in <module>
    from wandb.proto import wandb_telemetry_pb2 as wandb_dot_proto_dot_wandb__telemetry__pb2
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\wandb\proto\wandb_telemetry_pb2.py", line 34, in <module>
    _descriptor.FieldDescriptor(
  File "XXX\AppData\Local\anaconda3\envs\mlflow-4b67c93d2a95df2e00cbf3c9f644d2e3dada00e0\Lib\site-packages\google\protobuf\descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.      
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).        

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
2023/05/23 14:11:33 ERROR mlflow.cli: === Run (ID '9b21011b2dd6459690460967d55c7ff2') failed ===

The suggested workaround of downgrading protobuf to 3.20.x did not fix this environment definition; it resulted in new errors instead (MutableSet not being defined in collections). Please fix or update the learning materials.
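The MutableSet error may hint at which interpreter the mlflow-created environment actually uses: the abstract base class aliases were removed from `collections` in Python 3.10 and are available only under `collections.abc`. Below is a minimal diagnostic sketch, not part of the course material. The helper names `installed_version` and `describe_environment` are hypothetical, and the sketch only reports facts about the environment without fixing anything.

```python
import collections
import collections.abc
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment():
    """Collect the facts relevant to the two errors reported above."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "protobuf": installed_version("protobuf"),
        "wandb": installed_version("wandb"),
        # The ABC aliases were removed from `collections` in Python 3.10;
        # they remain available under `collections.abc`.
        "collections_has_MutableSet": hasattr(collections, "MutableSet"),
        "collections_abc_has_MutableSet": hasattr(collections.abc, "MutableSet"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run with the Python of the environment that mlflow created, this would show whether that environment uses a different Python or protobuf version than the 3.8.16 interpreter mentioned above.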

jan-1995 commented 12 months ago

I have been facing the same issues with my runs. I downgraded protobuf, added setuptools, etc., and I too ran into new errors while using mlflow. When I run the standalone Python file directly, e.g. `python download_data.py`, and pass the parameters (artifact name, artifact type, artifact description, and file URL), it works cleanly, but not with mlflow.
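Since only the mlflow-created environment fails, one option that could be tried is the second workaround from the protobuf error message: setting `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` before launching the run. The sketch below is untested against this repository. It assumes the variable is inherited by the process that mlflow starts, and the function names are hypothetical.

```python
import os
import subprocess


def build_mlflow_run(params, project_dir="."):
    """Build the argument list for `mlflow run` with -P key=value pairs."""
    command = ["mlflow", "run", project_dir]
    for key, value in params.items():
        command += ["-P", f"{key}={value}"]
    return command


def build_env(base_env=None):
    """Copy the environment and switch protobuf to its pure-Python parser."""
    env = dict(os.environ if base_env is None else base_env)
    # Workaround 2 from the protobuf error message; parsing will be slower.
    env["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return env


def run_with_workaround(params):
    """Launch mlflow with the workaround applied and return its exit code."""
    result = subprocess.run(build_mlflow_run(params), env=build_env(), check=False)
    return result.returncode
```

Calling `run_with_workaround` from the solution folder with a dict holding `file_url`, `artifact_name`, and `artifact_description` would issue the same `mlflow run . -P ...` call as in the original report, with only the environment variable added.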