microsoft / MLOpsPython

MLOps using Azure ML Services and Azure DevOps
MIT License

Model Registration Fails #400

Closed jahanzaibanwar closed 2 years ago

jahanzaibanwar commented 2 years ago

[Screenshot: Web capture_10-3-2022_171817_ml azure com]

I ran the CI pipeline and it worked fine. Everything was working. For the past two days I have been stuck on this error. The dataset is the same and everything else is the same, but this error appeared out of nowhere and I am having a hard time understanding why it is happening. Any help would be appreciated. Thank you.

wissamjur commented 2 years ago

In your train file, probably train_model.py, are you tagging the value dataset.id with the key that you are trying to use in register_model.py?

It should be something like this:

run.parent.tag("dataset_id", value=dataset.id)

jahanzaibanwar commented 2 years ago

@wissamjur Yes, I do have that code in my train_aml.py: [screenshot]

wissamjur commented 2 years ago

@jahanzaibanwar Well, first I'd look at parent_tags = run.parent.get_tags(). Try printing it; what do you get in the logs?
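That check could be sketched roughly as follows, assuming parent_tags is the plain dict returned by run.parent.get_tags(). The helper require_tag and its error message are hypothetical and not part of register_model.py.

```python
def require_tag(tags, key):
    """Return tags[key], or raise a KeyError that lists the tags actually present."""
    # Printing the whole dict makes the tags visible in the step's log output.
    print("Parent run tags:", tags)
    if key not in tags:
        present = sorted(tags)
        raise KeyError(
            f"Tag {key!r} not found on the parent run; tags present: {present}"
        )
    return tags[key]


if __name__ == "__main__":
    # In the register step this would be called as:
    #   require_tag(run.parent.get_tags(), "dataset_id")
    example_tags = {"dataset_id": "example-dataset-id"}
    print(require_tag(example_tags, "dataset_id"))
```

If the printed dict is empty or lacks the key, the problem is on the training side (the tag was never written to this parent run) rather than in the registration script.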

Make sure your train file successfully completes with run.complete() (after dumping the model, of course).

If parent_tags is empty, you can also double-check that you are passing the pipeline_data param in your build pipeline:

register_step = PythonScriptStep(
    name="Register Model ",
    script_name=e.register_script_path,
    compute_target=aml_compute,
    source_directory=e.sources_directory_train,
    inputs=[pipeline_data],
    arguments=[
        "--model_name", model_name_param,
        "--step_input", pipeline_data,
    ],
    runconfig=run_config,
    allow_reuse=False,
)

Your pipeline should have the correct steps in order as well:

train_step.run_after(prep_step)
register_step.run_after(train_step)
steps = [prep_step, train_step, register_step]
train_pipeline = Pipeline(workspace=aml_workspace, steps=steps)

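One last thing: I'm not sure whether this is an issue in your case, but since you mentioned that it did work once, maybe try setting the allow_reuse param to False in your pipeline steps. With allow_reuse=True, Azure ML can reuse the output of an earlier step run when the step's script, inputs, and parameters are unchanged, so the step may not actually re-run. If that doesn't fit your design, you might face issues like this one.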