htahir1 closed this issue 2 years ago
I think the terminology is confusing me a bit. Model and configuration to me are somewhat synonymous whereas execution is the result of calling run(). I think what you have described above is that:
var training_pipeline is a Pipeline Model/Configuration/Plan
var pipeline is a Pipeline Execution
Correct?
Each execution would be immutable. The model or config could also be immutable if there is a register() call that creates a history of immutable changes to the model. I would use this for linking the model to the execution and performing experiment tracking and analysis.
I think the key difference here is that there is one name, training_pipeline, that the user associates with the canonical pipeline model (all its versions and executions).
@dr3s You're right -> The word model is not reflective at all. I updated the comment and adopted the word execution.
The model or config could also be immutable if there is a register() call that creates history of immutable changes to the model.
I'm not sure about this part of your comment here. The config itself would be mutable, while the execution would be immutable. Perhaps my updated comment would clarify this -> Do let me know if I misunderstood your comment.
I think the key difference here is that there is one name training_pipeline that the user associates with the canonical pipeline model (all its versions and executions).
Yes, the name is unique and should be defined at execution time rather than construction time.
I'm not sure about this part of your comment here. The config itself would be mutable, while the execution would be immutable. Perhaps my updated comment would clarify this -> Do let me know if I misunderstood your comment.
I think it's helpful to be specific here. There are at least two things with the PipelineConfig that could be mutable: the object reference in code and the data that ZenML persists to record that config. The former could be mutable or immutable (using something like the builder pattern). The latter could also be mutable or immutable regardless of how the execution is treated. If the config is only persisted at execution time, it could either overwrite the config from the last execution or create a new immutable version of the config that is then attached to the execution when it's created. Having a history of immutable config versions can be useful IMO. You could do this as part of the execution, but I prefer to model the config and execution as different domain models.
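To make the second option concrete, here is a minimal sketch of persisting a new immutable config version at execution time and attaching it to the execution. Everything in it is hypothetical and not ZenML API: `PipelineConfigVersion`, `PipelineExecution`, `ConfigStore`, and the free function `run` are illustrative names, and the store is an in-memory stand-in for whatever ZenML would actually persist.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PipelineConfigVersion:
    """Immutable snapshot of a pipeline config (hypothetical model)."""
    name: str
    version: int
    params: Tuple[Tuple[str, object], ...]


@dataclass(frozen=True)
class PipelineExecution:
    """Immutable record of one run, linked to the config version it used."""
    config: PipelineConfigVersion


class ConfigStore:
    """In-memory stand-in for the persisted config history (assumption)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[PipelineConfigVersion]] = {}

    def register(self, name: str, params: Dict[str, object]) -> PipelineConfigVersion:
        versions = self._history.setdefault(name, [])
        snapshot = tuple(sorted(params.items()))
        # Reuse the latest version if nothing changed; otherwise append a new one.
        if versions and versions[-1].params == snapshot:
            return versions[-1]
        version = PipelineConfigVersion(name, len(versions) + 1, snapshot)
        versions.append(version)
        return version

    def history(self, name: str) -> List[PipelineConfigVersion]:
        return list(self._history.get(name, []))


def run(store: ConfigStore, name: str, params: Dict[str, object]) -> PipelineExecution:
    # Persist (or reuse) an immutable config version, then attach it to the execution.
    return PipelineExecution(config=store.register(name, params))


store = ConfigStore()
params = {"epochs": 5}            # the in-code object stays mutable
first = run(store, "training_pipeline", params)
params["epochs"] = 10             # editing it does not touch the stored version
second = run(store, "training_pipeline", params)
```

Under these assumptions, `first` stays linked to version 1 of the config even after the in-code object is edited, while `second` is linked to version 2, and both versions remain in the history under the single name training_pipeline.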
Yes, the name is unique and should be defined at execution time rather than construction time.
This is confusing to me because the issue is more about using a non-unique name across executions. Yes, it's unique insofar as the user wants to make it unique. We want training_pipeline to always refer to the same pipeline across all executions and versions of its config. The name wouldn't be defined at execution time but at design time.
@dr3s I think we're on the same page here. I'd love for you to take a look as this develops. Please keep an eye on it.
Is your feature request related to a problem? Please describe.
Currently, Pipelines, Steps, Datasources, and Backends, i.e., the first-class ZenML components, have the configuration and the post-execution state built into them. Running a Pipeline, for example, causes unintended consequences: the execution object becomes immutable (in a hidden way) at that point, which gets in the way of fast iteration when working in a Jupyter notebook setting.
Describe the solution you'd like
For a variety of reasons, including testability, reduced complexity, and ease of understanding, the community has concluded that the configuration and the execution need to be separate Python objects. That is, pipeline_execution and training_pipeline will be different objects, the former being the execution object and the latter being the configuration object. The name variable will then bind the execution and the configurations for experiment tracking.
Describe alternatives you've considered
Trying to maintain immutable states after the run() and register() calls, but that led to the problems stated above.
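The separation proposed in this issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the ZenML API: the classes `TrainingPipelineConfig` and `PipelineExecution`, their fields, and the `run(name=...)` signature are hypothetical. The name is passed at run time here, following one position in the thread; the alternative raised in the comments is to fix it on the configuration at design time.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class PipelineExecution:
    """Immutable result of a run (hypothetical)."""
    name: str
    steps: Tuple[str, ...]
    params: Tuple[Tuple[str, Any], ...]


@dataclass
class TrainingPipelineConfig:
    """Mutable configuration object (hypothetical)."""
    steps: List[str] = field(default_factory=list)
    params: Dict[str, Any] = field(default_factory=dict)

    def run(self, name: str) -> PipelineExecution:
        # Copy the current settings into a frozen record; the config
        # itself stays editable for the next notebook iteration.
        return PipelineExecution(
            name=name,
            steps=tuple(self.steps),
            params=tuple(sorted(self.params.items())),
        )


training_pipeline = TrainingPipelineConfig(
    steps=["split", "trainer"], params={"epochs": 5}
)
pipeline_execution = training_pipeline.run(name="training_pipeline")

# Iterate on the configuration without touching the earlier execution.
training_pipeline.params["epochs"] = 10
second_execution = training_pipeline.run(name="training_pipeline")
```

Both executions share the name training_pipeline, which is what would tie them to the same configuration for experiment tracking, while each keeps the settings it was actually run with.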