Open ganow opened 2 years ago
@ganow Thank you for your feature request and very concrete implementation idea! It is very reasonable for me.
@Hi-king @mski-iksm How do you think about this?
@ganow Thank you for raising this issue! I also think is's a good implementation.
Can you go ahead and make a PR?
Hello, thank you for developing such a great tool!
Summary
I have one feature request to validate
TaskInstanceParameter()
at runtime by its subclass bound like the following:Detail
More concrete motivating example is as follows. Suppose we want to perform some feature embedding pre-processing followed by actual data analysis task on top of it. Since we would like to empirically compare which embedding method is better, consider making this pre-processing task (upstream task) abstract and injecting actual choice as a
TaskInstanceParameter()
. The examples of illustrative pipelines are like:In such a situation, we may want to limit the task instance parameter to be injected as an upstream task to some tasks with a specific output format, rather than all possible tasks defined. The current default behavior of
TaskInstanceParameter()
can cause potential bugs because it does not raise any kind of exceptions in all subclasses ofluigi.Task()
.Here is an example code:
For the above code, when PCAEmbed or IsomapEmbed is used for the upstream task, it works fine as follows:
However, if we intentionally inject the wrong upstream task, it fails after hitting incorrect API access to input data due to duck typing, which means that if the wrong API access is only after a very long process in the downstream task (e.g., NN training), we won't notice the problem until we get an error.
It would be better if the problem could be detected at the initialization of the task.
Implementation idea
luigi.Parameter()
providesnormalize(v)
to normalize & validate injected parameters at runtime (spotify/luigi#1273). This method is executed when task object is instanciated. Therefore, it seems that a subclass check can be done by adding the following implementation toTaskInstanceParameter()
.I am happy to create PR for this if the proposal and implementation idea are reasonable for you. Thank you.