Open LucaCinquini opened 1 month ago
@jpl-btlunsfo ping for status.
@jpl-btlunsfo: I looked at the article you mentioned, which uses Python dataclasses to generate DAGs automatically: https://medium.com/cts-technologies/designing-repeatable-dags-in-airflow-part-1-db3a72a2307d
Although it will work, it seems unnecessarily complicated to me. One disadvantage is that the DAGs are kept in the module's globals() scope rather than written to the DAGs folder, which reduces visibility.
I suggest a simpler approach, like the one outlined in this article: https://www.astronomer.io/docs/learn/dynamically-generating-dags?tab=taskflow#example-use-a-create_dag-function
In particular, the second option: "Multiple Files Method". In summary, this would be implemented as follows:
o Create a file "include/dag_template.py" that dynamically creates a DAG from some input parameters.
o Implement the OGC register() method so that, based on the input request, it executes that function to create a file in the DAGs folder, replacing the dag_template.py variables with specific values (for example, the DAG name and the CWL file).
The above approach seems much simpler and easier to debug to me.
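To make the "Multiple Files Method" concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the template contents, the `register()` signature, the `DAG_ID` and `CWL_URL` variable names, and the placeholder `BashOperator` task calling `cwl-runner` are hypothetical and not taken from the existing cwl_dag.py.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template text. In the proposal this would live in
# include/dag_template.py; the task below is only a placeholder.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DAG_ID = $dag_id
CWL_URL = $cwl_url

with DAG(dag_id=DAG_ID, start_date=datetime(2024, 1, 1), schedule=None, catchup=False) as dag:
    # Placeholder task; the real cwl_dag.py may run the workflow differently.
    BashOperator(task_id="run_cwl", bash_command="cwl-runner " + CWL_URL)
''')


def register(dag_id: str, cwl_url: str, dags_folder: Path) -> Path:
    """Render the template and write one DAG file per registered workflow.

    Hypothetical stand-in for the OGC register() handler.
    """
    if not dag_id.isidentifier():
        raise ValueError(f"dag_id must be a simple identifier, got {dag_id!r}")
    # repr() turns each value into a valid Python string literal in the output.
    rendered = DAG_TEMPLATE.substitute(dag_id=repr(dag_id), cwl_url=repr(cwl_url))
    target = dags_folder / f"{dag_id}.py"
    target.write_text(rendered)
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the real DAGs folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = register(
            "echo_message",
            "https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/main/demos/echo_message.cwl",
            Path(tmp),
        )
        print(path.read_text())
```

Because each registered workflow ends up as an ordinary file in the DAGs folder, it can be read, diffed, and debugged like any hand-written DAG.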
As long as I can execute the CWL with an "arbitrary" JSON object, or a link to a JSON/YAML file as with the cwl_dag we have, I'd be very happy in the near term.
Once a user has created a CWL workflow and makes it available at some URL (for example, an Application Package available from Dockstore), we can imagine triggering the OGC register() method to automatically generate a DAG that is very similar to the current generic cwl_dag.py, but includes some customizations: the DAG name, id, and the specific parameters needed by the CWL workflow.
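One way to obtain "the specific parameters needed by the CWL workflow" is to read them from the `inputs` section of the CWL document. The sketch below assumes the CWL file has already been parsed into a dict (for example with a YAML loader) and that `inputs` uses the map form with plain string types. The function name `cwl_inputs_to_params`, the output layout, and the sample document are hypothetical; the sample is not the actual echo_message.cwl.

```python
from typing import Any


def cwl_inputs_to_params(cwl_doc: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Summarize CWL inputs as {name: {"type", "required", "default"}}.

    Only handles map-form inputs whose type is a plain string such as
    "string" or "int?"; array and union types are out of scope here.
    """
    params: dict[str, dict[str, Any]] = {}
    for name, spec in cwl_doc.get("inputs", {}).items():
        # CWL allows the shorthand `name: type` instead of `name: {type: ...}`.
        if isinstance(spec, str):
            spec = {"type": spec}
        cwl_type = spec["type"]
        # A trailing "?" marks an optional input in CWL.
        optional = cwl_type.endswith("?")
        if optional:
            cwl_type = cwl_type[:-1]
        params[name] = {
            "type": cwl_type,
            "required": not optional and "default" not in spec,
            "default": spec.get("default"),
        }
    return params


# Hypothetical parsed CWL document, for illustration only.
SAMPLE_CWL = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {"type": "string", "default": "Hello"},
        "suffix": "string?",
        "count": "int",
    },
    "outputs": {},
}

if __name__ == "__main__":
    for name, info in cwl_inputs_to_params(SAMPLE_CWL).items():
        print(name, info)
```

The resulting dict could then feed whatever mechanism the generated DAG uses to expose its parameters.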
Start simple by generating the "Echo" DAG, which is able to execute this CWL workflow:
https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/main/demos/echo_message.cwl
It should be very similar to this DAG: https://github.com/unity-sds/unity-sps/blob/develop/airflow/dags/cwl_dag.py but customized for the Echo use case.
We can explore either inheriting from a base CWL DAG, or generating the Echo DAG from scratch from a Template.