Open LucaCinquini opened 2 months ago
Nga provided examples of 2 CWL workflows for stage-in - when data is downloaded from a DAAC or from Unity:
https://github.com/unity-sds/unity-data-services/tree/cwl-examples/cwl
We can start by creating a DAG Task that, depending on what the user selects, will invoke one or the other CWL workflow, staging data to EFS so it can be used by the subsequent Process task.
Suggested steps:
o Create a new DAG called cwl_dag_new.py with, initially, the following tasks:
  o A "setup task" that will expose 2 parameters:
  o Depending on input_location, the "cwl_workflow" parameter is set to https://github.com/unity-sds/unity-data-services/blob/cwl-examples/cwl/stage-in-unity/stage-in.cwl or https://github.com/unity-sds/unity-data-services/blob/cwl-examples/cwl/stage-in-daac/stage-in.cwl (use the raw URLs)
  o I think the "download_dir" parameters can be hardwired to "granules" or "input" or whatever
  o Then invoke the "cwl_task" which will write to /scratch/granules or /scratch/input
  o In the "cleanup" task, first list the content of the "local_dir" directory, then erase (for now)
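The selection logic in the setup task could look roughly like the sketch below. It is only an illustration: the `input_location` values "unity" and "daac", the function names, and the default `download_dir` are assumptions, not part of the existing DAG. The blob-to-raw conversion relies on the usual `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout.

```python
from urllib.parse import urlparse

# GitHub "blob" URLs quoted in this issue. The dictionary keys are
# assumed values of input_location (hypothetical, not confirmed).
STAGE_IN_CWL = {
    "unity": "https://github.com/unity-sds/unity-data-services/blob/cwl-examples/cwl/stage-in-unity/stage-in.cwl",
    "daac": "https://github.com/unity-sds/unity-data-services/blob/cwl-examples/cwl/stage-in-daac/stage-in.cwl",
}


def to_raw_url(blob_url: str) -> str:
    """Convert a github.com blob URL into its raw.githubusercontent.com form."""
    parts = urlparse(blob_url).path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path...>
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _blob, *rest = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


def stage_in_params(input_location: str, download_dir: str = "granules") -> dict:
    """Return the parameters the setup task would hand to the cwl_task."""
    key = input_location.lower()
    if key not in STAGE_IN_CWL:
        raise ValueError(f"unsupported input_location: {input_location!r}")
    return {
        "cwl_workflow": to_raw_url(STAGE_IN_CWL[key]),
        "download_dir": download_dir,  # hardwired default, as suggested above
    }
```

In an Airflow DAG, a helper like `stage_in_params` would presumably be called from the setup task and its result passed on to the "cwl_task".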
Note that the other parameters, such as "unity_client_id", should be retrieved from SSM - see example:
/unity/dev/sps/cognito-uds-client-id
We may want to request that this be added to the shared services venue so we can retrieve the user ID as we do for the Airflow webserver.
Here is a first draft of the stage-in task: https://github.com/unity-sds/unity-sps/blob/220-stage-in-task/airflow/dags/cwl_dag_modular.py
Nikki implemented and demonstrated all functionality, so this part of the CWL refactoring is done.
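For reference, the SSM lookup of "unity_client_id" mentioned above can be sketched with boto3 as follows. The helper name and the injectable `ssm_client` argument are assumptions made for illustration (the latter only so the function can be exercised without AWS credentials). It is not taken from the linked DAG.

```python
def get_ssm_parameter(name: str, ssm_client=None) -> str:
    """Fetch a single parameter value from AWS SSM Parameter Store."""
    if ssm_client is None:
        # Imported lazily so the helper can be tested without boto3 or AWS access.
        import boto3

        ssm_client = boto3.client("ssm")
    # WithDecryption only matters for SecureString parameters;
    # it is ignored for plain String values.
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

A call such as `get_ssm_parameter("/unity/dev/sps/cognito-uds-client-id")` would then return the client ID to pass to the stage-in CWL, assuming the task's role is allowed to read that parameter.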
Re-opening this task as we discussed a new design which involves executing the 3 steps (stage-in, process, stage-out) sequentially within the same shell script, running in the same Docker container. This will guarantee that all 3 tasks have access to the data on a shared EBS volume.
Examples of 3 sequential CWL stage-in / process / stage-out workflows provided by Mike:
https://github.com/mike-gangl/unity-OGC-example-application/blob/main/README.md#ogc-run
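The "same shell script, same container" design described above could be sketched as a small generator for that script. This is an assumption-heavy illustration: it supposes the workflows are run with `cwltool <workflow> <job file>`, and the file names and the `/scratch` working directory are placeholders rather than the names used in Mike's examples.

```python
import shlex


def build_pipeline_script(steps, workdir="/scratch"):
    """Build a shell script that runs the given CWL steps one after another.

    `steps` is a list of (name, workflow, job_file) tuples. Because every
    command runs in the same shell and working directory, each step can see
    the files written by the previous one.
    """
    lines = [
        "#!/bin/sh",
        "set -e  # abort the whole pipeline if any step fails",
        f"cd {shlex.quote(workdir)}",
    ]
    for name, workflow, job_file in steps:
        lines.append(f"echo {shlex.quote('Running ' + name)}")
        lines.append(f"cwltool {shlex.quote(workflow)} {shlex.quote(job_file)}")
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
script = build_pipeline_script(
    [
        ("stage-in", "stage-in.cwl", "stage-in.yml"),
        ("process", "process.cwl", "process.yml"),
        ("stage-out", "stage-out.cwl", "stage-out.yml"),
    ]
)
```

Using `set -e` means a failed stage-in stops the script before process and stage-out run, which seems to be the behavior the sequential design implies.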
Write a DAG/Task that invokes the DS Docker container to stage-in data from the DS catalog. Eventually this Task needs to be executed as part of the new Application Package CWL DAG, and followed by the Process task.