lilab-bcb / altocumulus

Command line tool for submitting WDL jobs to Terra or Cromwell server.
https://altocumulus.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Terra data model #56

Open sjfleming opened 1 year ago

sjfleming commented 1 year ago

Can I run a Terra workflow and submit inputs using the Terra "data model" (https://terra.bio/new-resources-for-unlocking-the-power-of-terras-data-tables/)... which is to say, can I specify an input file using this.h5_file for example, where h5_file is the name of a column in the sample data table in Terra?

yihming commented 1 year ago

@sjfleming Sorry for the delayed response. Altocumulus does not currently support job submission using inputs from Terra data tables, so expressions like this.h5_file are not supported.

sjfleming commented 1 year ago

Thanks @yihming for your response.

Is this a feature you'd be interested in? I think there may be a way to do it using some of the get_entities methods from FISS (https://github.com/broadinstitute/fiss/blob/master/firecloud/api.py)

If you think there's a place for it in this repo, are you open to pull requests?

yihming commented 1 year ago

@sjfleming Yes, we are definitely open to PRs from the community. We currently don't have much bandwidth for new Terra features, because our daily work involves communicating directly with Cromwell servers from the command line. That's why we developed the alto cromwell commands. As a result, we can only support Terra in our spare time.

erikwolfsohn commented 1 year ago

@sjfleming This is something I'm working on too. You can do it for a single row submission using the syntax create_submission(workspaceNamespace, workspaceName, methodConfigurationNamespace, methodConfigurationName, entityType, entityName)

If you're importing workflows from Dockstore to your own workspace and configuring them in the Terra GUI, methodConfigurationNamespace will be the same as workspaceNamespace and methodConfigurationName is the name of the workflow you will run. entityType is the root entity type (probably sample in your case), and entityName is a value from the sample_id column.
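As a rough sketch of how those arguments line up with the request that ends up being sent, the snippet below builds the JSON body for a single-row submission. The field names are the ones the Terra submissions endpoint is understood to accept; the helper build_submission_body and all example values are hypothetical. The workspace namespace and name are not part of the body (they go in the request URL), so they are left out here.

```python
import json


def build_submission_body(method_config_namespace, method_config_name,
                          entity_type, entity_name, use_call_cache=True):
    """Return the JSON body for submitting one data-table row.

    Hypothetical helper: the keys follow the Terra submissions API as
    understood here and should be checked against the live API docs.
    """
    return {
        "methodConfigurationNamespace": method_config_namespace,
        "methodConfigurationName": method_config_name,
        "entityType": entity_type,   # root entity type, e.g. "sample"
        "entityName": entity_name,   # a value from the sample_id column
        "useCallCache": use_call_cache,
    }


# Hypothetical example values, not taken from a real workspace.
body = build_submission_body(
    "my-billing-project",  # same as the workspace namespace in the Dockstore case
    "my_workflow",
    "sample",
    "sample_001",
)
print(json.dumps(body, indent=2))
```

With a body like this, any this.<column> expressions in the method configuration are resolved against the chosen row by Terra at submission time.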

You can loop over this command to analyze multiple rows. I'm still not sure how to replicate the behavior of the Terra GUI, where you can select multiple rows and generate a new set from them. If I figure that out I'll happily submit a PR. I appreciate all your work on this repo @yihming, it helped me understand the Terra API much better!
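A minimal sketch of the loop, assuming the submit function is passed in as a parameter so the example runs without network access. In real use, submit would be firecloud.api.create_submission from FISS; the keyword names entity= and etype= used below are an assumption about that function's signature and should be verified against the installed FISS version. The helper submit_rows, the stand-in fake_submit, and all example values are hypothetical.

```python
def submit_rows(submit, workspace_namespace, workspace_name,
                config_name, entity_type, entity_names):
    """Launch one submission per row and collect the responses by row name."""
    responses = {}
    for entity_name in entity_names:
        responses[entity_name] = submit(
            workspace_namespace,
            workspace_name,
            workspace_namespace,  # config namespace equals workspace namespace here
            config_name,
            entity=entity_name,   # assumed FISS keyword name
            etype=entity_type,    # assumed FISS keyword name
        )
    return responses


def fake_submit(wnamespace, workspace, cnamespace, config,
                entity=None, etype=None):
    """Stand-in for the real API call; it only echoes what it was given."""
    return {"config": f"{cnamespace}/{config}", "entity": f"{etype}/{entity}"}


results = submit_rows(
    fake_submit,
    "my-billing-project",
    "my-workspace",
    "my_workflow",
    "sample",
    ["sample_001", "sample_002"],
)
for name, response in results.items():
    print(name, response)
```

This launches one independent submission per row, which is not the same as the GUI behavior of creating a set and submitting it once.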

sjfleming commented 1 year ago

Thanks @erikwolfsohn, that does seem very close to what I want to do! I want to avoid the GUI, but I am guessing there's a way to do that.

erikwolfsohn commented 1 year ago

@sjfleming Definitely - glancing over what they've done with Altocumulus, you can pass in the JSON configuration file somewhere at runtime.

I think for your use case, this might be the solution: https://api.firecloud.org/#/Workspaces/postWorkspaceMethodConfig

Disclaimer: I haven't attempted this, but I think you would write or generate the JSON config file for your workflow, pass it into the workspace using the above API call, and then follow the same directions I posted earlier to execute the workflow.
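An untested sketch of what generating that JSON config might look like, with a workflow input bound to a data-table column through a this.<column> expression. The field names (rootEntityType, inputs, methodRepoMethod and so on) reflect the method configuration schema as understood here and should be checked against the API page linked above. The helper build_method_config, the workflow name, the input name, and the Dockstore path are all hypothetical.

```python
import json


def build_method_config(namespace, name, root_entity_type, inputs,
                        method_path, method_version):
    """Return a method configuration body for a Dockstore-hosted workflow.

    Hypothetical helper: verify every key against the
    postWorkspaceMethodConfig schema before relying on it.
    """
    return {
        "namespace": namespace,
        "name": name,
        "rootEntityType": root_entity_type,
        "inputs": inputs,    # WDL input name -> Terra expression
        "outputs": {},       # WDL output name -> column to write, if any
        "prerequisites": {},
        "methodConfigVersion": 1,
        "deleted": False,
        "methodRepoMethod": {
            "sourceRepo": "dockstore",
            "methodPath": method_path,
            "methodVersion": method_version,
        },
    }


# Hypothetical workflow, input name, and repository path.
config = build_method_config(
    namespace="my-billing-project",
    name="my_workflow",
    root_entity_type="sample",
    inputs={"my_workflow.input_h5": "this.h5_file"},
    method_path="github.com/example-org/example-repo/my_workflow",
    method_version="main",
)

# Write the body that would be sent to the method config endpoint.
with open("my_workflow_config.json", "w") as handle:
    json.dump(config, handle, indent=2)

print(json.dumps(config["inputs"], indent=2))
```

The inputs mapping is where this.h5_file from the original question would go: Terra evaluates the expression against whichever sample row the submission names.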