nfdi4plants / nfdi4plants.knowledgebase

This is the source repo for the nfdi4plants knowledge base webpage.
https://nfdi4plants.org/nfdi4plants.knowledgebase/
Creative Commons Attribution 4.0 International

When combining the results of multiple CWL workflow/tool descriptions, "type: Directory" should be avoided #423

Open mr-c opened 4 weeks ago

mr-c commented 4 weeks ago

Hello,

My name is Michael R. Crusoe. I'm one of the co-founders of the CWL project, the CWL Project Lead, and for the past few years I have been a member of de.NBI/ELIXIR-DE based out of F. U. Berlin. I'm also a big fan of the ARC specification!

While helping someone today, I came across https://nfdi4plants.org/nfdi4plants.knowledgebase/docs/guides/ComputationalWorkflows/cwl_examples.html

I would advise against using type: Directory instead of specific outputs, as it makes combining multiple CWL CommandLineTool and/or Workflow descriptions difficult. The resulting directory would have to be parsed and its contents separated before running other tools or workflows.
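
As a rough sketch of the difference, the CommandLineTool description below declares each result as its own named File output rather than returning the whole working directory. FastQC is used here only as an assumed example of a tool that may be familiar to readers; the output names `html_report` and `zip_archive` are made up for illustration, and the globs assume FastQC's usual `*_fastqc.html` / `*_fastqc.zip` naming.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: fastqc

# Write the reports into the job's output directory (the current directory),
# since the staged input file's own directory is normally read-only.
arguments: ["--outdir", "."]

inputs:
  reads:
    type: File
    inputBinding:
      position: 1

outputs:
  # Each result is a separate, named output (names are illustrative).
  html_report:
    type: File
    outputBinding:
      glob: "*_fastqc.html"
  zip_archive:
    type: File
    outputBinding:
      glob: "*_fastqc.zip"

# The pattern advised against would instead expose one opaque output, e.g.:
#   results:
#     type: Directory
#     outputBinding:
#       glob: $(runtime.outdir)
# A downstream step would then have to search that directory itself.
```

With named outputs, a later step can refer to `html_report` or `zip_archive` directly instead of unpacking a directory listing.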

Not only does returning separate outputs make workflow composition easier, it allows for better metadata and provenance tracking.
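
To illustrate the composition point, here is a minimal Workflow sketch that wires one named output of a first step straight into a second step. Both `fastqc.cwl` and `summarise.cwl` are hypothetical files: the first is assumed to expose separate File outputs called `html_report` and `zip_archive`, the second is an invented tool with a File input `archive` and a File output `summary`.

```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  reads: File

outputs:
  # Individual results can be surfaced (and tracked) one by one.
  qc_report:
    type: File
    outputSource: qc/html_report
  summary:
    type: File
    outputSource: summarise/summary

steps:
  qc:
    run: fastqc.cwl        # hypothetical tool description with named File outputs
    in:
      reads: reads
    out: [html_report, zip_archive]

  summarise:
    run: summarise.cwl     # hypothetical downstream tool
    in:
      archive: qc/zip_archive   # connect exactly the file that is needed
    out: [summary]
```

Because every connection names a specific output, the workflow graph itself records which file fed which step; with a single Directory output that link would be hidden inside the directory's contents.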

My suggestion: since this is documentation for a specific audience, it would be better to use a real tool that is likely to be familiar to many DataPLANT users.

I am happy to discuss this further and to see how else I may be of assistance. Feel free to email me: mrc at commonwl.org, or my firstname.lastname@fu-berlin.de

Cheers,

caroott commented 4 weeks ago

Hey, thanks for your input! I totally agree with you. Returning a directory only really works if the following tool expects a directory as input or if this is the final step, and even then it is the easier solution, not the better one. I will also look for a well-known example tool and add it, so that the guide is easier to follow than with the abstract examples alone.