microsoft / CromwellOnAzure

Microsoft Genomics implementation of the Broad Institute's Cromwell workflow engine on Azure
MIT License

How to run CWL flows #65

Closed rick-ji closed 4 years ago

rick-ji commented 4 years ago

Are there any instructions for running CWL workflows?

ducatiMonster916 commented 4 years ago

Hi Rick. Apologies for the delayed response.

Cromwell itself has native support for CWL, but we have focused our efforts on building this solution around WDL. That said, we are running tests to check whether CWL files can be processed properly by the Task Execution Service (TES). I'm adjusting a CWL file, input.json, and trigger file I've written for an alignment workflow and will report back this afternoon. If it works, I'll add a page to the documentation that covers CWL specifically for future reference.

ducatiMonster916 commented 4 years ago

Hi Rick, I have an update for you.

I've tried running CWL workflows with our Cromwell on Azure implementation, and our implementation of TES does not currently support the file structures expected when running a CWL workflow. We will begin working on a fix so that our TES implementation supports CWL files natively.

In the interim, I also looked into running a conversion tool. Unfortunately, the two I found are abandoned and do not convert CWL 1.0 files correctly. So until we build out native CWL support, running CWL files with Cromwell on Azure is unsupported. I will keep you updated on progress.

rick-ji commented 4 years ago

Thank you very much for the note! I also tried one of the cwl2wdl repos, and it didn’t work for me either. I’ll wait to hear about the update on CWL support. For the time being, I’ll see if I can rewrite my CWL as WDL.

Thanks,
Rick



ducatiMonster916 commented 4 years ago

Hi Rick,

I have an update for you. I'm happy to report that we've looked into the Cromwell architecture and TES and found a working solution for using CWL files with Cromwell on Azure. There are a few key points:

1) You have to provide any dependencies associated with the CWL as a ZIP file or as a link to an external web location.

2) You cannot directly specify disk size for your workflow at this time. TES does not properly parse disk information, so it cannot tell Azure Batch to spawn a VM with a specific local HDD size. Therefore, if you are running a task that requires significant I/O on intermediate files, we highly recommend running the workflow in WDL instead, where you can specify the disk size needed.
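As a rough illustration of point 1, the dependencies could be bundled into a ZIP file with a short script along these lines. This is only a sketch under stated assumptions: the function name `bundle_dependencies` and the example directory layout are hypothetical, and the thread does not say how the archive should be structured or where it needs to be uploaded.

```python
import zipfile
from pathlib import Path


def bundle_dependencies(source_dir, zip_path):
    """Zip every file under source_dir, keeping paths relative to source_dir.

    Hypothetical helper: the relative layout is an assumption, not a
    documented Cromwell on Azure requirement.
    Returns the list of archive member names.
    """
    root = Path(source_dir)
    target = Path(zip_path).resolve()
    members = []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives under source_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            arcname = path.relative_to(root).as_posix()
            archive.write(path, arcname)
            members.append(arcname)
    return members
```

For example, `bundle_dependencies("my_workflow_deps", "dependencies.zip")` would collect subworkflows and helper scripts from a (hypothetical) `my_workflow_deps` folder into one archive.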

I'm writing up an FAQ page on my branch today that walks through making a CWL file compatible with Cromwell on Azure, and I will merge it into master with the next release.

-Roberto

rick-ji commented 4 years ago

That’s awesome! Thanks for the update.


rick-ji commented 4 years ago

Hi Roberto,

Is that FAQ still WIP? I couldn't see that in the two branches in this repo, or am I missing it?

Rick

ducatiMonster916 commented 4 years ago

Hi Rick,

My apologies. The FAQ will accompany the next release, as we encountered some other issues during final testing that required additional code fixes to make the functionality more seamless (without the code in the upcoming release, you may still run into some issues). Let me check with the dev team to get an idea of their timeline for the next release. I will update you shortly.

-Roberto


Roberto Lleras

Senior Applications Scientist | Microsoft Genomics, Microsoft Healthcare NeXT


ducatiMonster916 commented 4 years ago

Hi Rick,

As it turns out, support for CWL was added to the main branch a few hours ago! I’ve added the updated documentation to my private fork, and I’ll issue a PR now to have it incorporated into the master branch.

In the interim, I’ve added the guidance below:

Running CWL Workflows on Cromwell on Azure

Running workflows written in the Common Workflow Language (CWL) format is possible with a few modifications to your workflow submission.

  1. Ensure your dependencies are accessible by Cromwell

     Any additional scripts or subworkflows must be accessible to TES. They can be provided in 3 ways:

  2. Ensure your runtime resource requests are specified with the same names as WDL files

     CWL files sometimes use runtime parameter names that differ from those accepted by TES. Please refer to our guide (https://github.com/microsoft/CromwellOnAzure/blob/master/docs/managing-your-workflow.md/#how-to-prepare-a-workflow-description-language-wdl-file-that-runs-a-workflow-on-cromwell-on-azure) for the accepted names.

  3. Known issue for CWL files: Cannot request specific HDD size

     Unfortunately, this is a bug in how Cromwell currently parses the CWL file, and thus must be addressed in the Cromwell source code directly. We have submitted an issue to the Broad to have this addressed. The current workaround is to increase the number of vCPUs or the amount of memory requested for a task, which will indirectly increase the amount of working disk space available. However, because this may cause inconsistent performance, we advise converting your workflow to the WDL format if you are running a task that might consume a large amount of local scratch space.

-Roberto



tonybendis commented 4 years ago

Update for Cromwell on Azure version 2.0:

For CWL workflows, all CWL resource keywords are supported, plus preemptible (not in the CWL spec). Preemptible defaults to true (set in the Cromwell configuration file), so specify preemptible only when setting it to false (to run on a dedicated machine). TES keywords are also supported in CWL workflows, but we advise users to use the CWL ones.

CWL keywords (CWL workflows only):
  coresMin: number
  ramMin: size in MB
  tmpdirMin: size in MB
  outdirMin: size in MB (the final disk size is the sum of the tmpdirMin and outdirMin values)

TES keywords (both CWL and WDL workflows):
  cpu: number
  memory: size unit
  disk: size unit
  preemptible: true|false
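To make the CWL keywords and the disk-size rule concrete, here is a minimal sketch that builds a CWL `ResourceRequirement` entry as a Python dictionary and computes the disk size as the sum of the tmpdir and outdir minimums. The helper names and the numbers are purely illustrative assumptions. The sketch does not show where `preemptible` goes in a CWL file, since the thread does not say.

```python
import json


def resource_requirement(cores_min, ram_min_mb, tmpdir_min_mb, outdir_min_mb):
    """Build a CWL ResourceRequirement entry from the four CWL keywords.

    Hypothetical helper; all sizes are in MB, as stated in the thread.
    """
    return {
        "class": "ResourceRequirement",
        "coresMin": cores_min,
        "ramMin": ram_min_mb,
        "tmpdirMin": tmpdir_min_mb,
        "outdirMin": outdir_min_mb,
    }


def final_disk_mb(requirement):
    """Return the disk size in MB: tmpdirMin plus outdirMin."""
    return requirement["tmpdirMin"] + requirement["outdirMin"]


# Illustrative values only.
req = resource_requirement(
    cores_min=4, ram_min_mb=8192, tmpdir_min_mb=20480, outdir_min_mb=10240
)
print(json.dumps(req, indent=2))
print(final_disk_mb(req))  # 20480 + 10240 = 30720
```

Because JSON is valid YAML, the printed dictionary can be pasted into the `requirements` or `hints` section of a CWL tool description. Which of the two is appropriate depends on your workflow.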

jbagga commented 4 years ago

Added to docs https://github.com/microsoft/CromwellOnAzure/blob/master/docs/troubleshooting-guide.md#setup-cromwell-on-azure-for-multiple-users-in-the-same-azure-subscription