rworkflow / Rcwl

Write CWL in R
https://rcwl.org/Rcwl/
GNU General Public License v2.0

Add parsers for CWL documents #5

Closed lutsik closed 3 years ago

lutsik commented 3 years ago

Hi, is there a plan to add such functionality?

hubentu commented 3 years ago

Hi @lutsik, yes, the function readCWL can parse CWL into R. It can read a local file or a downloadable URL directly. For example:

> bwa <- readCWL("https://raw.githubusercontent.com/hubentu/RcwlRecipes/master/cwl/bwa.cwl")
> bwa
 class: cwlParam
  cwlClass: CommandLineTool
  cwlVersion: v1.0
  baseCommand: bwa mem
 requirements:
 - class: DockerRequirement
   dockerPull: biocontainers/bwa:v0.7.17-3-deb_cv1
 inputs:
   threads (int): -t
   RG (string): -R
   Ref (File):
   FQ1 (File):
   FQ2 (File?):
 outputs:
 sam:
   type: File
   outputBinding:
     glob: '*.sam'
 stdout: bwaOutput.sam

There is another, more versatile function, cwlLoad, from the RcwlPipelines package. It can load CWL from an id, a URI, or a GitHub path. For example, to load a workflow from a GitHub repo:

> pf <- cwlLoad("common-workflow-library/bio-cwl-tools", cwlfile = "sratoolkit/prefetch_fastq.cwl")
> pf
class: cwlStepParam 
 cwlClass: Workflow 
 cwlVersion: v1.0 
requirements:
MultipleInputFeatureRequirement: {}
StepInputExpressionRequirement: {}
InlineJavascriptRequirement: {}
inputs:
  sra_accession (string):  
outputs:
fastq_files:
  format: edam:format_1931
  type: File[]
  outputSource: fastq_dump/all_fastq_files
fastq_file_1:
  format: edam:format_1931
  type: File
  outputSource: rename_fastq1/outfile
fastq_file_2:
  format: edam:format_1931
  type: File?
  outputSource: fastq_dump/fastq_file_2
steps:
  prefetch:
    run: prefetch.cwl
    accession: sra_accession
    out: sra_file
  fastq_dump:
    run: fastq_dump.cwl
    sra_file: prefetch/sra_file
    split_files: 
    out: all_fastq_files fastq_file_1 fastq_file_2
  rename_fastq1:
    run: rename_fastq1.cwl
    srcfile: fastq_dump/fastq_file_1
    fastq2: fastq_dump/fastq_file_2
    accession: sra_accession
    newname: 
    out: outfile
lutsik commented 3 years ago

Great, thank you!

lutsik commented 3 years ago

Sorry to reopen, but something went wrong with my first attempt:

Rcwl::readCWL("https://raw.githubusercontent.com/CompEpigen/ChIPseq_workflows/master/CWL/workflows/ChIPseq.cwl")

Error in (function (id, run = cwlParam(), In = stepInParamList(), Out = list(), : unused argument (doc = "multiqc summarizes the qc results from fastqc \nand other tools\n")

It seems that "doc" fields are not supported yet. Could that be the case?

hubentu commented 3 years ago

Thanks for reporting the bug. Yes, some slots are missing because the CWL specification keeps adding new fields. We will work toward compatibility with specification 1.2 so that more slots are supported.
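
For readers wondering where the "unused argument" message comes from: R raises it whenever a named value is passed to a function that has no matching argument. The sketch below reproduces that in base R only. It assumes the parser forwards each parsed field to a constructor via do.call, which is a guess about the internals and not confirmed here; `make_step` and the field values are hypothetical stand-ins, not Rcwl code.

```r
# Hypothetical stand-in for a step constructor that has no `doc` argument.
make_step <- function(id, run = NULL) {
  list(id = id, run = run)
}

# Fields as they might come out of a parsed CWL step (illustrative values).
fields <- list(id = "multiqc", run = "multiqc.cwl", doc = "summarizes QC results")

# Forwarding every field fails because `doc` matches no formal argument.
msg <- tryCatch(
  {
    do.call(make_step, fields)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# One defensive option: forward only the fields the constructor accepts
# and report the rest instead of failing.
known <- names(fields) %in% names(formals(make_step))
step <- do.call(make_step, fields[known])
dropped <- names(fields)[!known]
message("Ignored unsupported fields: ", paste(dropped, collapse = ", "))
```

Filtering unknown fields is only one possible design; it trades a hard failure for silently losing information such as documentation strings, so adding the missing slot is usually the better fix.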

I just added doc for Subworkflows. Please test the latest updates.

BiocManager::install('hubentu/Rcwl')
Rcwl::readCWL("https://raw.githubusercontent.com/CompEpigen/ChIPseq_workflows/master/CWL/workflows/ChIPseq.cwl")

To make sure it can be run locally, you can try the cwlLoad function. It will clone the whole repo so that the dependencies are available.

BiocManager::install('hubentu/RcwlPipelines')
RcwlPipelines::cwlLoad("CompEpigen/ChIPseq_workflows", cwlfile = "CWL/workflows/ChIPseq.cwl")
lutsik commented 3 years ago

OK, thanks for the quick fix, reading works now. I will close the issue but will continue testing.