uec / Issue.Tracker

Automatically exported from code.google.com/p/usc-epigenome-center

howto: submit a workflow based upon files (external analysis) #810

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
cd /export/uec-gs1/laird/shared/production/ga/external_analysis

make a directory for your project (Ex: LijingAnalysis)
cd into that project directory (Ex: cd LijingAnalysis)
make a directory for your experiment (Ex: chipseq)
cd chipseq

make sure your fastq files are either moved into this dir or symlinked here
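
The setup steps above could be scripted roughly as follows. This is only a sketch: setup_experiment and its argument order are made up for illustration and are not part of the pipeline; only the external_analysis base path comes from this issue.

```shell
# Hypothetical helper (not part of ECWorkflow): create <base>/<project>/<experiment>
# and symlink every *.fastq / *.fastq.gz found in <fastq_dir> into it.
setup_experiment() {
    base=$1; project=$2; experiment=$3
    # Resolve the source dir to an absolute path so the symlinks stay valid.
    fastq_dir=$(cd "$4" && pwd) || return 1
    target="$base/$project/$experiment"
    mkdir -p "$target" || return 1
    for f in "$fastq_dir"/*.fastq "$fastq_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue      # skip glob patterns that matched nothing
        ln -sf "$f" "$target/"
    done
    echo "$target"                   # print the experiment dir that was prepared
}
```

Example call (the last argument is a placeholder for wherever your fastqs live):
setup_experiment /export/uec-gs1/laird/shared/production/ga/external_analysis LijingAnalysis chipseq /path/to/your/fastqs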

create a parameter file for your fastqs

no weird characters or spaces allowed
FlowCellName is a unique ID describing the samples in this processing run
queue should be laird
ClusterSize should be 1
the rest of the header can be ignored and used as is.

SampleID is a unique ID for each sample; no weird characters or spaces allowed
Lane can be ignored since you are not running sequencing here; just set it to any value from 1-8
Workflow can be chipseq, bisulfite, bisulfite-nome, bisulfite-rrbs, rnaseqv2, or regular (regular is basically wgs)
Reference is the path to the reference genome fasta. For human bisulfite and TCGA we recommend
/home/uec-00/shared/production/genomes/hg19_rCRSchrm/hg19_rCRSchrm.fa
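
Since "weird characters" is not defined precisely here, a quick check before writing the file may help. The sketch below assumes that letters, digits, underscores, and hyphens are safe; the rule the workflow actually enforces may differ. is_safe_id and is_known_workflow are hypothetical helper names, and the workflow list is simply copied from the description above.

```shell
# Assumption: an ID is "safe" if it is non-empty and contains only
# letters, digits, underscores, and hyphens.
is_safe_id() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Workflow names as listed in this how-to.
is_known_workflow() {
    case $1 in
        chipseq|bisulfite|bisulfite-nome|bisulfite-rrbs|rnaseqv2|regular) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_safe_id "NEURAL" succeeds, while is_safe_id "my sample" fails because of the space.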

example file workFlowParams.txt
ClusterSize = 1
queue = laird
FlowCellName = NEURAL
MinMismatches = 2
MaqPileupQ = 30
referenceLane = 1
randomSubset = 300000

#Sample: USC557-SEP012GCCAAT
Sample.1.SampleID = USC557-SEP012GCCAAT
Sample.1.Lane = 1
Sample.1.Input = USC557-SEP012_GCCAAT_L003_R1.fastq.gz
Sample.1.Workflow = chipseq
Sample.1.Reference = /home/uec-00/shared/production/genomes/encode_hg19_mf/male.hg19.fa

#Sample: USC558-SEP102CTTGTA
Sample.2.SampleID = USC558-SEP102CTTGTA
Sample.2.Lane = 1
Sample.2.Input = USC558-SEP102_CTTGTA_L003_R1.fastq.gz
Sample.2.Workflow = chipseq
Sample.2.Reference = /home/uec-00/shared/production/genomes/encode_hg19_mf/male.hg19.fa

#Sample: USC559-SEP109ACTTGA
Sample.3.SampleID = USC559-SEP109ACTTGA
Sample.3.Lane = 1
Sample.3.Input = USC559-SEP109_ACTTGA_L003_R1.fastq.gz
Sample.3.Workflow = chipseq
Sample.3.Reference = /home/uec-00/shared/production/genomes/encode_hg19_mf/male.hg19.fa
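
Before submitting, it can be worth confirming that every Sample.N.Input entry names a file that actually exists. The following is a minimal sketch, not part of the pipeline: check_inputs is a hypothetical helper, and it assumes the "key = value" layout of the example file, input paths relative to the directory you run it from, and file names without spaces.

```shell
# Hypothetical pre-submit check: report every Sample.<n>.Input file that is missing.
check_inputs() {
    params=$1
    missing=0
    # Extract the value of each "Sample.<n>.Input = <file>" line.
    for f in $(sed -n 's/^Sample\.[0-9][0-9]*\.Input[[:space:]]*=[[:space:]]*//p' "$params"); do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return $missing    # 0 if all inputs exist, 1 otherwise
}
```

Run it from the experiment directory, e.g. check_inputs workFlowParams.txt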

I've created a tool to auto-generate a param file from a directory full of fastqs:

ex:
/auto/uec-00/ramjan/devel/create_Workflow_Params/createParamFromFiles.pl workflowParam *.fastq
Note: for paired-end (PE) data, list only the R1 files; the matching R2 files will be detected:
/auto/uec-00/ramjan/devel/create_Workflow_Params/createParamFromFiles.pl workflowParam *R1*.fastq
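
How the tool detects R2 is not described here. If you want to sanity-check a directory first, the sketch below lists each R1 file and whether a same-named R2 file sits next to it. It assumes the _R1 / _R2 naming seen in the example file names (e.g. USC557-SEP012_GCCAAT_L003_R1.fastq.gz); list_pairs is a hypothetical helper and not part of createParamFromFiles.pl.

```shell
# Hypothetical check: for each R1 fastq in a directory, report whether the
# corresponding R2 file (same name with "_R1" swapped for "_R2") exists.
list_pairs() {
    dir=${1:-.}
    for r1 in "$dir"/*_R1*.fastq "$dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue             # skip glob patterns that matched nothing
        name=${r1##*/}                       # strip the directory part
        # Swap the last "_R1" in the file name for "_R2".
        r2="$dir/${name%_R1*}_R2${name##*_R1}"
        if [ -e "$r2" ]; then
            echo "$name paired"
        else
            echo "$name single"
        fi
    done
}
```

Calling list_pairs with no argument checks the current directory.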

once you have a param file, you can submit it with

/home/uec-00/shared/production/software/ECWorkflow/submitWorkflow.pl workFlowParams.txt
where workFlowParams.txt is the name of the param file, created either by hand or by my tool

once processing is complete, you should be able to see the results on ECDP or browse them manually

Original issue reported on code.google.com by zack...@gmail.com on 15 Sep 2014 at 10:38

GoogleCodeExporter commented 8 years ago

Original comment by zack...@gmail.com on 15 Sep 2014 at 10:39