kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Bowtie and dedup options #60

Closed rbronste closed 7 years ago

rbronste commented 7 years ago

Hi,

A couple of questions:

  1. In the alignment step I usually add --no-mixed --no-discordant to my Bowtie2 settings. How can I do this in the pipeline?
  2. Are duplicate removal and mitochondrial read removal by Picard done in one step? I am trying to separate the two and do only duplicate removal.

Thanks.

Rob.

leepc12 commented 7 years ago

Hello Rob,

  1. I just added -extra_param_bwt2 "[EXTRA_PARAMS]" so that you can add any additional parameters you want to use. Please don't forget to quote [EXTRA_PARAMS].

  2. Yes, chrM is also removed when duplicates are removed. Please see https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/postalign_bam.bds#L210 (for SE data sets) and https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/postalign_bam.bds#L351 (for PE). To keep chrM at that step, remove | grep -v 'chrM' from those lines. However, chrM will still be removed in the final stage of the pipeline (IDR, naive-overlap and blacklist filtering).
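On the quoting point in item 1: the sketch below uses Python's shlex module to show how a POSIX-style shell splits the command line with and without the quotes. Only -extra_param_bwt2 comes from the reply above; -species mm10 is borrowed from a command elsewhere in this thread, and the full command is illustrative rather than a tested invocation. How bds handles the value after the shell has split it is not shown here.

```python
import shlex

# Illustrative command lines; only the quoting around the Bowtie2 options differs.
quoted = 'bds atac.bds -species mm10 -extra_param_bwt2 "--no-mixed --no-discordant"'
unquoted = 'bds atac.bds -species mm10 -extra_param_bwt2 --no-mixed --no-discordant'

# shlex.split follows POSIX shell word-splitting rules.
quoted_args = shlex.split(quoted)
unquoted_args = shlex.split(unquoted)

# With quotes, both Bowtie2 options arrive as one argument value.
print(quoted_args[-1])

# Without quotes, they arrive as two separate arguments.
print(unquoted_args[-2:])
```

With the quotes, the two Bowtie2 options stay together as the single value of -extra_param_bwt2. Without them, the shell splits the options into separate words before the program sees them.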

If you want to keep all chrM reads, please remove all grep -v "chrM" commands from the following files:

https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/postalign_bam.bds
https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/postalign_bed.bds

Then replace grep -P 'chr[\dXY]+[ \t]' with grep -P 'chr[\dMXY]+[ \t]' in the following files:

https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/callpeak_idr.bds
https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/callpeak_naive_overlap.bds
https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/callpeak_blacklist_filter.bds
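To see what the two grep -P patterns above keep and drop, here is a minimal sketch using Python's re module, which treats these simple Perl-style constructs the same way. The sample lines are made up for illustration and are not pipeline output; keep_matching is a hypothetical helper that only mimics grep's "keep lines with a match" behaviour.

```python
import re

# The two patterns quoted in the reply above.
DEFAULT_PATTERN = re.compile(r"chr[\dXY]+[ \t]")
WITH_CHRM_PATTERN = re.compile(r"chr[\dMXY]+[ \t]")

# Hypothetical tab-separated peak lines, made up for illustration.
SAMPLE_LINES = [
    "chr1\t100\t200",
    "chrX\t300\t400",
    "chrM\t500\t600",
    "chrUn_GL456239\t700\t800",
]


def keep_matching(pattern, lines):
    """Mimic `grep -P PATTERN`: keep lines containing a match."""
    return [line for line in lines if pattern.search(line)]


default_kept = keep_matching(DEFAULT_PATTERN, SAMPLE_LINES)
with_chrm_kept = keep_matching(WITH_CHRM_PATTERN, SAMPLE_LINES)

print([line.split("\t")[0] for line in default_kept])    # ['chr1', 'chrX']
print([line.split("\t")[0] for line in with_chrm_kept])  # ['chr1', 'chrX', 'chrM']
```

Adding M to the character class is what lets chrM lines through. In this sample, both patterns still drop the chrUn_ line, because the characters after "chr" are not all digits, X, Y (or M) up to the first tab.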

Thanks,

Jin

rbronste commented 7 years ago

Hi Jin,

Quick question: I ran the pipeline as shown below and got the following errors after each replicate stage (3 PE replicates) finished, and ended up with no QC:

bds atac.bds -species mm10 -nth 16 -mapq_thresh 20 -no_dup_removal 1 -chrsz /sonas-hs/tollkuhn/hpc_norepl/home/rbronste/mm10.chrom.sizes -gensz mm

Gpr.reader(565): File not found '/mnt/grid/tollkuhn/hpc_norepl/home/data/rbronste/atac_dnase_pipelines/out/qc/rep1/298765_S1_R1_001_dedup.trim.PE2SE.nodup.pbc.qc'
Fatal error: modules/log_parser.bds, line 244, pos 34. Trying to access element number 0 from list 'arr' (list size: 0).

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Fatal error: modules/ENCODE_accession.bds, line 184, pos 25. Map 'map_pbc' does not have key 'PBC1'.

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Fatal error: atac.bds, line 385, pos 2. Task/s failed.

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.