owenjm / damidseq_pipeline

An automated pipeline for processing DamID sequencing datasets
http://owenjm.github.io/damidseq_pipeline
GNU General Public License v2.0

Excessive memory usage #7

Open jibsch opened 7 years ago

jibsch commented 7 years ago

Hi Owen,

I'm trying to run the pipeline on DamID-seq samples, but the runs cannot complete due to the server running out of memory.

Even extracting a single chromosome (dm6 chr3R) from the data results in a memory blowout. Memory usage grows non-linearly, starting when the Dam-protein sample is processed. The attached plot illustrates this.

Note that the data are paired-end reads.

The pipeline invocation is as follows:

~/tools/damidseq_pipeline-1.4/damidseq_pipeline --bowtie2_genome_dir=~/references/drosophila/dm6_bowtie2_index --gatc_frag_file=~/references/drosophila/dm6_GATC.gff --dam=Dam1_chr3R.bam DamXX_chr3R.bam

Regards, Jan

owenjm commented 7 years ago

Hi Jan,

That definitely shouldn't happen. Are you still using a reference genome with unusual assembly names to perform the alignment, and did you manually build the GATC fragment file from that assembly if so?

Cheers, Owen

jibsch commented 7 years ago

I suspected that myself, so I trimmed down the reference: the reference genome now contains only the single chromosome. I can run the same data in single-end mode and the issue does not arise, so the problem is very likely in the code that handles paired-end data.

Cheers, Jan

owenjm commented 7 years ago

Interesting. I assume you're aligning the paired-end reads yourself (since the pipeline doesn't (yet) do this). Are you also aligning the single-end reads similarly, or letting the pipeline do it?

jibsch commented 7 years ago

The single-end reads are also aligned before running the pipeline, using Bowtie 2.


owenjm commented 7 years ago

As it happens I've just generated some PE reads, so I'll look into this over the next day or so and see if I can reproduce the error. Paired-end compatibility has always been experimental, since most people will not have the money for PE seq (there's very little advantage to using PE reads in practice, in my experience at least, for most use-cases).