Open jibsch opened 7 years ago
Hi Jan,
That definitely shouldn't happen. Are you still using a reference genome with unusual assembly names to perform the alignment, and did you manually build the GATC fragment file from that assembly if so?
Cheers, Owen
I suspected that myself, so I trimmed down the reference: it now contains only the single chromosome. I can run the same data in single-end mode and the issue does not arise, so the problem is very likely in the code that handles paired-end data. Cheers, Jan
Interesting. I assume you're aligning the paired-end reads yourself (since the pipeline doesn't (yet) do this). Are you also aligning the single-end reads similarly, or letting the pipeline do it?
The single-end reads are also aligned before running the pipeline, using Bowtie 2.
As it happens I've just generated some PE reads, so I'll look into this over the next day or so and see if I can reproduce the error. Paired-end compatibility has always been experimental, since most people will not have the money for PE seq (there's very little advantage to using PE reads in practice, in my experience at least, for most use-cases).
Hi Owen,
I'm trying to run the pipeline on DamID-seq samples, but the runs cannot complete due to the server running out of memory.
Even when I extract a single chromosome (dm6 chr3R) from the data, the run still ends in a memory blowout. Memory usage grows non-linearly from the point where the pipeline starts processing the Dam-protein sample. See the attached plot to illustrate this point.
Note that the data are paired-end reads.
The pipeline invocation is as follows:
~/tools/damidseq_pipeline-1.4/damidseq_pipeline --bowtie2_genome_dir=~/references/drosophila/dm6_bowtie2_index --gatc_frag_file=~/references/drosophila/dm6_GATC.gff --dam=Dam1_chr3R.bam DamXX_chr3R.bam
Regards, Jan