Closed POTsnake closed 5 years ago
Thanks for the questions. Here are the answers/suggestions:
"Killed: 9"
is an error produced by your computer, not by Genrich. It might be due to a memory limit, but I cannot say for sure.-X
and -P
. This will save a little memory, but perhaps not enough to make a difference.-r
). This is not recommended but will save a lot of memory. (At this point, there is no option to remove PCR duplicates only with Genrich, but perhaps there should be...)Read xxx prevented...
warning messages relate to extending intervals past chromosome ends, as described here. These warnings do not affect memory, and do not cause any other problems (other than being annoying, if there are a lot of them -- perhaps this should be limited). You should not need to remove reads aligning to chrUn_
.-f
, it is simple to explore the effects of altering the -p
/-q
threshold (as well as -a
/-l
/-g
) with -P
, as described here and here.I had the same problem. I tried running without -r, and it worked! I wonder whether I should remove the PCR duplicates before Genrich (by Picard maybe?). Thank you very much for the helpful package and advice!-Weiwei
@weiweivi,
Again, running without -r
is not recommended. You can certainly try Picard, although I do not think you will get far, if memory is your limitation. In my experience MarkDuplicates
uses about twice as much memory as Genrich (and it doesn't even call peaks!).
Hi both,
Thank you so much for your comments. I had a discussion with my colleague. For the memory problem (related to my first question), instead of removing -r I am going to try:
Do you think the sum of the results obtained by splitting and analysing the parts could be different from the result obtained by analysing the original file? Let's say that 10,000 peaks on each chr1-10 and chr11-22+XY were identified from the original file. Is it possible to run Genrich on the split files with chr1-10 and chr11-22+XY separately, and get the same 10,000 peaks from each? Are there any better ways of splitting than based on chromosomes?
The results from splitting the BAM and analyzing the parts separately will certainly be different from doing it all at once. The reason is that the background level(s) and genome length(s) will be different, even if you are very careful to modify the BAM headers for each part artificially. If your BAM has secondary alignments that span multiple chromosomes, it would be quite impossible.
I don't see any good way of splitting the BAM. I recommend finding a machine with more memory. Maybe try the cloud?
Just FYI, I updated the code to limit the Read xxx prevented...
warning messages to 128 per input file. This should allow the program to run faster and not clog up the console, but it does not improve the memory issue.
Hi, jsh58. Thank you for your messages and sorry for my late reply. I switched off other memory-taking applications when running Genrich and it could process files as large as 10GB with my 16GB-RAM MBP. Also, inputs of 4 files at the same time (-t A,B,C,D) were no problem and increased identified peak numbers.
I have 4 data sets (replica x sequence lanes) for each condition like: replica1_lane1, replica1_lane2, replica2_lane1, replica2_lane2 / condition1. Which is better to input these 4 files together to Genrich or replica1s and 2s separately? At the end, I would like to have comparisons between different conditions.
Glad to hear it worked! With multiple input files, PCR duplicate removal is done separately for each, so the memory increase should be minimal.
If that concludes the memory discussion, I would appreciate it if you would close this issue, and then open a new issue for your question about replicates. Thanks!
Sure. I would also try to find other threads related to the replicates. Thank you so much for your help.
First of all, I do appreciate your developing and revealing the very elaborate and useful code for peak calling. I am trying to use Genrich to analyze my ATAC-seq data sets with my MacBook Pro. I executed the following command: $ Genrich -t A -o B -j -y -r -e chrM -v A was "samtools sort -n" bam files, and originally cram files. When A was relatively small (4.45GB or 8.84GB), Genrich completed the processes after giving many warning messages. However, when I treated bigger files like 17GB, the the processes were terminated by "Killed: 9" after showing lots of Warnings! for 2.5hrs. I saw two kinds of warnings from my treatment: 1) Read xxx prevented from extending below 0 on chr yyy 2) Read xxx prevented from extending past ... on chr yyy Most of the problematic chromosomes were chrUn_’something’.
Do you have any ideas to deal with the bigger files? Should I remove the reads aligned on chrUn_’something’ from the input files? Also, completed analyses with the default setting identified less peaks than expected. What is the reasonable range of values of p and q options for ATAC-seq analyses?
Again thank you so much for your devotion to the field and I would be even more grateful if you could help me on these issues.