Closed linshengnan2020 closed 3 years ago
Hi,
Could you look at the first 10 lines of both BAM files and check whether they are sorted the same way (by name), with exactly the same read names?
After the mapping, the bam files are sorted. If for any reason the sort
has failed, it might explain why the two bam files are not ordered in the same way.
And the message [E::idx_find_and_load] Could not retrieve index file
looks weird ... so I'm wondering if the sort works well.
Best
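The check suggested above can be sketched in Python. This is only an illustration: `first_read_names` and `first_mismatch` are hypothetical helper names, and the sketch assumes `samtools` is on the PATH and that the read name is the first tab-separated column of `samtools view` output.

```python
import subprocess
from itertools import islice, zip_longest


def first_read_names(bam_path, n=10):
    """Return the first n read names (column 1) printed by `samtools view`."""
    proc = subprocess.Popen(
        ["samtools", "view", bam_path], stdout=subprocess.PIPE, text=True
    )
    try:
        return [line.split("\t", 1)[0] for line in islice(proc.stdout, n)]
    finally:
        # Only the first n records are needed, so stop samtools early.
        proc.stdout.close()
        proc.kill()
        proc.wait()


def first_mismatch(names_fwd, names_rev):
    """Return (index, fwd_name, rev_name) at the first difference, else None."""
    for i, (fwd, rev) in enumerate(zip_longest(names_fwd, names_rev)):
        if fwd != rev:
            return i, fwd, rev
    return None
```

Calling `first_mismatch(first_read_names(r1_bam), first_read_names(r2_bam))` on the two `bwt2merged.bam` files should return `None` if the first records line up.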
I sorted the BAM files myself and then ran mergeSAM.py, and it worked well. Is there a configuration problem in my installation? How can I solve this? Thank you very much.
No, I think this is more likely a RAM issue: samtools sort crashed because it ran out of memory at some point.
Hi, when I run mergeSAM.py, an error occurs:
[E::idx_find_and_load] Could not retrieve index file for 'mh30_2_clean_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'
[E::idx_find_and_load] Could not retrieve index file for 'mh30_2_clean_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'
I tried to build the BAM index with samtools index, and got this error:
[E::hts_idx_push] NO_COOR reads not in a single block at the end 688 -1
[E::sam_index] Read 'A00358:332:HFTKGDSXY:2:1101:1081:2487' with ref_name='ctg000430', ref_length=9029235, flags=0, pos=8917145 cannot be indexed
samtools index: failed to create index for "mh30_2_clean_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam": No such file or directory
Thank you very much.
samtools index: failed to create index for "mh30_2_clean_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam": No such file or directory
Is there something wrong with the path? Does the file exist?
I added the absolute paths of the BAM files and ran mergeSAM.py again:
/home/linshengnan/01_software/hic-pro/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f /home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam -r /home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam -o mh30_1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2pairs.bam
The log:
[E::idx_find_and_load] Could not retrieve index file for '/home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'
[E::idx_find_and_load] Could not retrieve index file for '/home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'
....
The run still generated bwt2pairs.bam and bwt2pairs.pairstat.
I'm sorry to bother you again, but the problem has not been solved. As you said, this is a RAM issue and samtools sort has a problem. But even after I set the sort memory to 2000G, I hit the same error. Here is mapping_combine.log:
/home/chenxinxiu/software/samtools-1.8/samtools merge -@ 72 -n -f bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam bowtie_results/bwt2_global/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.bam bowtie_results/bwt2_local/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.unmap_bwt2loc.bam
/home/chenxinxiu/software/samtools-1.8/samtools merge -@ 72 -n -f bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam bowtie_results/bwt2_global/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.bam bowtie_results/bwt2_local/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.unmap_bwt2loc.bam
/home/chenxinxiu/software/samtools-1.8/samtools sort -@ 72 -m 2000G -n -T tmp/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa -o bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.sorted.bam bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam
samtools sort: couldn't allocate memory for bam_mem
/home/chenxinxiu/software/samtools-1.8/samtools sort -@ 72 -m 2000G -n -T tmp/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa -o bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.sorted.bam bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam
samtools sort: couldn't allocate memory for bam_mem
How can I solve this problem? Could you give me some advice? Thank you very much!
Hi,
I got the same issue. According to the samtools sort manual, the -m option sets the maximum memory per thread, so the actual memory usage is max_memory * n_threads. As far as I can tell, the previous update changed N_CPU to a limit on total CPU use. HiC-Pro does use N_CPU/2 threads to map each read end. However, it still uses N_CPU threads per end in the mergeSAM step, which doubles the memory usage. In summary, if I am guessing right, the total memory use is SORT_RAM * N_CPU * 2.
Hi, many thanks for your comments. @sf-nevermore, so you think @linshengnan2020 should decrease the RAM limit given to samtools (the -m option)?
Yes. Divide it by N_CPU. Intuitively, I think HiC-Pro should use -m SORT_RAM/N_CPU
at the samtools sorting step.
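As a rough illustration of that division, here is a minimal Python sketch that turns a total budget into a per-thread `-m` value. The names `parse_size` and `per_thread_limit` are hypothetical, and the sketch assumes the K/M/G suffixes are powers of 1024.

```python
# Suffix multipliers; treating K/M/G as powers of 1024 is an assumption here.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '20G' or '768M' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def per_thread_limit(total, n_threads, n_sorts=1):
    """Split a total budget over threads and concurrent sorts.

    Returns a string in megabytes, rounded down, suitable for `samtools sort -m`.
    """
    per_thread = parse_size(total) // (n_threads * n_sorts)
    return f"{per_thread // UNITS['M']}M"
```

For example, a 20G budget over 6 threads gives `per_thread_limit("20G", 6)`, i.e. a little over 3G per thread; passing `n_sorts=2` also accounts for the R1 and R2 sorts running at the same time.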
Yes. I checked the config-hicpro.txt of version 2.11.1 and it has no SORT_RAM option, so I removed this option from my config, and it works well.
Thanks guys. Very useful !
Hi!
I have removed the SORT_RAM option as suggested by linshengnan2020, but I keep getting the same error:
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.
The only difference is that I do not get the "samtools sort: couldn't allocate memory for bam_mem" error in mapping_combine.log.
Any idea on how I could fix this? Thanks, Federico
Hi
If you remove the option, there is no RAM limit.
I would therefore suggest using what @linshengnan2020 suggested, i.e. using -m SORT_RAM/N_CPU
Best
Sorry to bother you again,
I have modified the bowtie_combine.sh script by replacing '-m SORT_RAM' with '-m SORT_RAM/N_CPU'. By doing that I went from 20G of RAM to ca. 3GB of RAM, and 'samtools sort' worked without throwing errors.
However, the outputs of 'samtools sort' were not properly sorted and I got an error in the mergeSAM step. Here is my log:
/hpcnfs/home/ieo5073/miniconda3/envs/HiCpro/bin/python /hpcnfs/home/ieo5073/miniconda3/envs/HiCpro/bin/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam -r bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam -o bowtie_results/bwt2/WT/WT_genome.bwt2pairs.bam
## mergeBAM.py
## forward= bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
## reverse= bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
## output= bowtie_results/bwt2/WT/WT_genome.bwt2pairs.bam
## min mapq= 10
## report_single= False
## report_multi= False
## verbose= True
## Merging forward and reverse tags ...
## 1000000
## 2000000
## 3000000
## 4000000
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.
To check the actual memory setting used in the samtools sort step, look at mapping_combine.log in the log folder. You should see a line such as /usr/local/anaconda/bin/samtools sort -@ 38 -m 50M -n -T ...
which shows the -m value for your run, along with any error that occurred during sorting.
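If the log is long, the relevant values can be pulled out with a few lines of Python. This is a sketch only: `sort_settings` is a hypothetical helper, and it assumes each `samtools sort` command sits on one line with `-@` and `-m` written as in the logs in this thread.

```python
import re


def sort_settings(log_text):
    """Return a list of (threads, memory) for each `samtools sort` command line.

    Error lines such as "samtools sort: couldn't allocate ..." are skipped
    because the command pattern requires whitespace right after "sort".
    """
    results = []
    for line in log_text.splitlines():
        command = re.search(r"samtools\s+sort\s", line)
        if not command:
            continue
        args = line[command.end():]
        threads = re.search(r"(?:^|\s)-@\s*(\d+)", args)
        memory = re.search(r"(?:^|\s)-m\s*(\d+[KMGkmg]?)", args)
        results.append(
            (
                int(threads.group(1)) if threads else None,
                memory.group(1) if memory else None,
            )
        )
    return results
```

Reading the file with `open("mapping_combine.log").read()` (the path depends on your log folder) and passing the text to `sort_settings` lists the thread count and per-thread memory that were actually used.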
This is my mapping_combine.log
It seems to me it worked fine, but I'm probably wrong since the files were not sorted.
Any advice on how to fix that? Thanks
samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R2_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R2_genome.bwt2glob.unmap_bwt2loc.bam
samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R1_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R1_genome.bwt2glob.unmap_bwt2loc.bam
samtools sort -@ 6 -m 3G -n -T tmp/WT_R2_genome -o bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
samtools sort -@ 6 -m 3G -n -T tmp/WT_R1_genome -o bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
[bam_sort_core] merging from 0 files and 6 in-memory blocks...
[bam_sort_core] merging from 0 files and 6 in-memory blocks...
mv bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
mv bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
Looks like they were sorted. Maybe your fastq input files are not paired?
Indeed. Could you please show us the first lines of your fastq files?
Yes, sorry, my input fastq were not paired. Silly mistake. Thank you.
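For anyone hitting the same thing, pairing of the input FASTQ files can be checked before mapping. The sketch below is a hypothetical helper, not part of HiC-Pro. It assumes plain 4-line FASTQ records, takes the read ID as the header up to the first whitespace, and drops a trailing /1 or /2.

```python
from itertools import islice, zip_longest


def read_ids(lines):
    """Yield read IDs from FASTQ header lines (every 4th line)."""
    for header in islice(lines, 0, None, 4):
        header = header.strip()
        if not header:
            continue  # tolerate a trailing blank line
        name = header.split()[0][1:]  # drop the leading '@'
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_unpaired(fwd_lines, rev_lines, limit=None):
    """Return (record_index, fwd_id, rev_id) at the first mismatch, else None."""
    pairs = zip_longest(read_ids(fwd_lines), read_ids(rev_lines))
    for i, (fwd, rev) in enumerate(islice(pairs, limit)):
        if fwd != rev:
            return i, fwd, rev
    return None
```

Both arguments can be open file handles (for example from `open()` or `gzip.open(path, "rt")`); `limit` restricts the check to the first records when the files are large.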
After checking read pairing in the input fastq files, I succeeded with test files of ~30M reads. I then ran my whole dataset, ~300M reads, with the same settings.
The mergeSAM step failed, with the usual error message:
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.
I checked the previous step, reported in mapping_combine.log (below),
samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R1_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R1_genome.bwt2glob.unmap_bwt2loc.bam
samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R2_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R2_genome.bwt2glob.unmap_bwt2loc.bam
samtools sort -@ 6 -m 12G -n -T tmp/WT_R2_genome -o bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
samtools sort -@ 6 -m 12G -n -T tmp/WT_R1_genome -o bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
HiC-Pro_2.11.4/scripts/hic.inc.sh: line 86: 76351 Killed
samtools sort -@ 6 -m 12G -n -T tmp/WT_R2_genome -o bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
[bam_sort_core] merging from 12 files and 6 in-memory blocks...
mv bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
and I saw that a certain step was killed.
HiC-Pro_2.11.4/scripts/hic.inc.sh: line 86: 76351 Killed
I did not get this 'Killed' message when my test ran successfully, so I think that's where the problem might be.
Could you please help me with that? Thank you.
Federico
You were allowing samtools to use up to 12G * 6 cores * 2 processes = 144G of memory. With a small dataset, the whole BAM file fits in memory, so no error occurs. With a large dataset, the sort keeps filling memory until it reaches that preset upper limit, which in your case is more than the system memory.
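The arithmetic can be written down as a small Python check. The function names are hypothetical, the default of two parallel sorts (one per read end) follows the explanation above, and the 64G of system memory in the usage note is just an example figure, not a value from this thread.

```python
def peak_sort_memory_gib(per_thread_gib, n_threads, n_parallel_sorts=2):
    """Upper bound on memory the sort step may use, in GiB."""
    return per_thread_gib * n_threads * n_parallel_sorts


def fits_in_memory(per_thread_gib, n_threads, system_gib, n_parallel_sorts=2):
    """True if the upper bound stays within the given system memory."""
    peak = peak_sort_memory_gib(per_thread_gib, n_threads, n_parallel_sorts)
    return peak <= system_gib
```

With `-m 12G -@ 6` the bound is 144 GiB, while `-m 3G -@ 6` gives 36 GiB; on a hypothetical 64 GiB node only the second setting would pass `fits_in_memory`.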
This is now included in HiC-Pro 3.0.0.
Hi, I ran HiC-Pro on my data and got an error like this:
/home/linshengnan/00_bin/python /home/linshengnan/01_software/hic-pro/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f bowtie_results/bwt2/sample1/mh30_1_R1
[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'
[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'
mergeBAM.py
forward= bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam
reverse= bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam
output= bowtie_results/bwt2/sample1/mh30_1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2pairs.bam
min mapq= 10
report_single= False
report_multi= False
verbose= True
Merging forward and reverse tags ...
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.
Could you give me some advice? Thank you very much.