nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Forward and reverse reads not paired. #369

Closed linshengnan2020 closed 3 years ago

linshengnan2020 commented 4 years ago

Hi, I ran HiC-Pro on my data and got an error like this:

/home/linshengnan/00_bin/python /home/linshengnan/01_software/hic-pro/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f bowtie_results/bwt2/sample1/mh30_1_R1

[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'

[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'

mergeBAM.py

forward= bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam

reverse= bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam

output= bowtie_results/bwt2/sample1/mh30_1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2pairs.bam

min mapq= 10

report_single= False

report_multi= False

verbose= True

Merging forward and reverse tags ...

Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted. mergeSAM.log (END)

Could you give me some advice? Thank you very much.

nservant commented 4 years ago

Hi, could you look at the first 10 lines of both BAM files and check that they are sorted the same way (by name), with exactly the same read names? After the mapping, the BAM files are sorted. If the sort failed for any reason, it might explain why the two BAM files are not in the same order. Also, the message [E::idx_find_and_load] Could not retrieve index file looks odd, so I'm wondering whether the sort worked. Best
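The comparison suggested here can be sketched in a few lines of Python. This is only an illustration, not part of HiC-Pro: it assumes the read names of each BAM file have already been extracted into two lists (for example from the first column of `samtools view` output), and the helper name `first_mismatch` is hypothetical.

```python
from itertools import zip_longest


def first_mismatch(forward_names, reverse_names):
    """Return (index, forward_name, reverse_name) for the first position where
    the two read-name sequences differ, or None if they agree throughout.
    A missing name (one file shorter than the other) is reported as None."""
    pairs = zip_longest(forward_names, reverse_names)
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            return index, fwd, rev
    return None


if __name__ == "__main__":
    # Hypothetical read names, standing in for the first lines of each BAM file.
    r1 = ["read1", "read2", "read3"]
    r2 = ["read1", "read3", "read2"]
    print(first_mismatch(r1, r2))
```

If the function returns None for the first few thousand names, the two files are at least ordered consistently at the start; a mismatch points at the first record to inspect.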

linshengnan2020 commented 4 years ago

I sorted the BAM files myself and ran mergeSAM.py, and it works well. Did I have a configuration problem during installation? How can I solve this problem? Thank you very much.

nservant commented 4 years ago

No, I think this is more of a RAM issue. Samtools sort crashed because it ran out of memory at some point.

linshengnan2020 commented 4 years ago

Hi, when I ran mergeSAM.py, this error occurred:

[E::idx_find_and_load] Could not retrieve index file for 'mh30_2_clean_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'

[E::idx_find_and_load] Could not retrieve index file for 'mh30_2_clean_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'

I tried to build the BAM index with samtools index, and got this error:

[E::hts_idx_push] NO_COOR reads not in a single block at the end 688 -1

[E::sam_index] Read 'A00358:332:HFTKGDSXY:2:1101:1081:2487' with ref_name='ctg000430', ref_length=9029235, flags=0, pos=8917145 cannot be indexed

samtools index: failed to create index for "mh30_2_clean_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam": No such file or directory

Thank you very much.

nservant commented 4 years ago

samtools index: failed to create index for "mh30_2_clean_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam": No such file or directory

Is there something wrong with the path? Does the file exist?

linshengnan2020 commented 4 years ago

I added the absolute paths of the BAM files and ran mergeSAM.py again:

/home/linshengnan/01_software/hic-pro/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f /home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam -r /home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam -o mh30_1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2pairs.bam

The log:

mergeBAM.py

forward= /home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam

reverse= /home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam

output= mh30_1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2pairs.bam

min mapq= 10

report_single= False

report_multi= False

verbose= True

Merging forward and reverse tags ...

[E::idx_find_and_load] Could not retrieve index file for '/home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'

[E::idx_find_and_load] Could not retrieve index file for '/home/linshengnan/03_work/00_dianthus_work/00_30mh_genome/04_hic/00_hicpro/hic-pro-result/bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam'

1000000

2000000

3000000

....

292000000


The run also generated the bwt2pairs.bam and bwt2pairs.pairstat files.

linshengnan2020 commented 3 years ago

I'm sorry to bother you again, but the problem has not been solved. As you said, this is a RAM issue and samtools sort has a problem. But even after setting the sort memory to 2000G, I got the same error. From mapping_combine.log:

/home/chenxinxiu/software/samtools-1.8/samtools merge -@ 72 -n -f bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam bowtie_results/bwt2_global/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.bam bowtie_results/bwt2_local/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.unmap_bwt2loc.bam

/home/chenxinxiu/software/samtools-1.8/samtools merge -@ 72 -n -f bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam bowtie_results/bwt2_global/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.bam bowtie_results/bwt2_local/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2glob.unmap_bwt2loc.bam

/home/chenxinxiu/software/samtools-1.8/samtools sort -@ 72 -m 2000G -n -T tmp/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa -o bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.sorted.bam bowtie_results/bwt2/sample1/mh30_1_R1_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam

samtools sort: couldn't allocate memory for bam_mem

/home/chenxinxiu/software/samtools-1.8/samtools sort -@ 72 -m 2000G -n -T tmp/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa -o bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.sorted.bam bowtie_results/bwt2/sample1/mh30_1_R2_nextgraph_pilon_5_1.filter.no.plastid.fa.bwt2merged.bam

samtools sort: couldn't allocate memory for bam_mem


How can I solve this problem? Could you give me some advice? Thank you very much!

hanbinlu commented 3 years ago

Hi,

I got the same issue. According to the samtools sort manual, the -m option sets the maximum memory per thread, so the actual memory usage is max_memory*n_threads. As far as I can tell, the previous update changed N_CPU to be the total CPU limit. Mapping does use N_CPU/2 for each read end. However, the mergeSAM step still uses N_CPU per end, which doubles the memory usage. In summary, if I am guessing right, the total memory use is SORT_RAM*N_CPU*2.
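To put numbers on this estimate, here is a minimal Python sketch. It assumes, as guessed above, that -m is applied per thread and that the two read ends are sorted at the same time; the function names `parse_size` and `peak_sort_memory` are made up for illustration and are not part of HiC-Pro or samtools.

```python
import re

# samtools accepts K, M and G suffixes for -m; treated here as powers of 1024.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size such as '768M' or '2000G' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def peak_sort_memory(sort_ram, n_cpu, n_processes=2):
    """Upper bound in bytes on what the sort step may request:
    per-thread limit * threads * concurrent sort processes (assumed 2)."""
    return parse_size(sort_ram) * n_cpu * n_processes


if __name__ == "__main__":
    # Settings from the mapping_combine.log quoted earlier: -@ 72 -m 2000G.
    total_gib = peak_sort_memory("2000G", 72) // 1024**3
    print(f"up to {total_gib}G may be requested")
```

With the settings from the log above (-@ 72 -m 2000G), the bound is 2000 * 72 * 2 = 288000G, far more than any machine provides, which would be consistent with the "couldn't allocate memory" message.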

nservant commented 3 years ago

Hi, many thanks for your comments. @sf-nevermore, so you think @linshengnan2020 should decrease the RAM limit given to samtools (-m option)?

hanbinlu commented 3 years ago

Yes. Divide it by N_CPU. Intuitively, I think HiC-Pro should use -m SORT_RAM/N_CPU at the samtools sort step.
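As a rough illustration of this suggestion, the per-thread value could be derived as below. This is a sketch under the assumption that SORT_RAM is meant as a total budget expressed in megabytes; `per_thread_sort_ram` is a hypothetical helper, not something HiC-Pro provides.

```python
def per_thread_sort_ram(sort_ram_mb, n_cpu):
    """Split a total sort budget (in megabytes) evenly across threads and
    format the result for the samtools sort -m option."""
    if n_cpu < 1:
        raise ValueError("n_cpu must be at least 1")
    per_thread = sort_ram_mb // n_cpu  # round down to stay within the budget
    if per_thread < 1:
        raise ValueError("budget too small for this many threads")
    return f"{per_thread}M"


if __name__ == "__main__":
    # Hypothetical settings: a 20G total budget shared by 6 threads.
    print(per_thread_sort_ram(20 * 1024, 6))
```

If the two read ends are sorted concurrently, as guessed earlier in the thread, the combined request would still be about twice the budget, so dividing by 2*N_CPU would be the more conservative choice.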

linshengnan2020 commented 3 years ago

Yes. I checked the config-hicpro.txt of version 2.11.1 and it has no SORT_RAM option. I removed this option from my configuration and it works well.

nservant commented 3 years ago

Thanks guys. Very useful !

Ferossitiziano commented 3 years ago

Hi!

I have removed the SORT_RAM option as suggested by linshengnan2020, but I keep getting the same error:

Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

The only difference is that I do not get the "samtools sort: couldn't allocate memory for bam_mem" error in mapping_combine.log.

Any idea on how I could fix this? Thanks, Federico

nservant commented 3 years ago

Hi, if you remove the option, there is no RAM limit. I would therefore suggest using what @linshengnan2020 suggested, i.e. using -m SORT_RAM/N_CPU. Best

Ferossitiziano commented 3 years ago

Sorry to bother you again,

I have modified the bowtie_combine.sh script by replacing '-m SORT_RAM' with '-m SORT_RAM/N_CPU'. By doing that I went from 20G of RAM to ca. 3GB of RAM, and 'samtools sort' worked without throwing errors.

However, the outputs of 'samtools sort' were not properly sorted and I got an error in the mergeSAM step. Here is my log:

/hpcnfs/home/ieo5073/miniconda3/envs/HiCpro/bin/python /hpcnfs/home/ieo5073/miniconda3/envs/HiCpro/bin/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam -r bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam -o bowtie_results/bwt2/WT/WT_genome.bwt2pairs.bam
## mergeBAM.py
## forward= bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
## reverse= bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
## output= bowtie_results/bwt2/WT/WT_genome.bwt2pairs.bam
## min mapq= 10
## report_single= False
## report_multi= False
## verbose= True
## Merging forward and reverse tags ...
## 1000000
## 2000000
## 3000000
## 4000000
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

hanbinlu commented 3 years ago

To check the actual memory setting used in the samtools sort step, look at mapping_combine.log in the log folder. You should see a line such as /usr/local/anaconda/bin/samtools sort -@ 38 -m 50M -n -T ... which shows the -m value for your run, along with any error that occurred during sorting.
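For long logs, this check can be automated. The following is a minimal sketch, assuming the log lines look like the samtools sort commands quoted in this thread; `scan_log` and the list of error markers are illustrative choices, not HiC-Pro features.

```python
import re

# Matches a "samtools sort" command line and captures the -@ and -m values.
SORT_RE = re.compile(r"samtools sort\b.*?-@\s*(\d+).*?-m\s*(\S+)")
# Error strings reported in this thread; extend as needed.
ERROR_MARKERS = ("Killed", "couldn't allocate memory")


def scan_log(lines):
    """Return (settings, errors): settings is a list of (threads, -m value)
    tuples, errors is a list of log lines containing a known error marker."""
    settings, errors = [], []
    for line in lines:
        match = SORT_RE.search(line)
        if match:
            settings.append((int(match.group(1)), match.group(2)))
        if any(marker in line for marker in ERROR_MARKERS):
            errors.append(line.strip())
    return settings, errors


if __name__ == "__main__":
    # Shortened example lines; file names are placeholders.
    example = [
        "samtools sort -@ 72 -m 2000G -n -T tmp/R1 -o R1.sorted.bam R1.bam",
        "samtools sort: couldn't allocate memory for bam_mem",
    ]
    print(scan_log(example))
```

In practice the lines would come from the log file itself, for example `scan_log(open("mapping_combine.log"))`, with the path adjusted to wherever the log is written.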

Ferossitiziano commented 3 years ago

This is my mapping_combine.log

It seems to me it worked fine, but I'm probably wrong since the files were not sorted.

Any advice on how to fix that? Thanks

samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R2_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R2_genome.bwt2glob.unmap_bwt2loc.bam
samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R1_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R1_genome.bwt2glob.unmap_bwt2loc.bam
samtools sort -@ 6 -m 3G -n -T tmp/WT_R2_genome -o bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
samtools sort -@ 6 -m 3G -n -T tmp/WT_R1_genome -o bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
[bam_sort_core] merging from 0 files and 6 in-memory blocks...
[bam_sort_core] merging from 0 files and 6 in-memory blocks...
mv bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
mv bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam

hanbinlu commented 3 years ago

Looks like they were sorted. Maybe your fastq input files are not paired?

nservant commented 3 years ago

Indeed. Could you please show us the first lines of your fastq files?
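A quick way to check pairing before running the pipeline is to compare read IDs in the two fastq files. The sketch below is an assumption-laden illustration: it expects uncompressed, four-line-per-record fastq files, treats a trailing /1 or /2 as a mate suffix, and uses hypothetical helper names (`normalise`, `read_ids`, `fastq_paired`).

```python
from itertools import islice, zip_longest


def normalise(header):
    """Reduce a fastq header line to a bare read ID: drop the leading '@',
    anything after the first whitespace, and a trailing /1 or /2."""
    name = header.strip().lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_ids(path):
    """Yield read IDs from an uncompressed fastq file (every 4th line)."""
    with open(path) as handle:
        for header in islice(handle, 0, None, 4):
            if header.strip():  # ignore a trailing blank line
                yield normalise(header)


def fastq_paired(path_r1, path_r2, limit=1000):
    """True if the first `limit` records of both files carry the same read
    IDs in the same order. A length mismatch within that window also fails."""
    pairs = zip_longest(read_ids(path_r1), read_ids(path_r2))
    return all(r1 == r2 for r1, r2 in islice(pairs, limit))
```

Calling `fastq_paired("sample_R1.fastq", "sample_R2.fastq")` with placeholder file names returns False as soon as the two files disagree, which would match the "not paired" failure at the mergeSAM step.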

Ferossitiziano commented 3 years ago

Yes, sorry, my input fastq were not paired. Silly mistake. Thank you.

Ferossitiziano commented 3 years ago

After checking read pairing in the input fastq files, the run succeeded with ~30M-read test files. I then ran my whole dataset, that is ~300M reads, using the same settings.

The mergeSAM step failed, with the usual error message:

Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

I checked the previous step, reported in mapping_combine.log (below),

samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R1_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R1_genome.bwt2glob.unmap_bwt2loc.bam
samtools merge -@ 6 -n -f bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam bowtie_results/bwt2_global/WT/WT_R2_genome.bwt2glob.bam bowtie_results/bwt2_local/WT/WT_R2_genome.bwt2glob.unmap_bwt2loc.bam
samtools sort -@ 6 -m 12G -n -T tmp/WT_R2_genome -o bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
samtools sort -@ 6 -m 12G -n -T tmp/WT_R1_genome -o bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam
HiC-Pro_2.11.4/scripts/hic.inc.sh: line 86: 76351 Killed                  samtools sort -@ 6 -m 12G -n -T tmp/WT_R2_genome -o bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R2_genome.bwt2merged.bam
[bam_sort_core] merging from 12 files and 6 in-memory blocks...
mv bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.sorted.bam bowtie_results/bwt2/WT/WT_R1_genome.bwt2merged.bam

and I saw that a certain step was killed.

HiC-Pro_2.11.4/scripts/hic.inc.sh: line 86: 76351 Killed 

I did not get this 'Killed' message when my test ran successfully, so I think that's where the problem might be.

Could you please help me with that? Thank you.

Federico

hanbinlu commented 3 years ago

You were allowing samtools to use up to 12G * 6 cores * 2 processes = 144G of memory. With a small dataset, the whole BAM file fits in memory, so no error occurs. With a large dataset, samtools keeps filling memory until it reaches the preset upper limit, which in your case is more than the system memory.

nservant commented 3 years ago

This is now included in HiC-Pro 3.0.0.