zanglab / SICER2

MIT License
20 stars 15 forks source link

IndexError while "Determining the score threshold from random background..." #18

Open arhickman opened 2 years ago

arhickman commented 2 years ago

Hi! Thanks for all of your work building SICER. It's our peak caller of choice, especially for broad domains! I occasionally run into an error in the step: "Determining the score threshold from random background..." The entire output is pasted below. Any guidance is much appreciated!


Starting peak-calling, peak annotation, and FRiP AV004_2xLANAMeCP2_350nM_E0045-1_CR_K562_N__500k_Kmet__S1_uniq.bam Sicer-broad peak calling Running SICER with given arguments

Preprocess the AV004_2xLANAMeCP2_350nM_E0045-1_CR_K562_N__500k_Kmet__S1_uniq.bed file to remove redundancy with threshold of 1


chrom Total plus reads Retained plus reads Total minus reads Retained minus reads

chr1 4659751 4403404 4659747 4403459
chr2 3275483 3151590 3275495 3151892
chr3 2574122 2471084 2574126 2470966
chr4 2184572 2114932 2184597 2114927
chr5 2490561 2391973 2490530 2392923
chr6 2750047 2621032 2749993 2621540
chr7 2993344 2828244 2993352 2827611
chr8 1967204 1892082 1967198 1892322
chr9 1662519 1565154 1662534 1565118
chr10 1925155 1842103 1925155 1842225
chr11 2079889 1982595 2079874 1982890
chr12 1859203 1781270 1859196 1781042
chr13 985142 938199 985108 938118
chr14 1070885 1035002 1070894 1034275
chr15 1445601 1382636 1445610 1382387
chr16 1620834 1524772 1620834 1525646
chr17 1487914 1403445 1487901 1403183
chr18 926987 897584 926987 897638
chr19 1410887 1301552 1410888 1302178
chr20 989479 942301 989479 942384
chr21 723933 685449 723932 685376
chr22 1163571 1029414 1163567 1029121
chrX 1306262 1274004 1306267 1274110
chrY 38804 37741 38804 37743
chrM 789367 58624 789367 58385

Preprocess the 22AV001_IgG_E0045-1_CR_K562_C__500k_Kmet_5ng_S1_uniq.bed file to remove redundancy with threshold of 1


chrom Total plus reads Retained plus reads Total minus reads Retained minus reads

chr1 387769 387023 387768 387005
chr2 262420 261906 262423 261876
chr3 200398 200161 200399 200173
chr4 175771 175584 175772 175561
chr5 203630 203247 203625 203235
chr6 218956 218630 218953 218626
chr7 236811 236435 236813 236434
chr8 157468 157252 157469 157240
chr9 137318 136565 137318 136582
chr10 159388 159022 159388 159034
chr11 164132 163857 164131 163883
chr12 145313 145115 145313 145136
chr13 86965 84938 86967 84919
chr14 87777 87649 87777 87666
chr15 126483 126229 126484 126288
chr16 134352 134111 134352 134117
chr17 118459 118166 118458 118160
chr18 79787 79716 79789 79719
chr19 110763 110286 110764 110273
chr20 80781 80616 80781 80621
chr21 64787 64637 64786 64644
chr22 98695 98341 98694 98285
chrX 103861 103749 103860 103754
chrY 2667 2658 2667 2658
chrM 135612 20667 135612 19999

Partition the genome in windows and generate summary files...

Total count of chr1 tags: 8806863 Total count of chr2 tags: 6303482 Total count of chr3 tags: 4942050 Total count of chr4 tags: 4229859 Total count of chr5 tags: 4784896 Total count of chr6 tags: 5242572 Total count of chr7 tags: 5655855 Total count of chr8 tags: 3784404 Total count of chr9 tags: 3130272 Total count of chr10 tags: 3684328 Total count of chr11 tags: 3965485 Total count of chr12 tags: 3562312 Total count of chr13 tags: 1876317 Total count of chr14 tags: 2069277 Total count of chr15 tags: 2765023 Total count of chr16 tags: 3050418 Total count of chr17 tags: 2806628 Total count of chr18 tags: 1795222 Total count of chr19 tags: 2603730 Total count of chr20 tags: 1884685 Total count of chr21 tags: 1370825 Total count of chr22 tags: 2058535 Total count of chrX tags: 2548114 Total count of chrY tags: 111 Total count of chrM tags: 116948

Normalizing graphs by total island filitered reads per million and generating summary WIG file...

Finding candidate islands exhibiting clustering...

Species: T2Tv2 Window_size: 5000 Gap size: 50000 E value is: 5000 Total read count: 83024186 Genome Length: 3054837841 Effective genome Length: 2260580002 Window average: 183.6346997817952 Window pvalue: 0.2 Minimum num of tags in a qualified window: 196

Determining the score threshold from random background... Removing temporary directory and all files in it. Traceback (most recent call last): File "/home/bryan/miniconda3/bin/sicer", line 7, in exec(compile(f.read(), file, 'exec')) File "/media/bryan/Alli_Main/file_transfer/AV/SICER2/bin/sicer", line 235, in main() File "/media/bryan/Alli_Main/file_transfer/AV/SICER2/bin/sicer", line 229, in main run_SICER.main(args) File "/media/bryan/Alli_Main/file_transfer/AV/SICER2/sicer/main/run_SICER.py", line 77, in main find_islands_in_pr.main(args, total_tag_in_windows, pool) File "/media/bryan/Alli_Main/file_transfer/AV/SICER2/sicer/src/find_islands_in_pr.py", line 184, in main score_threshold = background.find_island_threshold(args.e_value); File "/media/bryan/Alli_Main/file_transfer/AV/SICER2/sicer/lib/Background_island_probscore_statistics.py", line 225, in find_island_threshold current_expectation = self.background_island_expectation(current_scaled_score) File "/media/bryan/Alli_Main/file_transfer/AV/SICER2/sicer/lib/Background_island_probscore_statistics.py", line 172, in background_island_expectation while (int(round(index - self.window_score[i] / self.bin_size)) >= 0): IndexError: list index out of range

huguesrichard commented 2 years ago

Hello @arhickman,

I have exactly the same error when running sicer. After digging a little into the code it seems that the problem comes from the precomputation the poisson probability table in the sicer/lib/Background_island_probscore_statistics.py file. The precomputation is only done up to 2 times the expectation value using the self.max_index variable (see here)

self.max_index = max(500, int(2 * self.average));

With bigger sequence libraries, it occurs that some observations are above this count. To avoid that I did a quick hack by changing the factor 2 by a factor 5 and it worked for me, e.g.:

self.max_index = max(500, int(5 * self.average));

I guess one could either adapt this value to the observed sequence each time or add a check for self.max_index in the background_island_expectation function.

Thanks @zanglab for providing such a practical tool.

akukalev commented 1 year ago

With bigger sequence libraries, it occurs that some observations are above this count. To avoid that I did a quick hack by changing the factor 2 by a factor 5 and it worked for me, e.g.:

I experience the same issue with certain gap sizes, but unfortunately this solution does not work for me. I tried to change factor 2 to 5 or 10 or 50 and always get the same error :(

JasonJiangs commented 7 months ago

With bigger sequence libraries, it occurs that some observations are above this count. To avoid that I did a quick hack by changing the factor 2 by a factor 5 and it worked for me, e.g.:

I experience the same issue with certain gap sizes, but unfortunately this solution does not work for me. I tried to change factor 2 to 5 or 10 or 50 and always get the same error :(

Hi, thanks for the error reported and temporarily solution! We are currently updating a new version of SICER2 and looking into those problems. Can you let us know the exact gap size that you used in your work? @akukalev