Open leoisl opened 1 year ago
As an example, here is the cluster size distribution (number of clusters, then cluster size) for a HiSeq 2000 run with 75bp reads:
176 1
194 2
324 3
399 4
647 5
927 6
2747 7
5190 8
5987 9
5047 10
2236 11
727 12
328 13
135 14
51 15
6 16
And now the same for a 250bp Illumina sample covering the same region:
81806 1
22335 2
1485 3
693 4
382 5
374 6
455 7
520 8
434 9
487 10
441 11
539 12
541 13
643 14
504 15
615 16
696 17
578 18
713 19
698 20
641 21
723 22
674 23
728 24
673 25
717 26
697 27
749 28
746 29
767 30
761 31
836 32
949 33
1312 34
1495 35
1772 36
2021 37
2358 38
2402 39
2235 40
1875 41
1358 42
848 43
666 44
518 45
355 46
271 47
148 48
150 49
41 50
35 51
10 52
5 53
17 54
2 55
7 56
9 57
7 58
2 59
We could either automatically choose a cluster size threshold or at least provide a cluster size histogram for the user. Right now, cluster sizes can only be retrieved by parsing the debugging files, but it might be worth upgrading this to a histogram that is created by default. See https://github.com/mbhall88/drprg-paper/issues/2
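To make the two options concrete, here is a rough Python sketch, not a proposal for the actual implementation. It assumes the distributions above are two-column text with the number of clusters first and the cluster size second. The function names (`parse_distribution`, `render_histogram`, `suggest_threshold`) are hypothetical, and the threshold rule is only one possible heuristic: take the main peak as the size holding the most members (size × count), then pick the least frequent size below it.

```python
from typing import Dict

# HiSeq 2000 / 75bp distribution copied from this issue.
# Assumed column order: "<number of clusters> <cluster size>".
HISEQ_75BP = """
176 1
194 2
324 3
399 4
647 5
927 6
2747 7
5190 8
5987 9
5047 10
2236 11
727 12
328 13
135 14
51 15
6 16
"""


def parse_distribution(text: str) -> Dict[int, int]:
    """Parse '<count> <size>' lines into a {cluster_size: count} mapping."""
    dist: Dict[int, int] = {}
    for line in text.strip().splitlines():
        count, size = line.split()
        dist[int(size)] = int(count)
    return dist


def render_histogram(dist: Dict[int, int], width: int = 50) -> str:
    """Render a plain-text histogram, one row per cluster size."""
    peak = max(dist.values())
    rows = []
    for size in sorted(dist):
        # Scale each bar to the most frequent size; always show at least one mark.
        bar = "#" * max(1, round(dist[size] / peak * width))
        rows.append(f"{size:>3} {dist[size]:>7} {bar}")
    return "\n".join(rows)


def suggest_threshold(dist: Dict[int, int]) -> int:
    """Hypothetical heuristic for a minimum cluster size.

    The main peak is the size with the largest size * count product, so a
    spike of singleton clusters does not win just by being frequent. The
    suggestion is the least frequent size below that peak (the valley);
    ties go to the smaller size. If nothing lies below the peak, the peak
    size itself is returned.
    """
    main_peak = max(dist, key=lambda size: size * dist[size])
    below = [size for size in dist if size < main_peak]
    if not below:
        return main_peak
    return min(below, key=lambda size: (dist[size], size))


if __name__ == "__main__":
    distribution = parse_distribution(HISEQ_75BP)
    print(render_histogram(distribution))
    print("suggested minimum cluster size:", suggest_threshold(distribution))
```

On a bimodal distribution like the 250bp one, this rule lands in the dip between the spike at size 1 and the main peak. On a unimodal distribution like the HiSeq one, it falls back to the smallest size, which filters nothing. A real implementation would probably want something more robust, for example smoothing the counts before looking for the valley.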