Closed yk-tanigawa closed 7 years ago
For step5, I am downloading NA12878 data from Genome in a bottle consortia release v 3.3
Count match/mismatch:
-
not _
in maf file generated by LAST alignerLocation of results file from LAST mapping
[ytanigaw@sherlock-ln01 login_node ~]$ ll -h /share/PI/mrivas/data/nanopore/*.maf
-rw-r--r-- 1 ytanigaw mrivas 88M Oct 25 22:47 /share/PI/mrivas/data/nanopore/20161006_minion_human_cDNA.maf
-rw-r--r-- 1 ytanigaw mrivas 86M Oct 25 23:56 /share/PI/mrivas/data/nanopore/20161008_wgs_caucasian_48hr.maf
seq_id | aligned | insertion | deletion | soft clip | H | N | P | match | mismatch |
---|---|---|---|---|---|---|---|---|---|
1 | 14563 | 247 | 1940 | 15350 | 0 | 0 | 0 | 0 | 0 |
2 | 1134 | 46 | 129 | 46745 | 0 | 0 | 0 | 0 | 0 |
3 | 1732 | 40 | 173 | 31619 | 0 | 0 | 0 | 0 | 0 |
4 | 19430 | 641 | 2078 | 15 | 0 | 0 | 0 | 0 | 0 |
5 | 4290 | 157 | 485 | 21985 | 0 | 0 | 0 | 0 | 0 |
6 | 24919 | 1687 | 1856 | 68 | 0 | 0 | 0 | 0 | 0 |
7 | 19965 | 560 | 2113 | 51 | 0 | 0 | 0 | 0 | 0 |
Need to discriminate match/mismatch
seq_id | pos | len | gene | intron? |
---|---|---|---|---|
1 | 9162869 | 30160 | DENND5A | intron + exons |
2 | 3729772 | 47925 | NUP98 | intron |
3 | 70496710 | 33391 | SHANK2 | intron |
4 | 124630341 | 20086 | ESAM | exons (coding + 5' UTL) |
5 | 105992571 | 26432 | - | no annotated genes |
6 | 88688000 | 26674 | GRM5 | intron |
7 | 4484408 | 20576 | - | no annotated genes |
Many deletions.
id | = (match) | X (mismatch) | I (insertion) | D (deletion) | N | S (soft clip) | H | P |
---|---|---|---|---|---|---|---|---|
1 | 13810 | 753 | 247 | 1940 | 0 | 15350 | 0 | 0 |
2 | 303 | 831 | 46 | 129 | 0 | 46745 | 0 | 0 |
3 | 470 | 1262 | 40 | 173 | 0 | 31619 | 0 | 0 |
4 | 5104 | 14326 | 641 | 2078 | 0 | 15 | 0 | 0 |
5 | 3794 | 496 | 157 | 485 | 0 | 21985 | 0 | 0 |
6 | 6641 | 18278 | 1687 | 1856 | 0 | 68 | 0 | 0 |
7 | 5288 | 14677 | 560 | 2113 | 0 | 51 | 0 | 0 |
match/mismatches with Q-value
seq_id | = | (match) | X | (mismatch) | I | (insertion) | D | (deletion) | N | S | (soft clip) | H | P | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
>=20 | <20 | >=20 | <20 | >=20 | <20 | >=20 | <20 | >=20 | <20 | >=20 | <20 | >=20 | <20 | |||
1 | 1479 | 12331 | 13 | 740 | 13 | 234 | 1940 | 0 | 0 | 0 | 1280 | 14070 | 0 | 0 | 0 | 0 |
2 | 21 | 282 | 49 | 782 | 1 | 45 | 129 | 0 | 0 | 0 | 695 | 46050 | 0 | 0 | 0 | 0 |
3 | 57 | 413 | 107 | 1155 | 2 | 38 | 173 | 0 | 0 | 0 | 834 | 30785 | 0 | 0 | 0 | 0 |
4 | 481 | 4623 | 1195 | 13131 | 27 | 614 | 2078 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 |
5 | 312 | 3482 | 4 | 492 | 5 | 152 | 485 | 0 | 0 | 0 | 203 | 21782 | 0 | 0 | 0 | 0 |
6 | 456 | 6185 | 1186 | 17092 | 70 | 1617 | 1856 | 0 | 0 | 0 | 5 | 63 | 0 | 0 | 0 | 0 |
7 | 445 | 4843 | 1075 | 13602 | 16 | 544 | 2113 | 0 | 0 | 0 | 2 | 49 | 0 | 0 | 0 | 0 |
pos (on chr11) | ref | read | dbSNP 1.3.7 |
---|---|---|---|
9163018 | c | A | - |
9168118 | a | G | rs17256375 |
9168422 | G | A | - |
9171535 | G | A | - |
9171610 | C | A | - |
9172722 | T | A | - |
9174128 | C | A | - |
9174312 | a | T | - |
9177091 | G | A | - |
9177843 | C | A | - |
9178051 | g | A | - |
9179148 | a | T | - |
9179368 | a | T | - |
(Found a bug in my snippet. fixed and updated on 30 Nov at 13:10.)
count.py
under src/
can do this task.