rivas-lab / nanopore

Tools for analysis of long-read sequencing data.

ONT Nanopore Human genetics data set #15

Open yk-tanigawa opened 7 years ago

yk-tanigawa commented 7 years ago

new data set

yk-tanigawa commented 7 years ago

data download

yk-tanigawa commented 7 years ago

mapping

extract fragments with length >= 10 kb && MAPQ >= 50
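A minimal sketch of this filter on plain-text SAM records. It assumes that "fragment >= 10k" refers to the length of the read sequence (the SEQ field) and that the input is uncompressed SAM text. The names `keep_record` and `filter_sam` are illustrative and are not taken from the repository.

```python
def keep_record(sam_line, min_len=10_000, min_mapq=50):
    """Return True if a SAM line passes the length and MAPQ filter."""
    if sam_line.startswith("@"):
        return True  # header lines are passed through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:
        return False  # read is unmapped
    mapq = int(fields[4])
    seq = fields[9]
    # Assumption: length is measured on SEQ. Records that store "*" instead
    # of the sequence (possible for secondary alignments) are dropped here.
    return len(seq) >= min_len and mapq >= min_mapq


def filter_sam(lines, **kwargs):
    """Keep only the lines accepted by keep_record."""
    return [line for line in lines if keep_record(line, **kwargs)]
```

If the intended length is the aligned span on the reference rather than the read length, the `len(seq)` test would need to be replaced by a sum over the reference-consuming CIGAR operations.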

yk-tanigawa commented 7 years ago

count matches and mismatches

yk-tanigawa commented 7 years ago

count stats for first 100 fragments

Columns: read name, then a pair of counts (base-call Q >= 20, Q < 20) for each CIGAR operation, in this order:
= (match), X (mismatch), I (insertion), D (deletion), N, S (soft clip), H, P
48d3b21b-071b-4867-b760-9fa8217a05ce_Basecall_Alignment_template 41 13920 69 35391 0 759 3605 0 0 0 0 28 0 0 0 0
e6641b23-c4a5-4c3f-8c1f-0ffc8ba0289e_Basecall_Alignment_template 62 7345 115 19460 1 872 2433 0 0 0 0 48 0 0 0 0
c57c8744-1079-4b0a-b935-3b35df19d764_Basecall_Alignment_template 0 3058 0 8600 0 502 1603 0 0 0 0 18744 0 0 0 0
231e228e-f224-4a5b-8fc9-a8cbf8102f82_Basecall_Alignment_template 210 8891 371 23216 1 536 3122 0 0 0 0 44 0 0 0 0
150c6097-d476-4c17-9658-277ef05b967c_Basecall_Alignment_template 175 8541 383 24140 0 352 4205 0 0 0 0 48 0 0 0 0
8b8e6e52-706e-401f-bcbc-2ade0ec648d0_Basecall_Alignment_template 69 6098 212 17201 0 736 2500 0 0 0 104 12844 0 0 0 0
b7e15919-a64f-4d80-87a3-616fa87c768b_Basecall_Alignment_template 3 3263 12 9870 0 482 2359 0 0 0 21 12992 0 0 0 0
f31c8ee5-be79-4e50-95e7-8649e1ce850a_Basecall_Alignment_template 465 23960 2 1392 2 461 2420 0 0 0 0 15 0 0 0 0
f31c8ee5-be79-4e50-95e7-8649e1ce850a_Basecall_Alignment_template 465 23960 2 1392 2 461 2420 0 0 0 0 15 0 0 0 0
70327855-f95f-4d35-aa89-5f5b9f6c06b7_Basecall_Alignment_template 2 3749 10 11023 0 179 2973 0 0 0 15 11246 0 0 0 0
3874b25b-0775-4c37-9bc6-a090f3951673_Basecall_Alignment_template 2 2865 4 7704 0 295 799 0 0 0 6 23293 0 0 0 0
3874b25b-0775-4c37-9bc6-a090f3951673_Basecall_Alignment_template 2 2865 4 7704 0 295 799 0 0 0 6 23293 0 0 0 0
ec948202-698a-4953-8570-3491a0cf968e_Basecall_Alignment_template 14 7026 44 19267 0 720 4297 0 0 0 2 774 0 0 0 0
db5122de-9751-40d8-a92b-d2fb49dd8e3a_Basecall_Alignment_template 173 7002 411 19342 5 1009 2169 0 0 0 0 38 0 0 0 0
87003c57-625b-4f9f-82db-7ab9b9ed396f_Basecall_Alignment_template 137 7423 386 21095 5 1050 2222 0 0 0 13 659 0 0 0 0
3fe11325-0875-4c51-911a-1c2b451ba374_Basecall_Alignment_template 2 7403 5 21576 0 1147 3145 0 0 0 0 398 0 0 0 0
4c3da87d-4c72-4b91-82e6-291bbfdf932e_Basecall_Alignment_template 88 6571 189 17742 1 988 2908 0 0 0 0 33 0 0 0 0
44d9c067-6bb4-4d78-8f1c-a650a0791851_Basecall_Alignment_template 3 6794 13 18107 0 691 2276 0 0 0 0 37 0 0 0 0
90aa4b30-9779-4743-8bfd-7536079f4f5f_Basecall_Alignment_template 6 5264 8 14538 0 383 3217 0 0 0 5 7113 0 0 0 0
90aa4b30-9779-4743-8bfd-7536079f4f5f_Basecall_Alignment_template 6 5264 8 14538 0 383 3217 0 0 0 5 7113 0 0 0 0
dcbf2769-0faf-46e5-8521-2f77bfd13efc_Basecall_Alignment_template 72 3847 214 11485 0 384 1422 0 0 0 212 11087 0 0 0 0
fbe8d49d-e995-4a6b-a83c-bca2a93c1c9a_Basecall_Alignment_template 0 1260 0 3152 0 252 597 0 0 0 3 29987 0 0 0 0
87250133-d079-4a26-b2a7-3061e90acae3_Basecall_Alignment_template 6 7305 11 19927 0 312 4305 0 0 0 0 18 0 0 0 0
acb56443-39f1-45b1-9a89-ba52239509d9_Basecall_Alignment_template 115 6997 190 19014 0 341 3375 0 0 0 0 29 0 0 0 0
5cc14ea9-6d23-464e-b5f0-652f9c1350a4_Basecall_Alignment_template 103 6596 233 18413 1 341 2763 0 0 0 0 27 0 0 0 0
f049e7c6-2450-44c3-980f-106caee4489a_Basecall_Alignment_template 2 5432 5 16431 0 346 2900 0 0 0 0 4253 0 0 0 0
429d1996-8f92-4c69-aa05-80c7c00b62fa_Basecall_Alignment_template 390 7261 171 20864 0 314 3929 0 0 0 0 29 0 0 0 0
7d8cc58e-ebb0-4514-8200-c0f8a05f2e50_Basecall_Alignment_template 94 4365 242 13234 4 594 1731 0 0 0 101 12834 0 0 0 0
3f00ddf9-6d24-489a-947d-6d8d66e2ed15_Basecall_Alignment_template 42 3790 81 11005 0 299 1771 0 0 0 108 14473 0 0 0 0
3f00ddf9-6d24-489a-947d-6d8d66e2ed15_Basecall_Alignment_template 42 3790 81 11005 0 299 1771 0 0 0 108 14473 0 0 0 0
595b77b7-65fa-444c-b958-9ac4408950ac_Basecall_Alignment_template 201 6991 411 19144 0 329 2602 0 0 0 1 39 0 0 0 0
595b77b7-65fa-444c-b958-9ac4408950ac_Basecall_Alignment_template 201 6991 411 19144 0 329 2602 0 0 0 1 39 0 0 0 0
89fbf7ab-42d0-413a-970b-7f3ae57d1172_Basecall_Alignment_template 20 8512 35 22521 0 649 5260 0 0 0 0 36 0 0 0 0
f446696c-6a6a-4817-90e4-02993f85d623_Basecall_Alignment_template 140 8624 349 22755 3 587 3335 0 0 0 0 36 0 0 0 0
4e3f41d5-6702-44ba-b284-c4304665da77_Basecall_Alignment_template 96 9114 172 25120 1 848 3337 0 0 0 0 39 0 0 0 0
40d55215-5dd6-40c1-9eaa-1ab8ee28774d_Basecall_Alignment_template 8 4054 19 12265 0 232 2713 0 0 0 48 14810 0 0 0 0
40d55215-5dd6-40c1-9eaa-1ab8ee28774d_Basecall_Alignment_template 8 4054 19 12265 0 232 2713 0 0 0 48 14810 0 0 0 0
69e7a02e-db60-4761-b5d8-81ee5013963f_Basecall_Alignment_template 73 7247 136 20829 0 384 3614 0 0 0 0 39 0 0 0 0
456477be-4046-4eb2-83ec-e32c2362c6d7_Basecall_Alignment_template 127 6951 305 19253 0 288 2885 0 0 0 0 25 0 0 0 0
f625b61e-e94a-45c8-8735-16f74b6bb229_Basecall_Alignment_template 415 23374 2 1234 0 302 2522 0 0 0 0 27 0 0 0 0
7a211761-bff9-4d9a-b6cd-af2b6e9d6f5b_Basecall_Alignment_template 148 7495 81 21194 0 443 3713 0 0 0 0 64 0 0 0 0
de75aedf-1dce-44d1-b1de-229543fb5b48_Basecall_Alignment_template 82 5182 256 14018 0 189 1668 0 0 0 122 7238 0 0 0 0
ed82cd93-08ef-4839-be1b-412830f5811c_Basecall_Alignment_template 200 7520 482 20146 1 373 2206 0 0 0 2 35 0 0 0 0
d8ae80ed-c3e4-4e64-9b14-83e27f984b84_Basecall_Alignment_template 10 3204 16 7906 2 345 1667 0 0 0 0 14522 0 0 0 0
d8ae80ed-c3e4-4e64-9b14-83e27f984b84_Basecall_Alignment_template 10 3204 16 7906 2 345 1667 0 0 0 0 14522 0 0 0 0
eef9e8f4-f833-4f73-a85c-f0f6c14c8841_Basecall_Alignment_template 46 6587 102 18650 0 440 3264 0 0 0 0 37 0 0 0 0
4de5de89-b83c-44c2-906d-54f28003b78e_Basecall_Alignment_template 1 3204 8 9839 0 263 2124 0 0 0 3 17621 0 0 0 0
10610156-942b-4ad6-be11-f193bafb6857_Basecall_Alignment_template 13 7153 34 19657 1 468 1991 0 0 0 0 35 0 0 0 0
6b57884d-0d7a-49b3-a861-c751a1997257_Basecall_Alignment_template 249 9660 485 24581 0 356 3214 0 0 0 0 27 0 0 0 0
59a3e2b7-af40-49c6-933b-6870739fba0f_Basecall_Alignment_template 23 10873 58 29280 0 773 3355 0 0 0 0 64 0 0 0 0
ac9013e4-f69c-403c-a807-d02aad8ec1a8_Basecall_Alignment_template 26 7569 49 19552 0 319 2017 0 0 0 0 31 0 0 0 0
1b41d001-ac29-41c7-84d5-164062bdae59_Basecall_Alignment_template 233 6271 475 14866 8 700 1720 0 0 0 293 7952 0 0 0 0
1b41d001-ac29-41c7-84d5-164062bdae59_Basecall_Alignment_template 233 6271 475 14866 8 700 1720 0 0 0 293 7952 0 0 0 0
8b3dd2a0-eb9e-4ae7-8549-c16909ccda30_Basecall_Alignment_template 217 7202 405 19199 0 369 2483 0 0 0 0 28 0 0 0 0
9fcdeda2-32ec-41f0-8e53-5c8ec4a8a9ea_Basecall_Alignment_template 178 6875 366 17813 0 313 2349 0 0 0 0 29 0 0 0 0
9fcdeda2-32ec-41f0-8e53-5c8ec4a8a9ea_Basecall_Alignment_template 178 6875 366 17813 0 313 2349 0 0 0 0 29 0 0 0 0
14343724-5556-4a4b-a86f-2d7ada70b74f_Basecall_Alignment_template 1 6880 5 18298 0 1065 2888 0 0 0 0 54 0 0 0 0
52b1b143-42f0-4ffc-abe3-7eedc23bcbe3_Basecall_Alignment_template 229 6556 535 17038 0 198 2233 0 0 0 61 2586 0 0 0 0
b8c70656-be7b-4893-b1ca-edb14aac2c1b_Basecall_Alignment_template 144 9305 285 25028 1 560 3380 0 0 0 0 31 0 0 0 0
c26c4a94-0a55-4baa-855f-37c3a2732aa7_Basecall_Alignment_template 8 6209 21 16182 0 334 1754 0 0 0 0 12192 0 0 0 0
c54501c1-3afc-40f8-9f98-49f999234f14_Basecall_Alignment_template 190 7352 328 18804 2 674 3222 0 0 0 0 40 0 0 0 0
59d81c49-bfc5-4dbc-be8b-56ea492aa623_Basecall_Alignment_template 55 3938 86 11218 0 309 2146 0 0 0 31 17645 0 0 0 0
59d81c49-bfc5-4dbc-be8b-56ea492aa623_Basecall_Alignment_template 55 3938 86 11218 0 309 2146 0 0 0 31 17645 0 0 0 0
cec05e1d-47b6-4735-a1a4-f30386236339_Basecall_Alignment_template 146 3879 312 10344 0 192 1185 0 0 0 231 12666 0 0 0 0
dc219c94-933b-4639-981e-6d35a8d55105_Basecall_Alignment_template 226 7118 408 17268 0 237 2520 0 0 0 0 37 0 0 0 0
dc219c94-933b-4639-981e-6d35a8d55105_Basecall_Alignment_template 226 7118 408 17268 0 237 2520 0 0 0 0 37 0 0 0 0
29a7056b-14ed-493f-b6d3-eae197320ccf_Basecall_Alignment_template 50 3748 141 10588 3 358 1474 0 0 0 412 16335 0 0 0 0
bd65a876-21d4-44ef-9108-5fc36cc412b6_Basecall_Alignment_template 264 8550 608 22822 0 370 2711 0 0 0 0 48 0 0 0 0
a27702c3-6bfd-4820-a703-9600302988c9_Basecall_Alignment_template 99 7425 199 19326 0 564 3585 0 0 0 0 29 0 0 0 0
b33eb23c-6c25-49dd-8afb-f0e2dfec48c0_Basecall_Alignment_template 111 5072 316 14517 2 192 2077 0 0 0 332 19205 0 0 0 0
b33eb23c-6c25-49dd-8afb-f0e2dfec48c0_Basecall_Alignment_template 111 5072 316 14517 2 192 2077 0 0 0 332 19205 0 0 0 0
cf6f1837-9e78-4cd9-aebf-8d3351be07c0_Basecall_Alignment_template 160 9868 315 26173 3 743 4725 0 0 0 0 40 0 0 0 0
479cd91f-64b3-41a2-8940-a18f15609040_Basecall_Alignment_template 226 9213 477 25217 1 795 3212 0 0 0 0 40 0 0 0 0
a86ba11a-016b-462a-afed-954a6a863213_Basecall_Alignment_template 21 7218 22 19017 0 739 2322 0 0 0 0 25 0 0 0 0
9b651874-8afe-4b08-a86a-e9ce44531d89_Basecall_Alignment_template 740 26738 0 1441 10 617 2906 0 0 0 0 28 0 0 0 0
5da4db40-8876-4f9a-b30b-4d458bc02a64_Basecall_Alignment_template 89 5495 235 15376 2 335 1956 0 0 0 3 14035 0 0 0 0
5da4db40-8876-4f9a-b30b-4d458bc02a64_Basecall_Alignment_template 89 5495 235 15376 2 335 1956 0 0 0 3 14035 0 0 0 0
adde4c9e-3209-465a-be50-d8827aab2260_Basecall_Alignment_template 170 7077 389 19122 0 332 2518 0 0 0 0 51 0 0 0 0
11842ee6-8ae8-419e-b0a4-71eff3bd1248_Basecall_Alignment_template 208 7854 408 19599 4 637 2736 0 0 0 0 87 0 0 0 0
c379db88-4097-4bd1-b4ac-5b415768f39e_Basecall_Alignment_template 123 3952 226 10744 0 158 1332 0 0 0 328 10680 0 0 0 0
d47288d9-f045-46ed-be2b-368797d1cbb5_Basecall_Alignment_template 19 6486 31 17199 0 493 3763 0 0 0 20 13545 0 0 0 0
d47288d9-f045-46ed-be2b-368797d1cbb5_Basecall_Alignment_template 19 6486 31 17199 0 493 3763 0 0 0 20 13545 0 0 0 0
d47288d9-f045-46ed-be2b-368797d1cbb5_Basecall_Alignment_template 19 6486 31 17199 0 493 3763 0 0 0 20 13545 0 0 0 0
c1feb09c-8130-4f1f-9058-1a46f0fe8890_Basecall_Alignment_template 3 3369 4 8897 0 346 1463 0 0 0 1 18194 0 0 0 0
7d6dbc44-6538-4d37-b31b-5d5d69999aea_Basecall_Alignment_template 60 7240 104 17717 0 446 2993 0 0 0 0 36 0 0 0 0
bf7a4a40-cf6b-44c1-b2a4-e1ebbc375b80_Basecall_Alignment_template 11 6999 15 19162 1 440 2381 0 0 0 0 35 0 0 0 0
924eab24-3a50-40d0-8b6c-1546f3114876_Basecall_Alignment_template 91 6300 217 17270 0 457 2285 0 0 0 11 9778 0 0 0 0
60492a5a-d796-42f0-9334-3788152c4bc3_Basecall_Alignment_template 69 7742 115 20518 0 570 3686 0 0 0 0 36 0 0 0 0
3a9f9a2f-0199-4292-a959-919f63146a2c_Basecall_Alignment_template 91 4943 306 14789 2 335 2137 0 0 0 0 5787 0 0 0 0
e572150e-0430-44fa-831d-01a1efd54372_Basecall_Alignment_template 39 3880 124 11505 3 326 1613 0 0 0 0 12256 0 0 0 0
c6be0c05-2e72-41d6-814c-f04358d2b84b_Basecall_Alignment_template 428 6538 243 18387 0 278 2103 0 0 0 0 29 0 0 0 0
c707c357-24ac-40c2-80f4-66efb1992471_Basecall_Alignment_template 6 6952 15 18338 0 494 2522 0 0 0 0 10 0 0 0 0
45503036-1214-4b70-8e4d-3bf4fd5299d4_Basecall_Alignment_template 31 3998 68 11468 0 285 1678 0 0 0 3 12072 0 0 0 0
3302b3fb-4aac-4994-b17f-96ce12d9432b_Basecall_Alignment_template 158 6451 402 18617 1 454 1364 0 0 0 0 57 0 0 0 0
a0ce977b-f983-4365-8b32-e7a68c0c3bd7_Basecall_Alignment_template 180 8100 385 22623 0 444 2732 0 0 0 0 147 0 0 0 0
fbfe66f9-d6d8-4bca-b90a-e7a2e7b83d90_Basecall_Alignment_template 19 7693 47 22361 0 435 2635 0 0 0 13 6663 0 0 0 0
83103660-c8e6-452c-b554-8cda43a508c6_Basecall_Alignment_template 1 3019 2 8038 0 478 1568 0 0 0 11 22347 0 0 0 0
83103660-c8e6-452c-b554-8cda43a508c6_Basecall_Alignment_template 1 3019 2 8038 0 478 1568 0 0 0 11 22347 0 0 0 0
69606812-194c-4f6d-a713-802f18ef3b96_Basecall_Alignment_template 93 8937 237 23879 2 552 4940 0 0 0 0 36 0 0 0 0
9af53734-e7cd-468d-a997-7427f3931a34_Basecall_Alignment_template 213 9601 446 26579 0 529 4121 0 0 0 0 27 0 0 0 0
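
Per-read counts of this kind can be sketched as follows. This is not the script that produced the table; it assumes the aligner emitted extended CIGAR strings (with `=` and `X`) and that qualities are Phred+33 encoded as in SAM. Operations that do not consume read bases (D, N, H, P) have no quality value, so this sketch places their full length in the first slot, which is an assumption about how such operations are binned.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")  # operations that advance along the read


def count_ops_by_quality(cigar, qual, q_threshold=20, phred_offset=33):
    """Count bases per CIGAR operation, split by base-call quality.

    Returns {op: [n_at_or_above_threshold, n_below_threshold]}.
    """
    counts = defaultdict(lambda: [0, 0])
    pos = 0  # current offset into the read / quality string
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in CONSUMES_QUERY:
            for ch in qual[pos:pos + length]:
                q = ord(ch) - phred_offset
                counts[op][0 if q >= q_threshold else 1] += 1
            pos += length
        else:
            # Assumption: no quality is available, count in the first slot.
            counts[op][0] += length
    return dict(counts)
```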

yk-tanigawa commented 7 years ago

step-wise plan as of 6 Dec 2016

  1. Summarize how many reads are >= 10 kb long with an error rate < 10%.
  2. Use those estimates to assess whether there will be enough information to move forward with haplotype inference.
  3. (nanopolish: if it is again only a single read, then we need to clean up the reads.)
  4. dbSNP summary
  5. pgenlib

yk-tanigawa commented 7 years ago

1. Filter by mismatch ratio <= 0.10 and length >= 10kb

2. Assessment

3. nanopolish
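
Step 1 (the filter by mismatch ratio and length) can be sketched as below. The definitions are assumptions, since the thread does not state them: the mismatch ratio is taken as X / (= + X), and the read length as the sum of the read-consuming operations (= + X + I + S). The two example calls use counts from the table of the first 100 fragments, summed over both quality bins.

```python
def mismatch_ratio(n_match, n_mismatch):
    """Fraction of aligned bases that are mismatches (assumed definition)."""
    aligned = n_match + n_mismatch
    return n_mismatch / aligned if aligned else float("nan")


def passes_filter(n_match, n_mismatch, read_len, max_ratio=0.10, min_len=10_000):
    """True if the read is long enough and its mismatch ratio is low enough."""
    return read_len >= min_len and mismatch_ratio(n_match, n_mismatch) <= max_ratio


# f31c8ee5-...: = 24425, X 1394, I 463, S 15
print(passes_filter(24425, 1394, 24425 + 1394 + 463 + 15))   # True
# 48d3b21b-...: = 13961, X 35460, I 759, S 28
print(passes_filter(13961, 35460, 13961 + 35460 + 759 + 28))  # False
```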

yk-tanigawa commented 7 years ago

Filtered reads have few mismatches with high base-call Q-values (>= 20)

[Screenshot, 2016-12-12 16:26:36]

yk-tanigawa commented 7 years ago

My program can extract SNP information from a BAM file

[Screenshot, 2016-12-12 16:34:13]

yk-tanigawa commented 7 years ago

[ytanigaw@sh-5-36 ~]$ ~/projects/nanopore/scripts/20161212/report.sh
number of reads with mismatch rate <= 10% && length >= 10kb:
4446
number of filtered reads that contain at least one mismatch with Q-value >= 20:
1458
total number of mismatch with Q-value >= 20 in filtered reads:
3004
total number of mismatches whose positions are present in dbSNPs:
171
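
The last count in this report (171 of 3004, about 5.7%) is a position lookup against dbSNP. A minimal sketch of such a lookup is shown below; it is not the code behind `report.sh`. It assumes dbSNP is available as VCF text, where the first two tab-separated columns are CHROM and POS, and that chromosome names match between the alignments and the VCF. Both function names are illustrative.

```python
def load_vcf_positions(lines):
    """Collect (chrom, pos) pairs from VCF text lines, skipping headers."""
    positions = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        positions.add((chrom, int(pos)))  # VCF positions are 1-based
    return positions


def count_known_positions(mismatches, dbsnp_positions):
    """Count mismatches whose (chrom, pos) is present in dbSNP.

    Every mismatch is counted, so one site seen in several reads
    contributes more than once.
    """
    return sum(1 for site in mismatches if site in dbsnp_positions)
```

A set of tuples keeps each lookup constant-time, which matters more for the size of dbSNP than for the few thousand mismatches being checked.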
yk-tanigawa commented 7 years ago

distribution of mismatches

with base call quality filter (Q >= 20)

[Figure: histogram]

without base call quality filter

[Figure: histogram]

The code is here: https://github.com/rivas-lab/nanopore/tree/20161213_hist/scripts/20161213

yk-tanigawa commented 7 years ago

SNPs

yk-tanigawa commented 7 years ago

Testing with other thresholds (base-call Q)

yk-tanigawa commented 7 years ago

number of SNPs in this data set

with base call quality filter (Q >= 14)

[Figure: histogram]

with base call quality filter (Q >= 10)

[Figure: histogram]

The code is in this notebook: https://github.com/rivas-lab/nanopore/blob/20161218three_plots/notes/20161218make_three_plots.ipynb
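
For reference when comparing the thresholds, a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q >= 20, 14 and 10 admit base calls with at most 1%, about 4%, and 10% estimated error. The sketch below is illustrative and is not taken from the notebook; `count_passing` is a hypothetical helper that counts how many base qualities pass each cutoff.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def count_passing(quals, thresholds=(20, 14, 10)):
    """For each threshold, count the qualities that are >= that threshold."""
    return {t: sum(1 for q in quals if q >= t) for t in thresholds}


for t in (20, 14, 10):
    print(f"Q >= {t}: error probability <= {phred_to_error_prob(t):.3f}")
```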

yk-tanigawa commented 7 years ago

[Figure: nanopore-wgs 25000 sorted 10k mapq50 ext sorted q14 snps]

yk-tanigawa commented 7 years ago

summary stats for release 3 data of chromosome 20

$ ls -hl /share/PI/mrivas/data/nanopore-wgs-consortium/poretools_fastq.12894489.geq12500.*
-rw-r--r-- 1 ytanigaw mrivas 5.7G Feb 19 17:19 /share/PI/mrivas/data/nanopore-wgs-consortium/poretools_fastq.12894489.geq12500.bam
-rw-r--r-- 1 ytanigaw mrivas 755K Feb 19 17:19 /share/PI/mrivas/data/nanopore-wgs-consortium/poretools_fastq.12894489.geq12500.bam.bai
-rw-r--r-- 1 ytanigaw mrivas 1.7G Feb 19 17:26 /share/PI/mrivas/data/nanopore-wgs-consortium/poretools_fastq.12894489.geq12500.fq.gz