novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Curlcake SRR8767348: ignoring read without sequence #28

Closed. akesarwani closed this issue 4 years ago.

akesarwani commented 4 years ago

For one of the Curlcake samples from Liu et al., Nat. Commun. 2019 (SRR8767348.fastq.gz), the feature extraction from the FASTQ produced weird results that I am unable to understand. Could someone please help?

[M::mm_idx_gen::0.038*0.26] collected minimizers
[M::mm_idx_gen::0.043*0.35] sorted minimizers
[M::main::0.043*0.35] loaded/built the index for 4 target sequence(s)
[M::mm_mapopt_update::0.045*0.34] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 4
[M::mm_idx_stat::0.045*0.35] distinct minimizers: 1864 (99.09% are singletons); average occurrences: 1.009; average spacing: 5.316
[M::worker_pipeline::83.752*2.98] mapped 396018 sequences
[M::worker_pipeline::165.269*1.99] mapped 348987 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont /projects/ke-lab/kesara/ONT/downloads/Liu_et_Nat_Com_2019/curlcake/reference/GSE124309_FASTA_sequences_of_Curlcakes.fa /projects/ke-lab/kesara/ONT/results/epinano/SRR8767348.U2T.fastq
[M::main] Real time: 165.282 sec; CPU: 329.645 sec; Peak RSS: 2.254 GB
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.6490
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.8810
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.48744
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.94999
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.205713
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.261486
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.272770
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.311546
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.373700
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.389571
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.526071
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.532636
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.576609
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.667801
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.716665
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.52678
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.217676
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.716054
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.210763
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.513356
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.744910
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.10752
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.218171
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.463532
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.548731
[INFO][Sam2Tsv]Count: 5,728 Elapsed: 11 seconds(0.10%) Remains: 3 hours(99.90%) Last: cc6m_2244_t7_ecorv:10
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.269086
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.159458
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.116329
[INFO][Sam2Tsv]Count: 12,016 Elapsed: 22 seconds(1.20%) Remains: 30 minutes(98.80%) Last: cc6m_2244_t7_ecorv:120
[INFO][Sam2Tsv]Count: 19,229 Elapsed: 33 seconds(6.76%) Remains: 7 minutes(93.24%) Last: cc6m_2244_t7_ecorv:676
[INFO][Sam2Tsv]Count: 21,801 Elapsed: 44 seconds(8.54%) Remains: 7 minutes(91.46%) Last: cc6m_2244_t7_ecorv:854
[INFO][Sam2Tsv]Count: 28,585 Elapsed: 55 seconds(11.28%) Remains: 7 minutes(88.72%) Last: cc6m_2244_t7_ecorv:1,128
[INFO][Sam2Tsv]Count: 35,531 Elapsed: 1 minute(14.17%) Remains: 6 minutes(85.83%) Last: cc6m_2244_t7_ecorv:1,417
[INFO][Sam2Tsv]Count: 47,470 Elapsed: 1 minute(17.57%) Remains: 6 minutes(82.43%) Last: cc6m_2244_t7_ecorv:1,757
[INFO][Sam2Tsv]Count: 49,954 Elapsed: 1 minute(18.05%) Remains: 6 minutes(81.95%) Last: cc6m_2244_t7_ecorv:1,805
[INFO][Sam2Tsv]Count: 63,706 Elapsed: 1 minute(0.06%) Remains: 1 day(99.94%) Last: cc6m_2459_t7_ecorv:6
[INFO][Sam2Tsv]Count: 66,561 Elapsed: 1 minute(0.08%) Remains: 1 day(99.92%) Last: cc6m_2459_t7_ecorv:8
[INFO][Sam2Tsv]Count: 69,608 Elapsed: 2 minutes(0.09%) Remains: 1 day(99.91%) Last: cc6m_2459_t7_ecorv:9
[INFO][Sam2Tsv]Count: 74,088 Elapsed: 2 minutes(0.09%) Remains: 1 day(99.91%) Last: cc6m_2459_t7_ecorv:9
[INFO][Sam2Tsv]Count: 74,844 Elapsed: 2 minutes(0.09%) Remains: 1 day(99.91%) Last: cc6m_2459_t7_ecorv:9
[INFO][Sam2Tsv]Count: 75,057 Elapsed: 2 minutes(0.10%) Remains: 1 day(99.90%) Last: cc6m_2459_t7_ecorv:10
[INFO][Sam2Tsv]Count: 75,547 Elapsed: 2 minutes(0.10%) Remains: 1 day(99.90%) Last: cc6m_2459_t7_ecorv:10
[INFO][Sam2Tsv]Count: 75,935 Elapsed: 3 minutes(0.10%) Remains: 2 days(99.90%) Last: cc6m_2459_t7_ecorv:10
[INFO][Sam2Tsv]Count: 77,641 Elapsed: 3 minutes(0.10%) Remains: 2 days(99.90%) Last: cc6m_2459_t7_ecorv:10
[INFO][Sam2Tsv]Count: 82,926 Elapsed: 3 minutes(0.11%) Remains: 2 days(99.89%) Last: cc6m_2459_t7_ecorv:11
[INFO][Sam2Tsv]Count: 86,991 Elapsed: 3 minutes(0.14%) Remains: 1 day(99.86%) Last: cc6m_2459_t7_ecorv:14
[INFO][Sam2Tsv]Count: 87,087 Elapsed: 3 minutes(0.15%) Remains: 1 day(99.85%) Last: cc6m_2459_t7_ecorv:15
[INFO][Sam2Tsv]Count: 87,252 Elapsed: 3 minutes(0.15%) Remains: 1 day(99.85%) Last: cc6m_2459_t7_ecorv:15
[INFO][Sam2Tsv]Count: 87,500 Elapsed: 4 minutes(0.15%) Remains: 1 day(99.85%) Last: cc6m_2459_t7_ecorv:15
[INFO][Sam2Tsv]Count: 88,780 Elapsed: 4 minutes(0.15%) Remains: 2 days(99.85%) Last: cc6m_2459_t7_ecorv:15
[INFO][Sam2Tsv]Count: 94,268 Elapsed: 4 minutes(0.16%) Remains: 1 day(99.84%) Last: cc6m_2459_t7_ecorv:16
[INFO][Sam2Tsv]Count: 97,359 Elapsed: 4 minutes(0.16%) Remains: 2 days(99.84%) Last: cc6m_2459_t7_ecorv:16
[INFO][Sam2Tsv]Count: 102,909 Elapsed: 4 minutes(0.16%) Remains: 2 days(99.84%) Last: cc6m_2459_t7_ecorv:16
[INFO][Sam2Tsv]Count: 108,371 Elapsed: 5 minutes(0.16%) Remains: 2 days(99.84%) Last: cc6m_2459_t7_ecorv:16
[INFO][Sam2Tsv]Count: 111,018 Elapsed: 5 minutes(0.17%) Remains: 2 days(99.83%) Last: cc6m_2459_t7_ecorv:17
[INFO][Sam2Tsv]Count: 116,979 Elapsed: 5 minutes(0.17%) Remains: 2 days(99.83%) Last: cc6m_2459_t7_ecorv:17
[INFO][Sam2Tsv]Count: 122,575 Elapsed: 5 minutes(0.29%) Remains: 1 day(99.71%) Last: cc6m_2459_t7_ecorv:29
[INFO][Sam2Tsv]Count: 127,852 Elapsed: 5 minutes(0.48%) Remains: 20 hours(99.52%) Last: cc6m_2459_t7_ecorv:48
[INFO][Sam2Tsv]Count: 133,454 Elapsed: 5 minutes(1.72%) Remains: 5 hours(98.28%) Last: cc6m_2459_t7_ecorv:172
[INFO][Sam2Tsv]Count: 139,083 Elapsed: 6 minutes(2.98%) Remains: 3 hours(97.02%) Last: cc6m_2459_t7_ecorv:298
[INFO][Sam2Tsv]Count: 144,749 Elapsed: 6 minutes(4.21%) Remains: 2 hours(95.79%) Last: cc6m_2459_t7_ecorv:421
[INFO][Sam2Tsv]Count: 151,006 Elapsed: 6 minutes(5.95%) Remains: 1 hour(94.05%) Last: cc6m_2459_t7_ecorv:595
[INFO][Sam2Tsv]Count: 154,925 Elapsed: 6 minutes(6.84%) Remains: 1 hour(93.16%) Last: cc6m_2459_t7_ecorv:684
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.117494
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.286166
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.3889
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.369619
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.415095
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.439087
[INFO][Sam2Tsv]Count: 157,227 Elapsed: 6 minutes(7.30%) Remains: 1 hour(92.70%) Last: cc6m_2459_t7_ecorv:730
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.674372
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.87373
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.89719
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.98935
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.170353
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.224054
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.248600
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.280525
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.377275
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.519807
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.592519
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.622145
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.639466
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.691410
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.16925
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.139264
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.215124
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.291455
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.412762
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.317610
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.439003
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.678649
[INFO][Sam2Tsv]Count: 158,510 Elapsed: 7 minutes(7.65%) Remains: 1 hour(92.35%) Last: cc6m_2459_t7_ecorv:765
[WARN][Sam2Tsv]Ignoring read without sequence: SRR8767348.98271
[INFO][Sam2Tsv]Count: 159,277 Elapsed: 7 minutes(7.72%) Remains: 1 hour(92.28%) Last: cc6m_2459_t7_ecorv:772
[INFO][Sam2Tsv]Count: 163,214 Elapsed: 7 minutes(8.68%) Remains: 1 hour(91.32%) Last: cc6m_2459_t7_ecorv:868
[INFO][Sam2Tsv]Count: 169,601 Elapsed: 7 minutes(10.12%) Remains: 1 hour(89.88%) Last: cc6m_2459_t7_ecorv:1,012
[INFO][Sam2Tsv]Count: 177,563 Elapsed: 7 minutes(11.88%) Remains: 58 minutes(88.12%) Last: cc6m_2459_t7_ecorv:1,188
[INFO][Sam2Tsv]Count: 186,905 Elapsed: 8 minutes(13.70%) Remains: 51 minutes(86.30%) Last: cc6m_2459_t7_ecorv:1,370
[INFO][Sam2Tsv]Count: 196,915 Elapsed: 8 minutes(15.27%) Remains: 46 minutes(84.73%) Last: cc6m_2459_t7_ecorv:1,527
[INFO][Sam2Tsv]Count: 209,441 Elapsed: 8 minutes(17.26%) Remains: 40 minutes(82.74%) Last: cc6m_2459_t7_ecorv:1,726
[INFO][Sam2Tsv]Count: 226,472 Elapsed: 8 minutes(20.34%) Remains: 33 minutes(79.66%) Last: cc6m_2459_t7_ecorv:2,034
[INFO][Sam2Tsv]Count: 240,011 Elapsed: 8 minutes(22.47%) Remains: 30 minutes(77.53%) Last: cc6m_2595_t7_ecorv:5
[INFO][Sam2Tsv]Count: 245,567 Elapsed: 9 minutes(22.50%) Remains: 31 minutes(77.50%) Last: cc6m_2595_t7_ecorv:8
[INFO][Sam2Tsv]Count: 251,220 Elapsed: 9 minutes(22.52%) Remains: 31 minutes(77.48%) Last: cc6m_2595_t7_ecorv:10
[INFO][Sam2Tsv]Count: 254,828 Elapsed: 9 minutes(22.52%) Remains: 32 minutes(77.48%) Last: cc6m_2595_t7_ecorv:10
[INFO][Sam2Tsv]Count: 260,504 Elapsed: 9 minutes(22.52%) Remains: 32 minutes(77.48%) Last: cc6m_2595_t7_ecorv:10
[INFO][Sam2Tsv]Count: 265,000 Elapsed: 9 minutes(22.52%) Remains: 33 minutes(77.48%) Last: cc6m_2595_t7_ecorv:10
[INFO][Sam2Tsv]Count: 270,525 Elapsed: 9 minutes(22.53%) Remains: 34 minutes(77.47%) Last: cc6m_2595_t7_ecorv:11
[INFO][Sam2Tsv]Count: 275,950 Elapsed: 10 minutes(22.53%) Remains: 34 minutes(77.47%) Last: cc6m_2595_t7_ecorv:11
[INFO][Sam2Tsv]Count: 279,381 Elapsed: 10 minutes(22.53%) Remains: 35 minutes(77.47%) Last: cc6m_2595_t7_ecorv:11
[INFO][Sam2Tsv]Count: 285,038 Elapsed: 10 minutes(22.56%) Remains: 36 minutes(77.44%) Last: cc6m_2595_t7_ecorv:14
[INFO][Sam2Tsv]Count: 290,857 Elapsed: 10 minutes(22.78%) Remains: 36 minutes(77.22%) Last: cc6m_2595_t7_ecorv:36
[INFO][Sam2Tsv]Count: 295,898 Elapsed: 10 minutes(24.16%) Remains: 34 minutes(75.84%) Last: cc6m_2595_t7_ecorv:174
[INFO][Sam2Tsv]Count: 301,757 Elapsed: 11 minutes(25.74%) Remains: 31 minutes(74.26%) Last: cc6m_2595_t7_ecorv:332
[INFO][Sam2Tsv]Count: 307,762 Elapsed: 11 minutes(27.27%) Remains: 30 minutes(72.73%) Last: cc6m_2595_t7_ecorv:485
[INFO][Sam2Tsv]Count: 314,040 Elapsed: 11 minutes(28.77%) Remains: 28 minutes(71.23%) Last: cc6m_2595_t7_ecorv:635
[INFO][Sam2Tsv]Count: 320,957 Elapsed: 11 minutes(30.39%) Remains: 26 minutes(69.61%) Last: cc6m_2595_t7_ecorv:797
[INFO][Sam2Tsv]Count: 325,295 Elapsed: 11 minutes(31.31%) Remains: 26 minutes(68.69%) Last: cc6m_2595_t7_ecorv:889
[INFO][Sam2Tsv]Count: 333,474 Elapsed: 12 minutes(32.80%) Remains: 24 minutes(67.20%) Last: cc6m_2595_t7_ecorv:1,038
[INFO][Sam2Tsv]Count: 342,027 Elapsed: 12 minutes(34.40%) Remains: 23 minutes(65.60%) Last: cc6m_2595_t7_ecorv:1,198
[INFO][Sam2Tsv]Count: 350,753 Elapsed: 12 minutes(35.76%) Remains: 22 minutes(64.24%) Last: cc6m_2595_t7_ecorv:1,334
[INFO][Sam2Tsv]Count: 359,807 Elapsed: 12 minutes(37.43%) Remains: 21 minutes(62.57%) Last: cc6m_2595_t7_ecorv:1,501
[INFO][Sam2Tsv]Count: 369,442 Elapsed: 12 minutes(38.62%) Remains: 20 minutes(61.38%) Last: cc6m_2595_t7_ecorv:1,620
[INFO][Sam2Tsv]Count: 380,514 Elapsed: 12 minutes(40.22%) Remains: 19 minutes(59.78%) Last: cc6m_2595_t7_ecorv:1,780
[INFO][Sam2Tsv]Count: 395,411 Elapsed: 13 minutes(42.44%) Remains: 17 minutes(57.56%) Last: cc6m_2595_t7_ecorv:2,002
[INFO][Sam2Tsv]Count: 406,515 Elapsed: 13 minutes(43.68%) Remains: 17 minutes(56.32%) Last: cc6m_2595_t7_ecorv:2,126
[INFO][Sam2Tsv]Count: 427,821 Elapsed: 13 minutes(47.08%) Remains: 15 minutes(52.92%) Last: cc6m_2709_t7_ecorv:8
[INFO][Sam2Tsv]Count: 432,891 Elapsed: 13 minutes(47.10%) Remains: 15 minutes(52.90%) Last: cc6m_2709_t7_ecorv:10
[INFO][Sam2Tsv]Count: 437,761 Elapsed: 13 minutes(47.10%) Remains: 15 minutes(52.90%) Last: cc6m_2709_t7_ecorv:10
[INFO][Sam2Tsv]Count: 442,793 Elapsed: 14 minutes(47.11%) Remains: 15 minutes(52.89%) Last: cc6m_2709_t7_ecorv:11
[INFO][Sam2Tsv]Count: 447,936 Elapsed: 14 minutes(47.12%) Remains: 16 minutes(52.88%) Last: cc6m_2709_t7_ecorv:12
[INFO][Sam2Tsv]Count: 453,185 Elapsed: 14 minutes(47.14%) Remains: 16 minutes(52.86%) Last: cc6m_2709_t7_ecorv:14
[INFO][Sam2Tsv]Count: 458,257 Elapsed: 14 minutes(47.14%) Remains: 16 minutes(52.86%) Last: cc6m_2709_t7_ecorv:14
[INFO][Sam2Tsv]Count: 463,289 Elapsed: 14 minutes(47.18%) Remains: 16 minutes(52.82%) Last: cc6m_2709_t7_ecorv:18
[INFO][Sam2Tsv]Count: 468,680 Elapsed: 15 minutes(47.18%) Remains: 16 minutes(52.82%) Last: cc6m_2709_t7_ecorv:18
[INFO][Sam2Tsv]Count: 473,745 Elapsed: 15 minutes(47.18%) Remains: 17 minutes(52.82%) Last: cc6m_2709_t7_ecorv:18
[INFO][Sam2Tsv]Count: 478,580 Elapsed: 15 minutes(47.20%) Remains: 17 minutes(52.80%) Last: cc6m_2709_t7_ecorv:20
[INFO][Sam2Tsv]Count: 483,621 Elapsed: 15 minutes(47.26%) Remains: 17 minutes(52.74%) Last: cc6m_2709_t7_ecorv:26
[INFO][Sam2Tsv]Count: 488,529 Elapsed: 15 minutes(47.37%) Remains: 17 minutes(52.63%) Last: cc6m_2709_t7_ecorv:37
[INFO][Sam2Tsv]Count: 493,220 Elapsed: 16 minutes(47.97%) Remains: 17 minutes(52.03%) Last: cc6m_2709_t7_ecorv:97
[INFO][Sam2Tsv]Count: 498,707 Elapsed: 16 minutes(49.44%) Remains: 16 minutes(50.56%) Last: cc6m_2709_t7_ecorv:244
[INFO][Sam2Tsv]Count: 504,196 Elapsed: 16 minutes(50.93%) Remains: 15 minutes(49.07%) Last: cc6m_2709_t7_ecorv:392
[INFO][Sam2Tsv]Count: 509,455 Elapsed: 16 minutes(52.17%) Remains: 15 minutes(47.83%) Last: cc6m_2709_t7_ecorv:516
[INFO][Sam2Tsv]Count: 515,568 Elapsed: 16 minutes(53.67%) Remains: 14 minutes(46.33%) Last: cc6m_2709_t7_ecorv:666
[INFO][Sam2Tsv]Count: 519,924 Elapsed: 16 minutes(54.76%) Remains: 14 minutes(45.24%) Last: cc6m_2709_t7_ecorv:775
[INFO][Sam2Tsv]Count: 526,384 Elapsed: 17 minutes(56.48%) Remains: 13 minutes(43.52%) Last: cc6m_2709_t7_ecorv:947
[INFO][Sam2Tsv]Count: 533,537 Elapsed: 17 minutes(58.19%) Remains: 12 minutes(41.81%) Last: cc6m_2709_t7_ecorv:1,118
[INFO][Sam2Tsv]Count: 538,545 Elapsed: 17 minutes(59.36%) Remains: 11 minutes(40.64%) Last: cc6m_2709_t7_ecorv:1,235
[INFO][Sam2Tsv]Count: 546,542 Elapsed: 17 minutes(61.37%) Remains: 11 minutes(38.63%) Last: cc6m_2709_t7_ecorv:1,436
[INFO][Sam2Tsv]Count: 555,744 Elapsed: 17 minutes(63.39%) Remains: 10 minutes(36.61%) Last: cc6m_2709_t7_ecorv:1,638
[INFO][Sam2Tsv]Count: 560,945 Elapsed: 18 minutes(64.39%) Remains: 9 minutes(35.61%) Last: cc6m_2709_t7_ecorv:1,738
[INFO][Sam2Tsv]Count: 572,867 Elapsed: 18 minutes(66.60%) Remains: 9 minutes(33.40%) Last: cc6m_2709_t7_ecorv:1,959
[INFO][Sam2Tsv]Count: 586,593 Elapsed: 18 minutes(68.71%) Remains: 8 minutes(31.29%) Last: cc6m_2709_t7_ecorv:2,170
[INFO][Sam2Tsv]. Completed. N=745,740. That took:19 minutes

Huanle commented 4 years ago

@akesarwani Can you send me the command that generated these errors? Thanks.

akesarwani commented 4 years ago

Please see the entire script below. The same script worked for the other three Curlcakes but not for Curlcake SRR8767348.

1. Trim the first and last few low-quality bases from the raw FASTQ with NanoFilt (feel free to replace NanoFilt with a custom script)

gunzip -c ${fq} | NanoFilt -q 0 --headcrop 5 --tailcrop 3 --readtype 1D --logfile ${prefix}.nanofilt.log > ${prefix}.h5t3.fastq # for gzipped fastq

NanoFilt -q 0 --headcrop 5 --tailcrop 3 --readtype 1D --logfile ${prefix}.nanofilt.log ${fq} > ${prefix}.h5t3.fastq # for uncompressed fastq
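To illustrate roughly what `--headcrop 5 --tailcrop 3` does to each record, here is a minimal Python sketch that crops a single four-line FASTQ record. It is not NanoFilt's implementation. The helper name `crop_record` is hypothetical, and the sketch ignores NanoFilt's quality and length filtering.

```python
def crop_record(header: str, seq: str, plus: str, qual: str,
                headcrop: int = 5, tailcrop: int = 3):
    """Hypothetical helper: crop one FASTQ record like --headcrop/--tailcrop.

    Removes `headcrop` bases from the start and `tailcrop` bases from the end;
    sequence and quality are cropped identically so their lengths stay equal.
    """
    end = len(seq) - tailcrop
    if end <= headcrop:
        # Read too short to survive cropping: this sketch returns an empty record.
        return header, "", plus, ""
    return header, seq[headcrop:end], plus, qual[headcrop:end]


if __name__ == "__main__":
    h, s, p, q = crop_record("@read1", "AAAAACGUACGUUUU", "+", "!!!!!IIIIIII###")
    print(s, q)  # CGUACGU IIIIIII
```

In this sketch, a read of eight bases or fewer comes out with an empty sequence.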

2. 'U' to 'T' conversion

awk '{ if (NR%4 == 2) {gsub(/U/,"T",$1); print $1} else print }' ${prefix}.h5t3.fastq > ${prefix}.U2T.fastq
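The awk one-liner rewrites only the second line of every four-line record (`NR%4 == 2`, the sequence line). Header and quality lines are left alone, which matters because a quality string can legitimately contain the character `U`. Below is an equivalent Python sketch. It assumes a well-formed FASTQ with unwrapped four-line records, and the function name `u_to_t` is hypothetical.

```python
import io
from typing import Iterable, Iterator


def u_to_t(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: replace U with T on FASTQ sequence lines only.

    Assumes strict four-line records (header, sequence, '+', quality).
    """
    for i, line in enumerate(lines):
        # 0-based index 1, 5, 9, ... is the sequence line (NR%4 == 2 in awk).
        yield line.replace("U", "T") if i % 4 == 1 else line


if __name__ == "__main__":
    fq = io.StringIO("@read_U1\nACGUUGCA\n+\nIIUIIIII\n")
    print("".join(u_to_t(fq)), end="")
```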

3. Map reads to the reference using minimap2

minimap2 -ax map-ont ${ref} ${prefix}.U2T.fastq | samtools view -bhS - | samtools sort -@ ${PBS_NP} -o ${prefix}.bam && samtools index ${prefix}.bam

4. Call variants for each single read-to-reference alignment

Reads mapped to the reverse strand of the reference sequence will be flipped.

java -jar /home/kesara/tools/jvarkit/dist/sam2tsv.jar -r ${ref} -o ${prefix}.bam.tsv ${prefix}.bam

module load python/2.7.10

5. Convert the results from step 4 into per-read variant information; the input file can be split by read into smaller files to speed this step up.

python ${script}/per_read_var.py ${prefix}.bam.tsv > ${prefix}.per_read.var.csv
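To make the per-read step concrete, the sketch below tallies matches, mismatches, insertions and deletions for each read. It works on alignment rows that have already been parsed out of the sam2tsv output. The `Row` record and its fields are a hypothetical simplification, since the actual column layout depends on the jvarkit version. This is not the logic of `per_read_var.py` itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    """Hypothetical, simplified sam2tsv row: one aligned position of one read."""
    read: str
    ref: str
    ref_pos: int
    read_base: str  # '.' for deletions
    ref_base: str   # '.' for insertions
    op: str         # CIGAR operation, e.g. 'M', 'I', 'D'


def per_read_counts(rows: Iterable[Row]) -> Dict[str, Dict[str, int]]:
    """Count matches, mismatches, insertions and deletions for each read."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"match": 0, "mis": 0, "ins": 0, "del": 0})
    for r in rows:
        if r.op == "I":
            counts[r.read]["ins"] += 1
        elif r.op == "D":
            counts[r.read]["del"] += 1
        elif r.op in ("M", "=", "X"):
            same = r.read_base.upper() == r.ref_base.upper()
            counts[r.read]["match" if same else "mis"] += 1
        # Other operations (e.g. soft clips 'S') are ignored in this sketch.
    return dict(counts)


if __name__ == "__main__":
    rows = [
        Row("read1", "cc6m_2244_t7_ecorv", 10, "A", "A", "M"),
        Row("read1", "cc6m_2244_t7_ecorv", 11, "G", "T", "M"),
        Row("read1", "cc6m_2244_t7_ecorv", 12, ".", "C", "D"),
        Row("read2", "cc6m_2244_t7_ecorv", 10, "A", "A", "M"),
    ]
    print(per_read_counts(rows))
```

Each read's counts depend only on that read's rows, which is why splitting the input by read does not change the result.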

6. Summarize the results from step 4 into variant information per reference position (i.e., per-site variants); the input file can be split by reference into smaller files to speed this step up.

python ${script}/per_site_var.py ${prefix}.bam.tsv > ${prefix}.per_site.var.csv
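Per-site features can be thought of as frequencies aggregated over all reads at each reference position. The minimal sketch below uses a hypothetical simplified row format (redefined here so the block stands alone). It makes two further assumptions: an insertion is attributed to the preceding reference position, and coverage counts the reads that match, mismatch or are deleted at a position. That is one possible definition and not necessarily the one used by `per_site_var.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical, simplified sam2tsv row: one aligned position of one read."""
    read: str
    ref: str
    ref_pos: int    # for insertions: the preceding reference base (an assumption)
    read_base: str  # '.' for deletions
    ref_base: str   # '.' for insertions
    op: str         # CIGAR operation, e.g. 'M', 'I', 'D'


def per_site_freqs(rows: Iterable[Row]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Mismatch, insertion and deletion frequencies per (reference, position)."""
    tally: Dict[Tuple[str, int], Dict[str, int]] = defaultdict(
        lambda: {"cov": 0, "mis": 0, "ins": 0, "del": 0})
    for r in rows:
        if r.op not in ("I", "D", "M", "=", "X"):
            continue  # e.g. soft clips are ignored in this sketch
        t = tally[(r.ref, r.ref_pos)]
        if r.op == "I":
            t["ins"] += 1  # insertions do not consume the reference: no coverage
        elif r.op == "D":
            t["cov"] += 1
            t["del"] += 1
        else:
            t["cov"] += 1
            if r.read_base.upper() != r.ref_base.upper():
                t["mis"] += 1
    freqs = {}
    for site, t in tally.items():
        cov = t["cov"]
        freqs[site] = {
            "cov": cov,
            **{k: (t[k] / cov if cov else 0.0) for k in ("mis", "ins", "del")},
        }
    return freqs
```

Each site depends only on rows from its own reference, so splitting the input by reference, as suggested for this step, gives the same per-site values.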

7. Slide the per-site variants with a window size of 5 so that FAST5 event-table information can be combined

python ${script}/slide_per_site_var.py ${prefix}.ref.per_site.var.csv > ${prefix}.per_site.var.sliding.win.csv
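Sliding with a window of 5 gives each position a five-base context of per-site values, and this is what allows those values to be combined with the FAST5 event-table features. The minimal sketch below assumes the per-site values for one reference are held in a list ordered by position with no missing positions. The function name `sliding_windows` is hypothetical, and it keeps only complete windows.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(values: Sequence[T], size: int = 5) -> List[Tuple[T, ...]]:
    """Hypothetical helper: every complete window of `size` consecutive values.

    Positions near the ends that cannot fill a full window are dropped.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [tuple(values[i:i + size]) for i in range(len(values) - size + 1)]


if __name__ == "__main__":
    mis_freq = [0.01, 0.02, 0.30, 0.05, 0.01, 0.00, 0.02]
    for win in sliding_windows(mis_freq):
        print(win)
```

Dropping incomplete windows is the simplest choice. An alternative would be to pad the ends, which keeps one row per position at the cost of partly empty features.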

Huanle commented 4 years ago

Hi @akesarwani, based on the warning messages generated by Sam2Tsv, the input FASTQ file seems to have empty read entries. Could you double-check whether this is the case? If so, could you re-download the sequences? Thanks.
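One quick way to check for empty read entries is to scan the FASTQ and report records whose sequence line is empty. The minimal Python sketch below assumes plain four-line records with no line wrapping, and `empty_reads` is a hypothetical helper. For the gzipped download, the file can be opened in text mode with `gzip.open(path, "rt")` and the handle passed in. The same check can also be run on the trimmed `${prefix}.h5t3.fastq`.

```python
import io
from typing import Iterator, TextIO


def empty_reads(handle: TextIO) -> Iterator[str]:
    """Hypothetical helper: yield IDs of FASTQ records with an empty sequence.

    Assumes strict four-line records (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        if not seq:
            # Keep only the read ID, e.g. 'SRR8767348.6490' from '@SRR8767348.6490 ...'.
            fields = header[1:].split()
            yield fields[0] if fields else ""


if __name__ == "__main__":
    fq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\n\n+\n\n")
    print(list(empty_reads(fq)))  # ['read2']
```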

akesarwani commented 4 years ago

I noticed that the reads that got the warning "[WARN][Sam2Tsv]Ignoring read without sequence:" were mapped to two locations.
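This observation is consistent with how minimap2 usually writes SAM. It normally outputs secondary alignments (FLAG bit 0x100) with `*` in the SEQ field, and an alignment record without a stored sequence is what the Sam2Tsv warning reports. To check this on the BAM from step 3, the alignments can be streamed as SAM text with `samtools view ${prefix}.bam` and scanned for empty SEQ fields. Below is a minimal Python sketch. `seqless_records` is a hypothetical helper, and the two SAM records are made up for illustration.

```python
from typing import Iterable, List, Tuple

SECONDARY = 0x100  # SAM FLAG bit: secondary alignment


def seqless_records(sam_lines: Iterable[str]) -> List[Tuple[str, int, bool]]:
    """Hypothetical helper: list alignments whose SEQ field is '*'.

    Returns (read name, FLAG, is_secondary) for each such SAM record.
    """
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        if seq == "*":
            hits.append((qname, flag, bool(flag & SECONDARY)))
    return hits


if __name__ == "__main__":
    sam = [
        "read1\t0\tcc6m_2244_t7_ecorv\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "read1\t256\tcc6m_2459_t7_ecorv\t5\t0\t4M\t*\t0\t0\t*\t*\n",
    ]
    print(seqless_records(sam))  # [('read1', 256, True)]
```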

Huanle commented 4 years ago

Hi @akesarwani, is multi-mapping common in your case? If so, you may want to proceed with uniquely mapped reads only. Otherwise, if you really want to keep those multi-mapping reads, you could keep only their primary alignments.
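Keeping only primary alignments means dropping records with the secondary (0x100) or supplementary (0x800) FLAG bits set, plus unmapped records (0x4). With samtools this is commonly done with `samtools view -b -F 2308` (2308 = 0x4 + 0x100 + 0x800). Alternatively, minimap2 can be asked not to emit secondary alignments with `--secondary=no`. Restricting to uniquely mapped reads is stricter, because it drops every read that has any secondary alignment. The minimal Python sketch below applies both filters to SAM text lines. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Set

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def _flag(line: str) -> int:
    """FLAG is the second tab-separated SAM field."""
    return int(line.split("\t", 2)[1])


def primary_only(sam_lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: keep headers and primary mapped alignments (like -F 2308)."""
    for line in sam_lines:
        if line.startswith("@") or not _flag(line) & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            yield line


def uniquely_mapped(sam_lines: List[str]) -> List[str]:
    """Hypothetical helper: primary alignments of reads with no secondary alignment."""
    multi: Set[str] = {
        line.split("\t", 1)[0]
        for line in sam_lines
        if not line.startswith("@") and _flag(line) & SECONDARY
    }
    return [line for line in primary_only(sam_lines)
            if line.startswith("@") or line.split("\t", 1)[0] not in multi]


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:ref\tLN:100\n",
        "read1\t0\tref\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "read1\t256\tref\t50\t0\t4M\t*\t0\t0\t*\t*\n",
        "read2\t16\tref\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    print(len(list(primary_only(sam))), len(uniquely_mapped(sam)))  # 3 2
```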