ucagenomix / sicelore-2.1

No ValidSAMrecords when using step4 optionA #19

Open yuanwsy opened 8 months ago

yuanwsy commented 8 months ago

Hi,

Thanks for developing this very useful and much needed tool!

I am running step4 option A and it seems like it cannot identify any valid SAMrecords in my file.

The command I am running is:
java -jar -Xmx300g Sicelore-2.1.220323.jar IsoformMatrix I=step3output.bam GENETAG=GE UMITAG=U8 CELLTAG=BC REFFLAT=gencode.vM31.refFlat CSV=BarcodesAssigned.tsv DELTA=2 MAXCLIP=150 METHOD=STRICT AMBIGUOUS_ASSIGN=false OUTDIR=/output/ PREFIX=sicelore ISOBAM=TRUE

The output I get is [Sat Jan 27 17:33:55 CST 2024] Executing as yuanwsy@mgt on Linux 3.10.0-1160.76.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 17.0.9+11-LTS-201; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: null INFO 2024-01-27 17:33:55 IsoformMatrix Cells detected [17538] INFO 2024-01-27 17:33:55 LongreadParser start... INFO 2024-01-27 17:34:14 LongreadParser Processed 1,000,000 Records. Elapsed time: 00:00:18s. Time for last 1,000,000: 18s. Last read position: 1:24,652,271 INFO 2024-01-27 17:34:27 LongreadParser Processed 2,000,000 Records. Elapsed time: 00:00:31s. Time for last 1,000,000: 13s. Last read position: 1:33,535,346 INFO 2024-01-27 17:34:43 LongreadParser Processed 3,000,000 Records. Elapsed time: 00:00:47s. Time for last 1,000,000: 16s. Last read position: 1:60,940,022 INFO 2024-01-27 17:35:00 LongreadParser Processed 4,000,000 Records. Elapsed time: 00:01:04s. Time for last 1,000,000: 16s. Last read position: 1:86,456,892 INFO 2024-01-27 17:35:16 LongreadParser Processed 5,000,000 Records. Elapsed time: 00:01:20s. Time for last 1,000,000: 16s. Last read position: 1:127,966,314 INFO 2024-01-27 17:35:33 LongreadParser Processed 6,000,000 Records. Elapsed time: 00:01:37s. Time for last 1,000,000: 16s. Last read position: 1:156,193,610 INFO 2024-01-27 17:35:50 LongreadParser Processed 7,000,000 Records. Elapsed time: 00:01:54s. Time for last 1,000,000: 17s. Last read position: 1:177,016,694 INFO 2024-01-27 17:36:06 LongreadParser Processed 8,000,000 Records. Elapsed time: 00:02:10s. Time for last 1,000,000: 15s. Last read position: 2:12,413,864 INFO 2024-01-27 17:36:18 LongreadParser Processed 9,000,000 Records. Elapsed time: 00:02:22s. Time for last 1,000,000: 12s. Last read position: 2:26,133,461 INFO 2024-01-27 17:36:35 LongreadParser Processed 10,000,000 Records. Elapsed time: 00:02:39s. Time for last 1,000,000: 16s. 
Last read position: 2:38,451,760 INFO 2024-01-27 17:36:51 LongreadParser Processed 11,000,000 Records. Elapsed time: 00:02:55s. Time for last 1,000,000: 16s. Last read position: 2:69,579,379 INFO 2024-01-27 17:37:08 LongreadParser Processed 12,000,000 Records. Elapsed time: 00:03:12s. Time for last 1,000,000: 16s. Last read position: 2:102,666,440 INFO 2024-01-27 17:37:24 LongreadParser Processed 13,000,000 Records. Elapsed time: 00:03:28s. Time for last 1,000,000: 15s. Last read position: 2:122,050,391 INFO 2024-01-27 17:37:40 LongreadParser Processed 14,000,000 Records. Elapsed time: 00:03:44s. Time for last 1,000,000: 15s. Last read position: 2:152,027,164 INFO 2024-01-27 17:37:56 LongreadParser Processed 15,000,000 Records. Elapsed time: 00:04:00s. Time for last 1,000,000: 16s. Last read position: 2:165,707,190 INFO 2024-01-27 17:38:10 LongreadParser Processed 16,000,000 Records. Elapsed time: 00:04:14s. Time for last 1,000,000: 14s. Last read position: 2:179,899,211 INFO 2024-01-27 17:38:27 LongreadParser Processed 17,000,000 Records. Elapsed time: 00:04:31s. Time for last 1,000,000: 16s. Last read position: 3:45,333,034 INFO 2024-01-27 17:38:44 LongreadParser Processed 18,000,000 Records. Elapsed time: 00:04:48s. Time for last 1,000,000: 16s. Last read position: 3:76,187,099 INFO 2024-01-27 17:39:01 LongreadParser Processed 19,000,000 Records. Elapsed time: 00:05:05s. Time for last 1,000,000: 17s. Last read position: 3:90,141,919 INFO 2024-01-27 17:39:17 LongreadParser Processed 20,000,000 Records. Elapsed time: 00:05:21s. Time for last 1,000,000: 16s. Last read position: 3:90,599,939 INFO 2024-01-27 17:39:34 LongreadParser Processed 21,000,000 Records. Elapsed time: 00:05:38s. Time for last 1,000,000: 17s. Last read position: 3:102,961,027 INFO 2024-01-27 17:39:51 LongreadParser Processed 22,000,000 Records. Elapsed time: 00:05:55s. Time for last 1,000,000: 16s. 
Last read position: 3:130,520,484 INFO 2024-01-27 17:40:07 LongreadParser Processed 23,000,000 Records. Elapsed time: 00:06:11s. Time for last 1,000,000: 16s. Last read position: 4:3,973,093 INFO 2024-01-27 17:40:23 LongreadParser Processed 24,000,000 Records. Elapsed time: 00:06:27s. Time for last 1,000,000: 15s. Last read position: 4:40,775,089 INFO 2024-01-27 17:40:40 LongreadParser Processed 25,000,000 Records. Elapsed time: 00:06:44s. Time for last 1,000,000: 17s. Last read position: 4:73,215,481 INFO 2024-01-27 17:40:56 LongreadParser Processed 26,000,000 Records. Elapsed time: 00:07:00s. Time for last 1,000,000: 16s. Last read position: 4:111,288,219 INFO 2024-01-27 17:41:13 LongreadParser Processed 27,000,000 Records. Elapsed time: 00:07:17s. Time for last 1,000,000: 17s. Last read position: 4:129,718,904 INFO 2024-01-27 17:41:29 LongreadParser Processed 28,000,000 Records. Elapsed time: 00:07:33s. Time for last 1,000,000: 15s. Last read position: 4:138,153,051 INFO 2024-01-27 17:41:43 LongreadParser Processed 29,000,000 Records. Elapsed time: 00:07:47s. Time for last 1,000,000: 13s. Last read position: 4:150,399,531 INFO 2024-01-27 17:41:59 LongreadParser Processed 30,000,000 Records. Elapsed time: 00:08:03s. Time for last 1,000,000: 16s. Last read position: 5:20,341,036 INFO 2024-01-27 17:42:15 LongreadParser Processed 31,000,000 Records. Elapsed time: 00:08:19s. Time for last 1,000,000: 16s. Last read position: 5:43,937,095 INFO 2024-01-27 17:42:33 LongreadParser Processed 32,000,000 Records. Elapsed time: 00:08:37s. Time for last 1,000,000: 17s. Last read position: 5:86,876,908 INFO 2024-01-27 17:42:50 LongreadParser Processed 33,000,000 Records. Elapsed time: 00:08:54s. Time for last 1,000,000: 17s. Last read position: 5:108,612,445 INFO 2024-01-27 17:43:05 LongreadParser Processed 34,000,000 Records. Elapsed time: 00:09:09s. Time for last 1,000,000: 15s. 
Last read position: 5:123,368,694 INFO 2024-01-27 17:43:21 LongreadParser Processed 35,000,000 Records. Elapsed time: 00:09:25s. Time for last 1,000,000: 15s. Last read position: 5:137,585,391 INFO 2024-01-27 17:43:39 LongreadParser Processed 36,000,000 Records. Elapsed time: 00:09:43s. Time for last 1,000,000: 17s. Last read position: 6:3,200,912 INFO 2024-01-27 17:43:55 LongreadParser Processed 37,000,000 Records. Elapsed time: 00:09:59s. Time for last 1,000,000: 15s. Last read position: 6:37,783,257 INFO 2024-01-27 17:44:10 LongreadParser Processed 38,000,000 Records. Elapsed time: 00:10:14s. Time for last 1,000,000: 14s. Last read position: 6:56,282,657 INFO 2024-01-27 17:44:25 LongreadParser Processed 39,000,000 Records. Elapsed time: 00:10:29s. Time for last 1,000,000: 15s. Last read position: 6:75,273,056 INFO 2024-01-27 17:44:41 LongreadParser Processed 40,000,000 Records. Elapsed time: 00:10:45s. Time for last 1,000,000: 16s. Last read position: 6:99,854,572 INFO 2024-01-27 17:44:58 LongreadParser Processed 41,000,000 Records. Elapsed time: 00:11:02s. Time for last 1,000,000: 17s. Last read position: 6:122,434,065 INFO 2024-01-27 17:45:14 LongreadParser Processed 42,000,000 Records. Elapsed time: 00:11:18s. Time for last 1,000,000: 15s. Last read position: 6:145,940,712 INFO 2024-01-27 17:45:30 LongreadParser Processed 43,000,000 Records. Elapsed time: 00:11:34s. Time for last 1,000,000: 15s. Last read position: 7:16,453,039 INFO 2024-01-27 17:45:46 LongreadParser Processed 44,000,000 Records. Elapsed time: 00:11:50s. Time for last 1,000,000: 16s. Last read position: 7:28,955,766 INFO 2024-01-27 17:46:03 LongreadParser Processed 45,000,000 Records. Elapsed time: 00:12:07s. Time for last 1,000,000: 16s. Last read position: 7:45,368,676 INFO 2024-01-27 17:46:18 LongreadParser Processed 46,000,000 Records. Elapsed time: 00:12:22s. Time for last 1,000,000: 15s. 
Last read position: 7:80,992,479 INFO 2024-01-27 17:46:35 LongreadParser Processed 47,000,000 Records. Elapsed time: 00:12:39s. Time for last 1,000,000: 17s. Last read position: 7:100,803,813 INFO 2024-01-27 17:46:53 LongreadParser Processed 48,000,000 Records. Elapsed time: 00:12:57s. Time for last 1,000,000: 18s. Last read position: 7:103,475,739 INFO 2024-01-27 17:47:12 LongreadParser Processed 49,000,000 Records. Elapsed time: 00:13:16s. Time for last 1,000,000: 19s. Last read position: 7:103,475,742 INFO 2024-01-27 17:47:31 LongreadParser Processed 50,000,000 Records. Elapsed time: 00:13:35s. Time for last 1,000,000: 18s. Last read position: 7:115,817,593 INFO 2024-01-27 17:47:47 LongreadParser Processed 51,000,000 Records. Elapsed time: 00:13:51s. Time for last 1,000,000: 15s. Last read position: 7:132,372,597 INFO 2024-01-27 17:48:02 LongreadParser Processed 52,000,000 Records. Elapsed time: 00:14:06s. Time for last 1,000,000: 15s. Last read position: 8:11,733,394 INFO 2024-01-27 17:48:17 LongreadParser Processed 53,000,000 Records. Elapsed time: 00:14:21s. Time for last 1,000,000: 14s. Last read position: 8:34,578,113 INFO 2024-01-27 17:48:33 LongreadParser Processed 54,000,000 Records. Elapsed time: 00:14:37s. Time for last 1,000,000: 16s. Last read position: 8:71,348,023 INFO 2024-01-27 17:48:50 LongreadParser Processed 55,000,000 Records. Elapsed time: 00:14:54s. Time for last 1,000,000: 16s. Last read position: 8:94,949,578 INFO 2024-01-27 17:49:06 LongreadParser Processed 56,000,000 Records. Elapsed time: 00:15:10s. Time for last 1,000,000: 16s. Last read position: 8:117,228,920 INFO 2024-01-27 17:49:22 LongreadParser Processed 57,000,000 Records. Elapsed time: 00:15:26s. Time for last 1,000,000: 15s. Last read position: 9:8,092,011 INFO 2024-01-27 17:49:36 LongreadParser Processed 58,000,000 Records. Elapsed time: 00:15:40s. Time for last 1,000,000: 14s. 
Last read position: 9:31,846,577 INFO 2024-01-27 17:49:53 LongreadParser Processed 59,000,000 Records. Elapsed time: 00:15:57s. Time for last 1,000,000: 16s. Last read position: 9:50,255,506 INFO 2024-01-27 17:50:10 LongreadParser Processed 60,000,000 Records. Elapsed time: 00:16:14s. Time for last 1,000,000: 17s. Last read position: 9:67,669,149 INFO 2024-01-27 17:50:27 LongreadParser Processed 61,000,000 Records. Elapsed time: 00:16:31s. Time for last 1,000,000: 16s. Last read position: 9:86,647,982 INFO 2024-01-27 17:50:44 LongreadParser Processed 62,000,000 Records. Elapsed time: 00:16:48s. Time for last 1,000,000: 16s. Last read position: 9:108,443,064 INFO 2024-01-27 17:51:02 LongreadParser Processed 63,000,000 Records. Elapsed time: 00:17:06s. Time for last 1,000,000: 18s. Last read position: 9:120,784,602 INFO 2024-01-27 17:51:18 LongreadParser Processed 64,000,000 Records. Elapsed time: 00:17:22s. Time for last 1,000,000: 16s. Last read position: 10:23,661,104 INFO 2024-01-27 17:51:33 LongreadParser Processed 65,000,000 Records. Elapsed time: 00:17:37s. Time for last 1,000,000: 14s. Last read position: 10:56,230,727 INFO 2024-01-27 17:51:50 LongreadParser Processed 66,000,000 Records. Elapsed time: 00:17:54s. Time for last 1,000,000: 16s. Last read position: 10:80,128,301 INFO 2024-01-27 17:52:07 LongreadParser Processed 67,000,000 Records. Elapsed time: 00:18:11s. Time for last 1,000,000: 17s. Last read position: 10:111,331,205 INFO 2024-01-27 17:52:24 LongreadParser Processed 68,000,000 Records. Elapsed time: 00:18:28s. Time for last 1,000,000: 17s. Last read position: 10:128,383,986 INFO 2024-01-27 17:52:39 LongreadParser Processed 69,000,000 Records. Elapsed time: 00:18:43s. Time for last 1,000,000: 15s. Last read position: 11:12,598,058 INFO 2024-01-27 17:52:56 LongreadParser Processed 70,000,000 Records. Elapsed time: 00:19:00s. Time for last 1,000,000: 16s. 
Last read position: 11:32,234,099 INFO 2024-01-27 17:53:11 LongreadParser Processed 71,000,000 Records. Elapsed time: 00:19:15s. Time for last 1,000,000: 15s. Last read position: 11:44,895,851 INFO 2024-01-27 17:53:28 LongreadParser Processed 72,000,000 Records. Elapsed time: 00:19:32s. Time for last 1,000,000: 16s. Last read position: 11:61,701,401 INFO 2024-01-27 17:53:46 LongreadParser Processed 73,000,000 Records. Elapsed time: 00:19:50s. Time for last 1,000,000: 17s. Last read position: 11:76,956,479 INFO 2024-01-27 17:54:02 LongreadParser Processed 74,000,000 Records. Elapsed time: 00:20:06s. Time for last 1,000,000: 16s. Last read position: 11:88,600,029 INFO 2024-01-27 17:54:19 LongreadParser Processed 75,000,000 Records. Elapsed time: 00:20:23s. Time for last 1,000,000: 16s. Last read position: 11:101,179,091 INFO 2024-01-27 17:54:37 LongreadParser Processed 76,000,000 Records. Elapsed time: 00:20:41s. Time for last 1,000,000: 18s. Last read position: 11:117,311,468 INFO 2024-01-27 17:54:53 LongreadParser Processed 77,000,000 Records. Elapsed time: 00:20:57s. Time for last 1,000,000: 15s. Last read position: 12:25,143,934 INFO 2024-01-27 17:55:09 LongreadParser Processed 78,000,000 Records. Elapsed time: 00:21:13s. Time for last 1,000,000: 16s. Last read position: 12:67,557,494 INFO 2024-01-27 17:55:26 LongreadParser Processed 79,000,000 Records. Elapsed time: 00:21:30s. Time for last 1,000,000: 16s. Last read position: 12:86,003,345 INFO 2024-01-27 17:55:41 LongreadParser Processed 80,000,000 Records. Elapsed time: 00:21:45s. Time for last 1,000,000: 14s. Last read position: 12:110,282,577 INFO 2024-01-27 17:55:58 LongreadParser Processed 81,000,000 Records. Elapsed time: 00:22:02s. Time for last 1,000,000: 16s. Last read position: 13:20,639,115 INFO 2024-01-27 17:56:13 LongreadParser Processed 82,000,000 Records. Elapsed time: 00:22:17s. Time for last 1,000,000: 14s. 
Last read position: 13:45,023,266 INFO 2024-01-27 17:56:29 LongreadParser Processed 83,000,000 Records. Elapsed time: 00:22:33s. Time for last 1,000,000: 16s. Last read position: 13:72,602,772 INFO 2024-01-27 17:56:43 LongreadParser Processed 84,000,000 Records. Elapsed time: 00:22:47s. Time for last 1,000,000: 14s. Last read position: 13:96,797,616 INFO 2024-01-27 17:56:58 LongreadParser Processed 85,000,000 Records. Elapsed time: 00:23:02s. Time for last 1,000,000: 14s. Last read position: 13:111,729,792 INFO 2024-01-27 17:57:13 LongreadParser Processed 86,000,000 Records. Elapsed time: 00:23:17s. Time for last 1,000,000: 15s. Last read position: 14:20,744,282 INFO 2024-01-27 17:57:30 LongreadParser Processed 87,000,000 Records. Elapsed time: 00:23:34s. Time for last 1,000,000: 17s. Last read position: 14:47,946,264 INFO 2024-01-27 17:57:46 LongreadParser Processed 88,000,000 Records. Elapsed time: 00:23:50s. Time for last 1,000,000: 15s. Last read position: 14:67,611,263 INFO 2024-01-27 17:58:02 LongreadParser Processed 89,000,000 Records. Elapsed time: 00:24:06s. Time for last 1,000,000: 15s. Last read position: 14:103,392,703 INFO 2024-01-27 17:58:17 LongreadParser Processed 90,000,000 Records. Elapsed time: 00:24:21s. Time for last 1,000,000: 15s. Last read position: 15:12,575,570 INFO 2024-01-27 17:58:33 LongreadParser Processed 91,000,000 Records. Elapsed time: 00:24:37s. Time for last 1,000,000: 16s. Last read position: 15:59,257,164 INFO 2024-01-27 17:58:51 LongreadParser Processed 92,000,000 Records. Elapsed time: 00:24:55s. Time for last 1,000,000: 17s. Last read position: 15:81,749,522 INFO 2024-01-27 17:59:06 LongreadParser Processed 93,000,000 Records. Elapsed time: 00:25:10s. Time for last 1,000,000: 15s. Last read position: 15:99,604,370 INFO 2024-01-27 17:59:23 LongreadParser Processed 94,000,000 Records. Elapsed time: 00:25:27s. Time for last 1,000,000: 17s. 
Last read position: 16:20,350,329 INFO 2024-01-27 17:59:39 LongreadParser Processed 95,000,000 Records. Elapsed time: 00:25:43s. Time for last 1,000,000: 16s. Last read position: 16:40,694,040 INFO 2024-01-27 17:59:56 LongreadParser Processed 96,000,000 Records. Elapsed time: 00:26:00s. Time for last 1,000,000: 16s. Last read position: 16:84,624,762 INFO 2024-01-27 18:00:12 LongreadParser Processed 97,000,000 Records. Elapsed time: 00:26:16s. Time for last 1,000,000: 16s. Last read position: 17:10,428,380 INFO 2024-01-27 18:00:29 LongreadParser Processed 98,000,000 Records. Elapsed time: 00:26:33s. Time for last 1,000,000: 16s. Last read position: 17:29,319,160 INFO 2024-01-27 18:00:44 LongreadParser Processed 99,000,000 Records. Elapsed time: 00:26:48s. Time for last 1,000,000: 15s. Last read position: 17:36,184,656 INFO 2024-01-27 18:01:02 LongreadParser Processed 100,000,000 Records. Elapsed time: 00:27:06s. Time for last 1,000,000: 17s. Last read position: 17:46,811,539 INFO 2024-01-27 18:01:18 LongreadParser Processed 101,000,000 Records. Elapsed time: 00:27:22s. Time for last 1,000,000: 16s. Last read position: 17:75,852,344 INFO 2024-01-27 18:01:35 LongreadParser Processed 102,000,000 Records. Elapsed time: 00:27:39s. Time for last 1,000,000: 16s. Last read position: 18:8,417,302 INFO 2024-01-27 18:01:50 LongreadParser Processed 103,000,000 Records. Elapsed time: 00:27:54s. Time for last 1,000,000: 15s. Last read position: 18:36,711,738 INFO 2024-01-27 18:02:08 LongreadParser Processed 104,000,000 Records. Elapsed time: 00:28:12s. Time for last 1,000,000: 17s. Last read position: 18:63,676,490 INFO 2024-01-27 18:02:26 LongreadParser Processed 105,000,000 Records. Elapsed time: 00:28:30s. Time for last 1,000,000: 17s. Last read position: 19:5,486,519 INFO 2024-01-27 18:02:43 LongreadParser Processed 106,000,000 Records. Elapsed time: 00:28:47s. Time for last 1,000,000: 17s. 
Last read position: 19:5,884,666 INFO 2024-01-27 18:03:00 LongreadParser Processed 107,000,000 Records. Elapsed time: 00:29:04s. Time for last 1,000,000: 16s. Last read position: 19:22,371,224 INFO 2024-01-27 18:03:15 LongreadParser Processed 108,000,000 Records. Elapsed time: 00:29:19s. Time for last 1,000,000: 15s. Last read position: 19:40,941,576 INFO 2024-01-27 18:03:31 LongreadParser Processed 109,000,000 Records. Elapsed time: 00:29:35s. Time for last 1,000,000: 16s. Last read position: X:8,008,602 INFO 2024-01-27 18:03:46 LongreadParser Processed 110,000,000 Records. Elapsed time: 00:29:50s. Time for last 1,000,000: 14s. Last read position: X:51,861,166 INFO 2024-01-27 18:04:00 LongreadParser Processed 111,000,000 Records. Elapsed time: 00:30:04s. Time for last 1,000,000: 13s. Last read position: X:79,501,653 INFO 2024-01-27 18:04:15 LongreadParser Processed 112,000,000 Records. Elapsed time: 00:30:19s. Time for last 1,000,000: 15s. Last read position: X:135,172,245 INFO 2024-01-27 18:04:30 LongreadParser Processed 113,000,000 Records. Elapsed time: 00:30:34s. Time for last 1,000,000: 15s. Last read position: X:165,990,089 INFO 2024-01-27 18:04:48 LongreadParser Processed 114,000,000 Records. Elapsed time: 00:30:52s. Time for last 1,000,000: 17s. Last read position: MT:2,751 INFO 2024-01-27 18:05:04 LongreadParser Processed 115,000,000 Records. Elapsed time: 00:31:08s. Time for last 1,000,000: 15s. Last read position: MT:7,765 INFO 2024-01-27 18:05:18 LongreadParser Processed 116,000,000 Records. Elapsed time: 00:31:22s. Time for last 1,000,000: 14s. Last read position: MT:14,145 INFO 2024-01-27 18:05:25 LongreadParser end... 
INFO 2024-01-27 18:05:25 LongreadParser Total SAMrecords 116447499
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords valid 0
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords unvalid 116447499
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords mapqv=0 0
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords no gene 107358445
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords no UMI 0
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords chimeria 9089054
INFO 2024-01-27 18:05:25 LongreadParser Total reads 0
INFO 2024-01-27 18:05:25 LongreadParser Total reads multiSAM 0
INFO 2024-01-27 18:05:25 MoleculeDataset MoleculeDataset init start...
INFO 2024-01-27 18:05:25 MoleculeDataset Total molecules 0
INFO 2024-01-27 18:05:25 MoleculeDataset Total molecule reads 0
INFO 2024-01-27 18:05:25 MoleculeDataset Total molecule multiIG 0
INFO 2024-01-27 18:05:25 UCSCRefFlatParser UCSCRefFlatParser start...
INFO 2024-01-27 18:05:26 UCSCRefFlatParser UCSCRefFlatParser end...
INFO 2024-01-27 18:05:26 UCSCRefFlatParser Number of Genes Symbols [56775]
INFO 2024-01-27 18:05:26 UCSCRefFlatParser Number of Transcripts [149423]
INFO 2024-01-27 18:05:26 MoleculeDataset SetIsoforms start...
INFO 2024-01-27 18:05:26 MoleculeDataset SetIsoforms end...
INFO 2024-01-27 18:05:26 MoleculeDataset SetIsoforms monoexon [0]
INFO 2024-01-27 18:05:26 MoleculeDataset SetIsoforms no match [0]
INFO 2024-01-27 18:05:26 MoleculeDataset SetIsoforms one match [0]
INFO 2024-01-27 18:05:26 MoleculeDataset SetIsoforms ambiguous [0]
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix start...[56775] genes
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix [10000/56775] genes processed
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix [20000/56775] genes processed
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix [30000/56775] genes processed
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix [40000/56775] genes processed
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix [50000/56775] genes processed
INFO 2024-01-27 18:05:26 MoleculeDataset DTEMatrix [56775/56775] genes processed
INFO 2024-01-27 18:05:26 IsoformMatrix writeIsoformMatrix [start]
INFO 2024-01-27 18:05:26 IsoformMatrix writeGeneMatrix [start]
INFO 2024-01-27 18:05:26 IsoformMatrix writeCellMetrics [start]
INFO 2024-01-27 18:05:26 IsoformMatrix writeJunctionMatrix [start]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix cells size [17538]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix genes size [0]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix junctions size [0]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix isoforms size [0]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix isoforms counts [0]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix isoforms define [0]
INFO 2024-01-27 18:05:26 IsoformMatrix Matrix isoforms undefine[0]
INFO 2024-01-27 18:05:27 IsoformMatrix Producing ISOBAM [true]
......

Can you please let me know what might be causing this issue and how to fix it? Thanks!

cobioda commented 8 months ago

Hi, it seems the pipeline did not find any SAM records with a GENETAG:
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords unvalid 116447499
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords no gene 107358445

Are you sure that you added a GENETAG SAM record tag (GE) to your BAM file before counting?

best, kevin
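
To see at a glance why records were rejected, the LongreadParser summary counters can be pulled out of the log and compared. The sketch below is only an illustration: the counter names are copied from the log above, while the function name `parse_summary` is made up here, and the idea that the "no gene" and "chimeria" categories are mutually exclusive is an assumption that happens to hold for these numbers.

```python
import re

# Summary lines copied from the IsoformMatrix log in this thread.
LOG = """\
INFO 2024-01-27 18:05:25 LongreadParser Total SAMrecords 116447499
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords valid 0
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords unvalid 116447499
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords mapqv=0 0
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords no gene 107358445
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords no UMI 0
INFO 2024-01-27 18:05:25 LongreadParser SAMrecords chimeria 9089054
"""

# Counter labels exactly as they are printed in the log.
KEYS = (
    "Total SAMrecords",
    "SAMrecords valid",
    "SAMrecords unvalid",
    "SAMrecords mapqv=0",
    "SAMrecords no gene",
    "SAMrecords no UMI",
    "SAMrecords chimeria",
)


def parse_summary(log_text):
    """Return {counter label: integer value} for every label found in the log."""
    counts = {}
    for key in KEYS:
        match = re.search(r"LongreadParser\s+" + re.escape(key) + r"\s+(\d+)", log_text)
        if match:
            counts[key] = int(match.group(1))
    return counts


counts = parse_summary(LOG)
total = counts["Total SAMrecords"]
for reason in ("SAMrecords mapqv=0", "SAMrecords no gene", "SAMrecords no UMI", "SAMrecords chimeria"):
    print(f"{reason}: {counts[reason]} ({counts[reason] / total:.1%})")
```

For this log, "no gene" plus "chimeria" adds up exactly to the "unvalid" total, and "no gene" alone accounts for roughly 92% of all records, which is why the missing GE tag is the first thing to check.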

yuanwsy commented 8 months ago

Yes. In step 3 the command I used was:
java -jar -Xmx300g "NanoporeBC_UMI_finder-2.1.jar" assignumis --inFileNanopore "passed.bam" --outfile "step3out.bam" -a "/reference/gencode.vM31.annotation.gtf"
I have also tried using a refFlat instead of the GTF. So I think I did try to add a GENETAG SAM record tag (GE) to my BAM file, but one line of my BAM looks like this:
32d85a79-2d0c-499c-9173-d0d330386df4_REV_PS=258_PE=289_AE=313_T=39_bc=AGTGTTGAGAGCACTG_ed=0_ed_sec=2147483647_bcStart=312_bcEnd=297_rk=0_X=AAAAAAAAAAAAAAGCAGAGGCTCCAGTGCTCTCAACACTAGA_Q=20.4_kjqey 16 1 3278713 60 100S220M40S *0 0^C TGTTATGCGTTCAGTTACGTATTGCTCTACACGACGCTCTTCCGATCTAGTGTTGAGAGCACTGGAGCCTCTGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTACTGATATACATTTACATAGTTTCCTCATACCTGTAGCTGTGTGCCAAGTACTTAGTGGCAGGGTTACAGGTAAGTTCTATAAAAGTTTATATTTCTTTACCATTTTAATGTAGAATTAAGTATATATTGAAGTAAGAAAAATGTCAGTCTTCGTGAGTCGAAATGATTTAGAAATAAAGCCTAGGAACCAGTTAAGCAGCTTAGAACCCACTGTCCCATGTACTCTGCGTTGATACCACTGCAGCAATACGTAG $$%&&&&$##$+05442/---.222576564311///01377300//03355100/.010444634333456666447137:<=>ABHGGFFF{{{{{{IGG@70///01221134466676211114574421111466654566888887665444445566677665556445411112////054443344433345784--...137:9876532222444443333344555222248::;987777787741111142----/5444456777676656766654434554322346799765434457:::6665566545558:70000/01112788877777644..+& B1:i:0 s1:i:198 B2:Z:2147483647 s2:i:0 U7:Z:GAGCCTCTGCTT U8:Z:GAGCCTCTGCTT BB:Z:312 BC:Z:AGTGTTGAGAGCACTG AE:i:313 BE:Z:297 PE:i:289 RE:Z: TE:i:39 BF:Z:297 XF:Z:INTERGENIC BH:Z:0 NM:i:2 AS:i:214 PS:i:258 BU:Z:AGTGTTGAGAGCACTG BV:Z:312 BW:i:0 BX:Z:N.A. SX:Z:34513018 BZ:Z:AGTGTTGAGAGCACTG UZ:Z: de:f:0.0091 rl:i:34 cm:i:30 nn:i:0 tp:A:P ms:i:214 ts:A:+
It seems there is no GE tag, right?

cobioda commented 8 months ago

NanoporeBC_UMI_finder.jar should add the GE tags when run with the standard config file. Did you look at your BAM file in IGV? Are there a lot of reads aligning to genes? You can also check in IGV whether the gene tag, which is mandatory for counting, is present.

yuanwsy commented 8 months ago

I checked in IGV and there are indeed a lot of reads aligning to genes, so I am confused about why NanoporeBC_UMI_finder.jar has not added the GE tags.

cobioda commented 8 months ago

When using NanoporeBC_UMI_finder.jar, do the reads in the BAM file have a GE tag? If not, you should check the chromosome names (the 'chr' prefix) in the refFlat file, as described in the README: "refFlat file for gene assignment. If supplied will generate a cell/gene umi count table and add the GE tag to SAM record. Uses the “TagReadWithGeneExonFunction” from the DropSeq package. Not required if the GE tag was added to BAM records with the Sicelore package." You can also add the GE tag using the AddGeneNameTag pipeline of the sicelore package. Is there a problem with the strandedness of the reads (i.e. an opposite-strand issue)? All the different steps should be very straightforward, in fact. best,

yuanwsy commented 7 months ago

Yes, there is no GE tag in the BAM file I used.

I followed the tutorial to generate the refFlat file, like this:
gtfToGenePred -genePredExt -geneNameAsName2 gencode.v38.primary_assembly.annotation.gtf gencode.v38.primary_assembly.annotation.refflat.txt
paste <(cut -f 12 gencode.v38.primary_assembly.annotation.refflat.txt) <(cut -f 1-10 gencode.v38.primary_assembly.annotation.refflat.txt) > gencode.v38.refFlat

The GTF file I used is gencode.vM34.primary_assembly.annotation.gtf, and the first line of the refFlat I generated looks like this:
4933401J01Rik ENSMUST00000193812.2 chr1 + 3143475 3144545 3144545 3144545 1 3143475, 3144545,
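
The `paste`/`cut` step above takes column 12 of the genePredExt output (the gene name, `name2`) and puts it in front of columns 1 to 10, which gives the 11-column refFlat layout. A minimal Python sketch of the same reordering follows; the function name is made up for illustration, and in the example row the trailing genePredExt fields (score, CDS status, exon frames) are placeholder values, since only the first refFlat line is shown in the thread.

```python
def genepredext_to_refflat(line):
    """Reorder one tab-separated genePredExt row into a refFlat row.

    genePredExt: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
                 exonCount, exonStarts, exonEnds, score, name2, ...
    refFlat:     geneName (name2) followed by the first ten columns.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated genePredExt columns")
    return "\t".join([fields[11]] + fields[:10])


# Example row: columns 1-10 and 12 match the refFlat line quoted above;
# columns 11 and 13-15 are placeholders chosen for illustration.
genepred_row = "\t".join([
    "ENSMUST00000193812.2", "chr1", "+", "3143475", "3144545",
    "3144545", "3144545", "1", "3143475,", "3144545,",
    "0", "4933401J01Rik", "none", "none", "-1,",
])

refflat_row = genepredext_to_refflat(genepred_row)
print(refflat_row)
```

Note that the chromosome column keeps whatever naming the GTF uses (here `chr1`), so the refFlat inherits the GTF's naming style.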

I also tried to add the GE tag using the sicelore package AddGeneNameTag pipeline:
INFO 2024-02-15 16:41:39 AddGeneNameTag

** NOTE: Picard's command line syntax is changing.
** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
** The command line looks like this in the new syntax:
** AddGeneNameTag -I /data/R02/yuanwsy/sicelore-2.1/OEpassed2.bam -O OEpassed.GE.bam -REFFLAT /data/R02/yuanwsy/reference/gencode.vM34.refFlat -GENETAG GE

[Thu Feb 15 16:41:41 CST 2024] AddGeneNameTag INPUT=/data/R02/yuanwsy/sicelore-2.1/OEpassed2.bam OUTPUT=OEpassed.GE.bam REFFLAT=/data/R02/yuanwsy/reference/gencode.vM34.refFlat GENETAG=GE STRANDTAG=GS FUNCTIONTAG=XF USE_STRAND_INFO=true ALLOW_MULTI_GENE_READS=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Feb 15 16:41:41 CST 2024] Executing as yuanwsy@mgt on Linux 3.10.0-1160.76.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 17.0.9+11-LTS-201; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: null
{BUFFER_SIZE=131072, COMPRESSION_LEVEL=5, CREATE_INDEX=false, CREATE_MD5=false, CUSTOM_READER_FACTORY=https://www.googleapis.com/genomics,com.google.cloud.genomics.gatk.htsjdk.GA4GHReaderFactory, DISABLE_SNAPPY_COMPRESSOR=false, EBI_REFERENCE_SERVICE_URL_MASK=https://www.ebi.ac.uk/ena/cram/md5/%s, NON_ZERO_BUFFER_SIZE=131072, REFERENCE_FASTA=null, SAM_FLAG_FIELD_FORMAT=DECIMAL, USE_ASYNC_IO_READ_FOR_SAMTOOLS=false, USE_ASYNC_IO_WRITE_FOR_SAMTOOLS=false, USE_ASYNC_IO_WRITE_FOR_TRIBBLE=false, USE_CRAM_REF_DOWNLOAD=false}
INFO 2024-02-15 16:41:42 AddGeneNameTag Loaded 54 transcripts.
INFO 2024-02-15 16:42:49 AddGeneNameTag Processed 1,000,000 Records. Elapsed time: 00:01:09s. Time for last 1,000,000: 66s. Last read position: 1:24,652,268
INFO 2024-02-15 16:43:41 AddGeneNameTag Processed 2,000,000 Records. Elapsed time: 00:02:02s. Time for last 1,000,000: 52s. Last read position: 1:24,653,961
INFO 2024-02-15 16:44:38 AddGeneNameTag Processed 3,000,000 Records. Elapsed time: 00:02:58s. Time for last 1,000,000: 56s. Last read position: 1:52,497,007
INFO 2024-02-15 16:45:40 AddGeneNameTag Processed 4,000,000 Records. Elapsed time: 00:04:00s. Time for last 1,000,000: 61s. Last read position: 1:66,987,669
INFO 2024-02-15 16:46:34 AddGeneNameTag Processed 5,000,000 Records. Elapsed time: 00:04:55s. Time for last 1,000,000: 54s. Last read position: 1:87,008,004
INFO 2024-02-15 16:47:43 AddGeneNameTag Processed 6,000,000 Records. Elapsed time: 00:06:03s. Time for last 1,000,000: 68s. Last read position: 1:128,198,125
INFO 2024-02-15 16:48:49 AddGeneNameTag Processed 7,000,000 Records. Elapsed time: 00:07:10s. Time for last 1,000,000: 66s. Last read position: 1:153,784,205
......
INFO 2024-02-15 19:04:41 AddGeneNameTag TOTAL READS [136661160] CORRECT_STRAND [136658628] WRONG_STRAND [2532] AMBIGUOUS_STRAND_FIXED [115] AMBIGUOUS REJECTED READS [0]
[Thu Feb 15 19:04:41 CST 2024] org.ipmc.sicelore.programs.AddGeneNameTag done. Elapsed time: 143.01 minutes.

After that, I checked the tagged BAM file and still found no GE tag:
d5a1855c-9535-4963-acbc-119dbdb4ee01_REV_PS=326_PE=359_AE=381_T=41_bc=TGAGTCAGTCGAACAG_ed=0_ed_sec=2147483647_bcStart=380_bcEnd=365_rk=0_X=AAAAAAAAAAAAGAATAAAGCTCACTGTTCGACTGACTCAAGA_Q=18.2_6ck1d 0 1 3050531 2 42S176M1D65M3D15M137S 0 0 CTATGTATTGCTAAGCAGTGGTATCAACGCAGAGTACATGGGCAACATTATGAACTAGCCAGTACCCTGGAGCTCTTGACTCTAGCTGCATATGTATCAAAAGATGGCCTAGCTGGCCATCACTGGAAAGAGAGGCCCATTGGACATGCAAACTTTATATGCCCCAGTACAGGGGAATGCCAGGGCCAAAAAGTGGGAGTGGGTGGGTAGGGGAGTGTGGGGAGGGTATGGGGGACTTTTGGGATAGCACTGGAAATGTAAATGAGGGAAATACCTAATAAAATATAAAAAATAAAAATAAATAAAATGAATCACAACATATGCCAAAAAAAAAAAAAAAAAAAAAAAAAAAGAATAAAGCTCACTGTTCGACTGACTCAAGATCGGAAGAGCGTCGTGTAGAGCAATACGTAACTGAACGAAGTACATACATAA %'++334444456776640////.-...264211012347433321001122011224411111456655443322233444455420000/....0464455443222357455223225667654444443321123244445666553431///0134454////06884221135696753344434475523233555+.--.9<84423368;8884/---.6995444567522000113344573333343544343665432111124579422249=>;999<@><<;8999;;665211012332112012249;@BEFHHGDCCCCABA@=;976465521211211122221-,,--2332223444/-,,+-.6.,,++-..0330001031000012345.---,,))%$$$$%%$$$# s1:i:49 s2:i:0 XF:Z:INTERGENIC NM:i:24 AS:i:188 de:f:0.0853 rl:i:255 cm:i:5 nn:i:0 tp:A:P ms:i:189 ts:A:+

what's wrong with this? thanks a lot!

cobioda commented 7 months ago

I can see "XF:Z:INTERGENIC" in your SAM records, so it seems to work (this is a flag added by AddGeneNameTag). There is no GE tag because this SAM record isn't in a gene region. Please look at a gene region; you should find a GE tag there. You can run 'samtools view OEpassed.GE.bam | grep GE'.

yuanwsy commented 7 months ago

I have tried both grep GE and grep GE: but still found no GE tag. With the first command I got lots of intergenic records; with the second I got a lot of records like this:
a4ae63f4-263b-46da-9ffd-23b796eb748f_FWD_PS=512_PE=531_AE=557_X=AAAAAAAAAAAAACACCTACTCGCCTGGGCAACCATAGTCAGA_Q=22.7_svi5 16 137448683 60 62S183M2I294M50S * 0 0 CTACGTATTGCTCTACACTGACGCTCTTCCGATCTGACTATGGTTGCCCAGGCGAGTAGGTGTTTTTTTTTTTTTTTTTTGGCTCCTTTTAGAAACAGGTAACAGCTTGGATCTGGGACATTTGAGGCTTAAGCAGGACCAGTCTTGGCAAGAGTCAGGGAGGGTGCAGGCATCCCTCTCCATAAGTGCAGACAGCCTCCTGTCTCCCCTGCCTGCTGGGAGGAAGATGTGCTCTGCTAAGGGGTGGGGTGTGGCTCACTGCCCCACCCTCTAGGCAGGGCTGTGGAAGGTGAGAGCCAGGAAGCTCTCTTCCCCTAACCCTCCTCCCAGGCCTGCTCTGCTGGTATCAGATTGGCCCGAAGCCCCAGGCCTGATCAAAGATGGCTGAGTCTCAGTGTGGCTGGTTGAGCCTTTTAACTCTTGGTTGGTTCATTTACTCTTAGTCTTTTGTTTTTTGTTTGTTTGGTTTTGCTCATTTTGACATCACTGCCTTTTAGAAATATTTCTTCAGGTTTTAGAATTAAATGTTTCCCATGTACTCTGCGTTGATACCACTGCTTAGCAATACGTAACTGAACGAGTAAACAGTAA '())//(('''&)((()))+,0-++'&&&&++()''&&&&&001000018?BCCHHH{{{{GE:;565432223334532222321112343330/...1..110111123420/./025843222444333222225444334/6651100---.---.122221..--.0266333335533332223344333447777843432330000025344422)))/6))))5334433100114575434333111139787544444343243324355666343321011124020////.24447888966665755566763----.12111460....232445766440////)),113345443..-----12244553111114322210//0123333344433344211111---,-01364458:9989:322224224;;654335589433223233223234433434221212222323333444332234454221110/01001114-,+++/100011244322123443.,+++,0224/..((('&$$%$%%$### s1:i:439 s2:i:0 XF:Z:INTERGENIC NM:i:5 AS:i:464 de:f:0.0084 rl:i:30 cm:i:77 nn:i:0 tp:A:P ms:i:464 ts:A:+

The GE matched above is GE:;565432223334532222321112343330/...1..11011112342, which is part of the base-quality string, so I found no GE tag.
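
A plain `grep GE` searches the whole line, so it also matches the letters "GE" inside read sequences or base-quality strings, as happened here. A stricter check looks only at the optional fields, which start at column 12 of a SAM record and have the form TAG:TYPE:VALUE. The sketch below assumes tab-separated SAM text (as printed by `samtools view`); the function name and the two short example records are made up for illustration, including the gene name in the second one.

```python
def get_sam_tag(sam_line, tag):
    """Return the value of an optional SAM tag, or None if the tag is absent.

    Only columns 12 and later are inspected, so matches inside the read
    name, sequence, or quality string are ignored.
    """
    fields = sam_line.rstrip("\n").split("\t")
    for optional in fields[11:]:
        parts = optional.split(":", 2)  # TAG, TYPE, VALUE
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None


# Hypothetical short records. The first has "GE:" only inside the quality string.
untagged = "\t".join([
    "read1", "16", "1", "37448683", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "GE:;565432", "XF:Z:INTERGENIC", "NM:i:5",
])
# The second carries a real GE tag; the gene name is a made-up example value.
tagged = "\t".join([
    "read2", "0", "1", "3278713", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "GE:Z:Xkr4", "XF:Z:CODING",
])

print("GE" in untagged, get_sam_tag(untagged, "GE"))
print("GE" in tagged, get_sam_tag(tagged, "GE"))
```

The substring test is true for both records, while the field-based lookup reports a GE value only for the second one.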

cobioda commented 7 months ago

It might be a difference in chromosome name format between the refFlat and the reference you used for alignment: I can see the 'chr' prefix in the chromosome names in the refFlat but not in the BAM file. The naming needs to be consistent between both.
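
One way to test this suggestion is to compare the reference names in the BAM header (the SN fields of the @SQ lines, as printed by `samtools view -H`) with the chromosome column of the refFlat. The sketch below is an illustration under stated assumptions: the header text, sequence lengths, and the extra refFlat rows are placeholders, the function names are made up, and the `chrM` to `MT` renaming assumes the BAM uses Ensembl-style names, as the "1", "X" and "MT" read positions in the logs above suggest.

```python
def bam_contigs(header_text):
    """Collect reference names from the SN fields of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def refflat_contigs(refflat_text):
    """Collect chromosome names from column 3 of refFlat rows."""
    names = set()
    for line in refflat_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def to_bam_style(name):
    """Map 'chr1' -> '1' and 'chrM' -> 'MT' (assumed Ensembl-style naming)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


# Placeholder header: contig names follow the logs above, lengths are dummies.
HEADER = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:X\tLN:1000\n@SQ\tSN:MT\tLN:1000"
# First row is the refFlat line quoted in the thread; the others are made up.
REFFLAT = "\n".join([
    "4933401J01Rik\tENSMUST00000193812.2\tchr1\t+\t3143475\t3144545\t3144545\t3144545\t1\t3143475,\t3144545,",
    "geneA\ttxA\tchrX\t+\t10\t20\t20\t20\t1\t10,\t20,",
    "geneB\ttxB\tchrM\t+\t10\t20\t20\t20\t1\t10,\t20,",
])

bam_names = bam_contigs(HEADER)
ref_names = refflat_contigs(REFFLAT)
shared_before = bam_names & ref_names
shared_after = bam_names & {to_bam_style(n) for n in ref_names}
print("shared before renaming:", sorted(shared_before))
print("shared after renaming:", sorted(shared_after))
```

If the two name sets share nothing before renaming, gene annotation has nothing to overlap with, and either the refFlat or the BAM would need to be converted to the other naming style before tagging.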

yuanwsy commented 7 months ago

But I used GRCm39.genome.fa.gz as the genome reference for alignment and gencode.vM34.primary_assembly.annotation.gtf as the annotation reference, so they are consistent.