ucagenomix / sicelore-2.1

MIT License

No valid SAMrecords in FusionDetector #5

Closed areebapatel closed 1 year ago

areebapatel commented 1 year ago

Hi,

Thanks for developing this very useful and much needed tool!

I am running FusionDetector and it seems like it cannot identify any valid SAMrecords in my file.

The command I am running is:

java -jar -Xmx72g Jar/Sicelore-2.1.jar FusionDetector I=${sicelore_dir}/passed/fusions/${sample}_clipped_reads.tagbamwithread.US.bam O=${sicelore_dir}/passed/fusions/ PREFIX=fusion CSV=${sicelore_dir}/ValidBarcodes.csv

The output I get is

[Thu Feb 23 14:07:02 CET 2023] Executing as a390l@odcf-cn33u24s01 on Linux 3.10.0-1160.76.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 14+36-1461; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: null
INFO 2023-02-23 14:07:03 FusionDetector Cells detected [872103]
INFO 2023-02-23 14:07:04 LongreadParser start...
INFO 2023-02-23 14:07:26 LongreadParser Processed 1,000,000 Records. Elapsed time: 00:00:21s. Time for last 1,000,000: 21s. Last read position: chr1:52,481,498
INFO 2023-02-23 14:07:44 LongreadParser Processed 2,000,000 Records. Elapsed time: 00:00:39s. Time for last 1,000,000: 18s. Last read position: chr1:148,128,006
INFO 2023-02-23 14:08:04 LongreadParser Processed 3,000,000 Records. Elapsed time: 00:00:59s. Time for last 1,000,000: 19s. Last read position: chr1:212,051,462
INFO 2023-02-23 14:08:24 LongreadParser Processed 4,000,000 Records. Elapsed time: 00:01:19s. Time for last 1,000,000: 20s. Last read position: chr10:96,750,826
INFO 2023-02-23 14:08:47 LongreadParser Processed 5,000,000 Records. Elapsed time: 00:01:42s. Time for last 1,000,000: 22s. Last read position: chr11:65,499,045
INFO 2023-02-23 14:09:11 LongreadParser Processed 6,000,000 Records. Elapsed time: 00:02:06s. Time for last 1,000,000: 24s. Last read position: chr11:85,990,651
INFO 2023-02-23 14:09:32 LongreadParser Processed 7,000,000 Records. Elapsed time: 00:02:27s. Time for last 1,000,000: 21s. Last read position: chr12:53,386,080
INFO 2023-02-23 14:09:52 LongreadParser Processed 8,000,000 Records. Elapsed time: 00:02:47s. Time for last 1,000,000: 19s. Last read position: chr13:22,695,871
INFO 2023-02-23 14:10:12 LongreadParser Processed 9,000,000 Records. Elapsed time: 00:03:07s. Time for last 1,000,000: 20s. Last read position: chr14:68,421,478
INFO 2023-02-23 14:10:32 LongreadParser Processed 10,000,000 Records. Elapsed time: 00:03:27s. Time for last 1,000,000: 20s. Last read position: chr15:71,164,313
INFO 2023-02-23 14:10:51 LongreadParser Processed 11,000,000 Records. Elapsed time: 00:03:46s. Time for last 1,000,000: 19s. Last read position: chr16:28,842,788
INFO 2023-02-23 14:11:12 LongreadParser Processed 12,000,000 Records. Elapsed time: 00:04:07s. Time for last 1,000,000: 20s. Last read position: chr17:17,383,327
INFO 2023-02-23 14:11:30 LongreadParser Processed 13,000,000 Records. Elapsed time: 00:04:25s. Time for last 1,000,000: 18s. Last read position: chr17:62,016,122
INFO 2023-02-23 14:11:52 LongreadParser Processed 14,000,000 Records. Elapsed time: 00:04:47s. Time for last 1,000,000: 21s. Last read position: chr19:5,690,296
INFO 2023-02-23 14:12:13 LongreadParser Processed 15,000,000 Records. Elapsed time: 00:05:08s. Time for last 1,000,000: 20s. Last read position: chr19:46,608,521
INFO 2023-02-23 14:12:34 LongreadParser Processed 16,000,000 Records. Elapsed time: 00:05:30s. Time for last 1,000,000: 21s. Last read position: chr2:69,770,286
INFO 2023-02-23 14:12:55 LongreadParser Processed 17,000,000 Records. Elapsed time: 00:05:50s. Time for last 1,000,000: 20s. Last read position: chr2:177,219,081
INFO 2023-02-23 14:13:15 LongreadParser Processed 18,000,000 Records. Elapsed time: 00:06:10s. Time for last 1,000,000: 20s. Last read position: chr20:31,541,088
INFO 2023-02-23 14:13:34 LongreadParser Processed 19,000,000 Records. Elapsed time: 00:06:29s. Time for last 1,000,000: 18s. Last read position: chr21:8,437,245
INFO 2023-02-23 14:13:53 LongreadParser Processed 20,000,000 Records. Elapsed time: 00:06:48s. Time for last 1,000,000: 19s. Last read position: chr22:45,855,782
INFO 2023-02-23 14:14:12 LongreadParser Processed 21,000,000 Records. Elapsed time: 00:07:07s. Time for last 1,000,000: 18s. Last read position: chr3:56,621,921
INFO 2023-02-23 14:14:34 LongreadParser Processed 22,000,000 Records. Elapsed time: 00:07:29s. Time for last 1,000,000: 22s. Last read position: chr3:186,789,321
INFO 2023-02-23 14:14:54 LongreadParser Processed 23,000,000 Records. Elapsed time: 00:07:49s. Time for last 1,000,000: 19s. Last read position: chr4:108,621,949
INFO 2023-02-23 14:15:13 LongreadParser Processed 24,000,000 Records. Elapsed time: 00:08:08s. Time for last 1,000,000: 19s. Last read position: chr5:64,378,743
INFO 2023-02-23 14:15:34 LongreadParser Processed 25,000,000 Records. Elapsed time: 00:08:29s. Time for last 1,000,000: 20s. Last read position: chr5:151,662,440
INFO 2023-02-23 14:15:52 LongreadParser Processed 26,000,000 Records. Elapsed time: 00:08:47s. Time for last 1,000,000: 18s. Last read position: chr6:65,304,141
INFO 2023-02-23 14:16:14 LongreadParser Processed 27,000,000 Records. Elapsed time: 00:09:09s. Time for last 1,000,000: 21s. Last read position: chr7:15,701,353
INFO 2023-02-23 14:16:33 LongreadParser Processed 28,000,000 Records. Elapsed time: 00:09:29s. Time for last 1,000,000: 19s. Last read position: chr7:77,736,408
INFO 2023-02-23 14:16:54 LongreadParser Processed 29,000,000 Records. Elapsed time: 00:09:50s. Time for last 1,000,000: 21s. Last read position: chr7:141,134,916
INFO 2023-02-23 14:17:15 LongreadParser Processed 30,000,000 Records. Elapsed time: 00:10:11s. Time for last 1,000,000: 21s. Last read position: chr8:118,070,180
INFO 2023-02-23 14:17:34 LongreadParser Processed 31,000,000 Records. Elapsed time: 00:10:29s. Time for last 1,000,000: 18s. Last read position: chr9:94,293,664
INFO 2023-02-23 14:17:53 LongreadParser Processed 32,000,000 Records. Elapsed time: 00:10:48s. Time for last 1,000,000: 18s. Last read position: chr12_GL877875v1_alt:158,635
INFO 2023-02-23 14:18:09 LongreadParser Processed 33,000,000 Records. Elapsed time: 00:11:04s. Time for last 1,000,000: 15s. Last read position: chr12_KI270904v1_alt:23,647
INFO 2023-02-23 14:18:24 LongreadParser Processed 34,000,000 Records. Elapsed time: 00:11:19s. Time for last 1,000,000: 15s. Last read position: chr6_GL000255v2_alt:3,667,808
INFO 2023-02-23 14:18:43 LongreadParser Processed 35,000,000 Records. Elapsed time: 00:11:38s. Time for last 1,000,000: 18s. Last read position: chrUn_GL000220v1:153,187
INFO 2023-02-23 14:19:02 LongreadParser Processed 36,000,000 Records. Elapsed time: 00:11:57s. Time for last 1,000,000: 19s. Last read position: chrX:114,050,215
INFO 2023-02-23 14:19:08 LongreadParser end...
INFO 2023-02-23 14:19:08 LongreadParser Total SAMrecords 36341118
INFO 2023-02-23 14:19:08 LongreadParser SAMrecords valid 0
INFO 2023-02-23 14:19:08 LongreadParser SAMrecords unvalid 36341118
INFO 2023-02-23 14:19:08 LongreadParser SAMrecords mapqv=0 0
INFO 2023-02-23 14:19:08 LongreadParser SAMrecords no gene 0
INFO 2023-02-23 14:19:08 LongreadParser SAMrecords no UMI 0
INFO 2023-02-23 14:19:08 LongreadParser SAMrecords chimeria 0
INFO 2023-02-23 14:19:08 LongreadParser Total reads 0
INFO 2023-02-23 14:19:08 LongreadParser Total reads multiSAM 0
INFO 2023-02-23 14:19:08 MoleculeDataset MoleculeDataset init start...
INFO 2023-02-23 14:19:08 MoleculeDataset Total molecules 0
INFO 2023-02-23 14:19:08 MoleculeDataset Total molecule reads 0
INFO 2023-02-23 14:19:08 MoleculeDataset Total molecule multiIG 0
INFO 2023-02-23 14:19:08 FusionDetector SetFusions start...
[Thu Feb 23 14:19:09 CET 2023] org.ipmc.sicelore.programs.FusionDetector done. Elapsed time: 12.10 minutes. Runtime.totalMemory()=2868903936

The first line of my BAM file looks like this:

e80a74fc-6591-435f-9fe7-ad24c506109b_null_CATTCATAGTGTTCCA_AGCCCATACTCC 256 chr1 10522 0 112S120M1D29M3I20M1I1M564N2M2D5M1D5M2I29M1D41M3D25M3I2M1I33M7D28M1D1M2D125M212S 0 0 s1:i:270 NM:i:53 AS:i:296 QS:Z:&()36:=?BCAA@>====?;:8/,,))(&&''''>@BAAA????@BCA==>@?@@@?@?AAAABBBBBAA??==9352,),001141*('''(-;;==@?@CA>=<:6/++++,@@?60...0=A@9.-,-366++++;;<A@BCBB@?=:9/5556?ABA>=?>==>=======>>>==00>>=?==-+('''(,,,,-@A@@@AAAAA5*+55556:9::;>@?><:5)(((++421/+''''%)3,)>>?;;::9>===?>7211320233)('''(+--...,++.9:;==>>:30,-1378<,,,,,0.(-.05666??>?=5;=8656=<;;;<<<=>>>=<;;.-,+)((333;;;>?>?AA>=??<:52023/056));;<9;11*)))))))),-9:<<<==;44211112<=>6;<=<=@??;::99983,*(()(()((&*<;:766779;88=:3)&%%%)01:==544448;==>==<==?@??????@>??>>>==@ABD==;:;==?>===711:+))))3339CA>@A=;;3333=@ADABDCA@<6210/0-1457;;;;;;;:987765541-':<=====<;:999999;;<==>>?>?@@AA@=A25569??@=/..-+*.01199;==64589?<<<<>>=;;;99/****)*+5////0:999:;<>==>>>>=?@5.A@?654449A@@=?452--ADA==5421/00))(('''&%&''(-/2'&%$#$$%%%%&'%&,,,0>==7(&&&& US:Z:CCAGACAAGAAAGTTGTCGGTGTCTTTGTGTTGCCCTGTTGGTGCTGATATTGCAAGCAGTGGTATCAACGCAGAGTACATGGGGAGCATCGCGAGGTGGAGCTGCGTTCTCCTCCGCCTTCGCAGTACCACTGAAATCTGTGCAGAGGACAACGCAGGTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCTGTCGCAAAGGCACCCTGCGCCGCGCCGGCGCAGAGAGGCGCGCCGCGCTGGGCGCGCCGGCGCAGAGAGGCGCAGGGCTCTTCTAAGCCAGTGGTCGCCAGCGCCCCCTGCTGGTGCCGGGCACTGCAGGGCCCTCTTGCTTACTGTATAGTGGTGGCACGCCTGCTGGCAGCTAGGGACATTGGTATATAGGTCCTCTTGCTCAAGGTGTAGTGGCAGCACGGTTGGCAGCTGGGGACACTGCCGGGCCCCGTTCAAACAGTAGTGGCGGATTATGTGGAAACACCCGGAGCATATGCTGTTTGGTCTCAGTAGACTCCTAAATATGGGATTCCTGGGTTTAAAAGTATAAAATAAATATGTTTAATTTGTTAACTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGAGTATGGGCTTGGAACACTATGAATGAGATCGGAAGAGCGTCGTGTAGGAAGACAGAGCGACAGGCAAGTCACAAAGACACCGACAACTTTCTTGTCACGGTAGGCGATGGACTGGTTAAACACTCATTTTCCACAATTGAACGAAGTA de:f:0.0793 rl:i:64 cm:i:37 nn:i:0 tp:A:S ms:i:337 ts:A:+

I suspected the US tag was not being added correctly by AddBamReadSequenceTag, so I redid the tagging with /NanoporeBC_UMI_finder-2.1.jar tagbamwithread to get the US tags in place, but I still run into this error.

Can you please let me know what might be causing this issue?

ucagenomix commented 1 year ago

Hi,

The FusionDetector pipeline has not been well maintained in the latest version of sicelore, so there might be some issues in the README. From what I can see, the BAM file is lacking the cell barcode (cellBC) and UMI tags that FusionDetector requires, so all SAM records are considered invalid and the pipeline does not run properly. The cellBC and UMI tags are added by AddBamReadTags, which does not seem to be doing its job here. Initially this pipeline was designed to be run on the consensus molecules fastq file. Could you try rerunning AddBamReadTags using the Sicelore-2.1.220323.jar in the GitHub Jar directory, and check that the cellBC and UMI tags are present before running FusionDetector? Please also check the .csv file you are using (the log reports "FusionDetector Cells detected [872103]"; do you really have 872k cells in your dataset?).
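
For anyone wanting to script the two checks above, here is a minimal Python sketch, not part of sicelore. It counts how many SAM text records carry a given set of optional tags, and how many non-empty lines a barcode list contains. The tag names BC and U8 are assumptions used only for illustration; substitute whatever cell barcode and UMI tags your sicelore version writes. The sample records are made up.

```python
from collections import Counter


def count_tagged_records(sam_lines, required_tags=("BC", "U8")):
    """Count SAM text records that carry every tag in required_tags.

    required_tags defaults to hypothetical tag names; adjust as needed.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        present = {field.split(":", 1)[0] for field in fields[11:]}
        counts["total"] += 1
        if all(tag in present for tag in required_tags):
            counts["tagged"] += 1
    return counts


def count_nonempty_lines(lines):
    """Count non-empty lines, e.g. entries in a barcode list file."""
    return sum(1 for line in lines if line.strip())


# Made-up records: the first has both tags, the second has neither.
mandatory = ["read", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII"]
sample_records = [
    "\t".join(mandatory + ["BC:Z:CATTCATAGTGTTCCA", "U8:Z:AGCCCATACTCC"]),
    "\t".join(mandatory + ["NM:i:0", "US:Z:ACGT"]),
]

result = count_tagged_records(sample_records)
print(f"{result['tagged']} of {result['total']} records carry all required tags")
```

In practice the records could come from the output of samtools view on the tagged BAM (read line by line from standard input), and the barcode count from the opened CSV file. If the tagged count is zero, FusionDetector would be expected to report zero valid SAMrecords, as in the log above.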

best,

areebapatel commented 1 year ago

It works now, thank you so much!