suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Finding fusions and counting supporting reads zsh: killed #228

Closed by jdjdj0202 8 months ago

jdjdj0202 commented 8 months ago

Hi, I've been using Arriba for RNA fusion analysis. A few months ago, I successfully generated output files from RNA data in my MacBook's terminal by running the same command. However, I'm now encountering the following error:

=============================================================================================
(myenv) XXXX@DajeongcBookPro data % /Users/dajeong/arriba_v2.4.0/arriba -x /Users/dajeong/output/MCL-46T_STAR_Aligned.out.bam -g /Users/dajeong/Arriba_DJ/GENCODE38.gtf -a /Users/dajeong/Arriba_DJ/hg38.fa \
-b /Users/dajeong/arriba_v2.4.0/database/blacklist_hg38_GRCh38_v2.4.0.tsv.gz -k /Users/dajeong/arriba_v2.4.0/database/known_fusions_hg38_GRCh38_v2.4.0.tsv.gz -p /Users/dajeong/arriba_v2.4.0/database/protein_domains_hg38_GRCh38_v2.4.0.gff3 \
-o /Users/dajeong/Arriba_DJ/output/MCL-46T_fusions.tsv
[2024-01-22T12:38:47] Launching Arriba 2.4.0
[2024-01-22T12:38:47] Loading assembly from '/Users/dajeong/Arriba_DJ/hg38.fa'
[2024-01-22T12:39:04] Loading annotation from '/Users/dajeong/Arriba_DJ/GENCODE38.gtf'
[2024-01-22T12:39:10] Reading chimeric alignments from '/Users/dajeong/STAR-Fusion/output/MCL-46T_STARAligned.out.bam' (total=71482657)
[2024-01-22T12:44:39] Marking multi-mapping alignments (marked=170258)
[2024-01-22T12:45:02] Detecting strandedness (no)
[2024-01-22T12:45:06] Annotating alignments
[2024-01-22T12:47:57] Filtering duplicates (remaining=67356747)
[2024-01-22T12:48:38] Filtering mates which do not map to interesting contigs (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y AC NC_) (remaining=67075704)
[2024-01-22T12:48:49] Filtering mates which only map to viral contigs (AC* NC*) (remaining=67075704)
[2024-01-22T12:49:01] Filtering viral contigs with expression lower than the top 5 (remaining=67075704)
[2024-01-22T12:49:24] Filtering viral contigs with less than 5% coverage (remaining=67075704)
[2024-01-22T12:49:36] Estimating fragment length (mate gap mean=27441.1, mate gap stddev=28415.9, read length mean=30.042)
[2024-01-22T12:49:47] Filtering read-through fragments with a distance <=10000bp (remaining=64229143)
[2024-01-22T12:49:59] Filtering inconsistently clipped mates (remaining=64228814)
[2024-01-22T12:50:10] Filtering breakpoints adjacent to homopolymers >=6nt (remaining=64221086)
[2024-01-22T12:50:22] Filtering fragments with small insert size (remaining=64217214)
[2024-01-22T12:50:33] Filtering alignments with long gaps (remaining=64217214)
[2024-01-22T12:50:45] Filtering fragments with both mates in the same gene (remaining=64157768)
[2024-01-22T12:50:57] Filtering fusions arising from hairpin structures (remaining=63958151)
[2024-01-22T12:51:11] Filtering reads with a mismatch p-value <=0.01 (remaining=62270422)
[2024-01-22T12:52:42] Filtering reads with low entropy (k-mer content >=60%) (remaining=62234043)
[2024-01-22T12:55:13] Finding fusions and counting supporting reads
zsh: killed /Users/dajeong/arriba_v2.4.0/arriba -x -g -a -b -k -p -o

I believe the available memory on my system has remained the same since then. Could you please help me resolve this issue? Thanks.

Sincerely, DJ. J

suhrig commented 8 months ago

Something doesn't look right about your input file. There are more than 67 million chimeric reads, which is 1-2 orders of magnitude bigger than normal. And the gap between the mates is huge: 27000 is 2-3 orders of magnitude bigger than normal. What input is this? Regular RNA-Seq?
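
For anyone who wants to catch this before the run is killed: both figures mentioned above (the chimeric read total and the mate gap mean) are printed in the Arriba log, so they can be checked automatically. Below is a minimal sketch in Python, assuming the log format shown in this issue. The function name `check_arriba_log` and both thresholds are hypothetical, picked only to illustrate the idea; they are not values recommended by Arriba.

```python
import re

# Hypothetical thresholds for illustration only; adjust them for your own data.
MAX_CHIMERIC_READS = 10_000_000
MAX_MATE_GAP_MEAN = 1_000.0

# Patterns based on the log lines shown in this issue.
TOTAL_RE = re.compile(r"Reading chimeric alignments from .*?\(total=(\d+)\)")
GAP_RE = re.compile(r"mate gap mean=([0-9.]+)")


def check_arriba_log(log_text):
    """Return a list of warnings about suspicious statistics in an Arriba log."""
    warnings = []

    total = TOTAL_RE.search(log_text)
    if total and int(total.group(1)) > MAX_CHIMERIC_READS:
        warnings.append(
            f"chimeric read count {total.group(1)} exceeds {MAX_CHIMERIC_READS}"
        )

    gap = GAP_RE.search(log_text)
    if gap and float(gap.group(1)) > MAX_MATE_GAP_MEAN:
        warnings.append(
            f"mate gap mean {gap.group(1)} exceeds {MAX_MATE_GAP_MEAN}"
        )

    return warnings


if __name__ == "__main__":
    # Two lines copied from the log in this issue.
    sample = (
        "[2024-01-22T12:39:10] Reading chimeric alignments from "
        "'/Users/dajeong/STAR-Fusion/output/MCL-46T_STARAligned.out.bam' "
        "(total=71482657)\n"
        "[2024-01-22T12:49:36] Estimating fragment length "
        "(mate gap mean=27441.1, mate gap stddev=28415.9, "
        "read length mean=30.042)\n"
    )
    for warning in check_arriba_log(sample):
        print("WARNING:", warning)
```

With the two log lines from this issue, both checks fire. A warning does not prove the input is wrong, but it is a cheap prompt to double-check what kind of data went into the alignment.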

jdjdj0202 commented 8 months ago

I accidentally put DNA sequencing data into the input file. Thanks to you, I was able to identify the cause and I appreciate it. Thanks a lot!

suhrig commented 8 months ago

Glad you sorted it out!