suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data
Other
226 stars 49 forks source link

draw fusion strand error #127

Closed been9312 closed 3 years ago

been9312 commented 3 years ago

Hi.

I used your draw_fusions.R script when you uploaded for font change.

`#!/bin/bash

while read READ1 READ2 do sample=basename ${READ1} _R1.fastq.gz STAR \ --runThreadN "6" \ --genomeDir "/data/asb/genome" --genomeLoad NoSharedMemory \ --readFilesIn "${READ1}" "${READ2}" --readFilesCommand zcat \ --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outSAMunmapped Within --outBAMcompression 0 \ --outFilterMultimapNmax 50 --peOverlapNbasesMin 10 --alignSplicedMateMapLminOverLmate 0.5 --alignSJstitchMismatchNmax 5 -1 5 5 --alignMatesGapMax 100000 \ --chimSegmentMin 10 --chimOutType WithinBAM HardClip --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 --chimSegmentReadGapMax 3 --chimMultimapNmax 50 |

tee Aligned.out.bam |

call arriba

"/data/asb/Arriba/arriba_v2.1.0/arriba" \ -x /dev/stdin \ -o ${sample}_fusions.tsv -O ${sample}_fusions.discarded.tsv \ -a /data/asb/Arriba/arriba_v2.1.0/hg38viral.fa -g /data/asb/Arriba/arriba_v2.1.0/hg38.refGene.gtf -b /data/asb/Arriba/arriba_v2.1.0/blacklist_hg38_GRCh38_v2.1.0.tsv -k /data/asb/Arriba/arriba_v2.1.0/known_fusions_hg38_GRCh38_v2.1.0.tsv -t /data/asb/Arriba/arriba_v2.1.0/known_fusions_hg38_GRCh38_v2.1.0.tsv -p /data/asb/Arriba/arriba_v2.1.0/protein_domains_hg38_GRCh38_v2.1.0.gff3 \

-d structural_variants_from_WGS.tsv

sorting and indexing is only required for visualization

if [[ $(samtools --version-only 2> /dev/null) =~ ^1. ]]; then samtools sort -@ "6" -m 4G -T tmp -O bam Aligned.out.bam > ${sample}.bam rm -f Aligned.out.bam samtools index ${sample}.bam else echo "samtools >= 1.0 required for sorting of alignments" fi

/data/asb/Arriba/arriba_v2.1.0/draw_fusions_v2.R \ --annotation=/data/asb/Arriba/arriba_v2.1.0/hg38.refGene.gtf \ --fusions=${sample}_fusions.tsv \ --output=${sample}.pdf \ --alignments=${sample}.bam \ --cytobands=/data/asb/Arriba/arriba_v2.1.0/cytobands_hg38_GRCh38_v2.1.0.tsv \ --proteinDomains=/data/asb/Arriba/arriba_v2.1.0/protein_domains_hg38_GRCh38_v2.1.0.gff3 \ --fontFamily=Times

done < sample.list`

I just edited parameter for font as you said. However, in the minus strand, an error occurs that is derived without changing the order of exon. For example, SS18 gene is - strand so the start site in chr18:26091217 should be labeled with exon 1, but exon 11 was shown.

Please check the scripts.

suhrig commented 3 years ago

Hm, I am having some difficulties explaining how this can be. The exons are not labelled by the draw_fusions.R script. They are taken verbatim from the GTF file.

been9312 commented 3 years ago

1. AF0140 it is result of older version SS18-KLF14 it is result of new version of R script

  1. I don't understand what is the line of the arriba output file

  2. The screenshot shows that

  3. I think the file was downloaded in download_reference.sh.

suhrig commented 3 years ago

Thanks for providing the details. It's clear now.

  1. I don't understand what is the line of the arriba output file

I mean the line in the output file of Arriba ${sample}_fusions.tsv. You can extract it with the command: grep SS18 ${sample}_fusions.tsv

In addition, could you please extract the relevant lines from the GTF file (grep 'NM_138693\|NM_001308201' hg38.refGene.gtf)? I want to double-check that the GTF file is okay.

been9312 commented 3 years ago

I checked the relevant lines from GTF file you said and that there was a problem with the gtf file. I will download hg38viral+RefSeq to download reference again. Thank you.