Closed been9312 closed 3 years ago
Hm, I am having some difficulties explaining how this can be. The exons are not labelled by the draw_fusions.R script. They are taken verbatim from the GTF file.
hg38.refGene.gtf
?1. it is result of older version it is result of new version of R script
I don't understand what is the line of the arriba output file
The screenshot shows that
I think the file was downloaded in download_reference.sh.
Thanks for providing the details. It's clear now.
- I don't understand what is the line of the arriba output file
I mean the line in the output file of Arriba ${sample}_fusions.tsv
. You can extract it with the command: grep SS18 ${sample}_fusions.tsv
In addition, could you please extract the relevant lines from the GTF file (grep 'NM_138693\|NM_001308201' hg38.refGene.gtf
)? I want to double-check that the GTF file is okay.
I checked the relevant lines from GTF file you said and that there was a problem with the gtf file. I will download hg38viral+RefSeq to download reference again. Thank you.
Hi.
I used your draw_fusions.R script when you uploaded for font change.
`#!/bin/bash
while read READ1 READ2 do sample=
basename ${READ1} _R1.fastq.gz
STAR \ --runThreadN "6" \ --genomeDir "/data/asb/genome" --genomeLoad NoSharedMemory \ --readFilesIn "${READ1}" "${READ2}" --readFilesCommand zcat \ --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outSAMunmapped Within --outBAMcompression 0 \ --outFilterMultimapNmax 50 --peOverlapNbasesMin 10 --alignSplicedMateMapLminOverLmate 0.5 --alignSJstitchMismatchNmax 5 -1 5 5 --alignMatesGapMax 100000 \ --chimSegmentMin 10 --chimOutType WithinBAM HardClip --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 --chimSegmentReadGapMax 3 --chimMultimapNmax 50 |tee Aligned.out.bam |
call arriba
"/data/asb/Arriba/arriba_v2.1.0/arriba" \ -x /dev/stdin \ -o ${sample}_fusions.tsv -O ${sample}_fusions.discarded.tsv \ -a /data/asb/Arriba/arriba_v2.1.0/hg38viral.fa -g /data/asb/Arriba/arriba_v2.1.0/hg38.refGene.gtf -b /data/asb/Arriba/arriba_v2.1.0/blacklist_hg38_GRCh38_v2.1.0.tsv -k /data/asb/Arriba/arriba_v2.1.0/known_fusions_hg38_GRCh38_v2.1.0.tsv -t /data/asb/Arriba/arriba_v2.1.0/known_fusions_hg38_GRCh38_v2.1.0.tsv -p /data/asb/Arriba/arriba_v2.1.0/protein_domains_hg38_GRCh38_v2.1.0.gff3 \
-d structural_variants_from_WGS.tsv
sorting and indexing is only required for visualization
if [[ $(samtools --version-only 2> /dev/null) =~ ^1. ]]; then samtools sort -@ "6" -m 4G -T tmp -O bam Aligned.out.bam > ${sample}.bam rm -f Aligned.out.bam samtools index ${sample}.bam else echo "samtools >= 1.0 required for sorting of alignments" fi
/data/asb/Arriba/arriba_v2.1.0/draw_fusions_v2.R \ --annotation=/data/asb/Arriba/arriba_v2.1.0/hg38.refGene.gtf \ --fusions=${sample}_fusions.tsv \ --output=${sample}.pdf \ --alignments=${sample}.bam \ --cytobands=/data/asb/Arriba/arriba_v2.1.0/cytobands_hg38_GRCh38_v2.1.0.tsv \ --proteinDomains=/data/asb/Arriba/arriba_v2.1.0/protein_domains_hg38_GRCh38_v2.1.0.gff3 \ --fontFamily=Times
done < sample.list`
I just edited parameter for font as you said. However, in the minus strand, an error occurs that is derived without changing the order of exon. For example, SS18 gene is - strand so the start site in chr18:26091217 should be labeled with exon 1, but exon 11 was shown.
Please check the scripts.