suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

total=ERROR: could not find sequence of contig 'NC_007605' #213

Closed: jdjdj0202 closed this issue 1 year ago

jdjdj0202 commented 1 year ago

Hi, I want to detect gene fusions from an "Aligned.sortedByCoord.out.bam" file. I received my data in this format, so for me it is the raw data.

I changed into the arriba_v2.4.0 directory and ran the following command:

./arriba -x /home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam \
    -g /home/ubuntu/arriba_v2.4.0/GENCODE38.gtf \
    -a /home/ubuntu/arriba_v2.4.0/hg38.fa \
    -b /home/ubuntu/arriba_v2.4.0/database/blacklist_hg38_GRCh38_v2.4.0.tsv \
    -k /home/ubuntu/arriba_v2.4.0/database/known_fusions_hg38_GRCh38_v2.4.0.tsv \
    -p /home/ubuntu/arriba_v2.4.0/database/protein_domains_hg38_GRCh38_v2.4.0.gff3 \
    -o /home/ubuntu/ATL005_RNAfusions.tsv \
    -O /home/ubuntu/ATL005_fusions.discarded.tsv

The error message was:

Reading chimeric alignments from '/home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam' (total=ERROR: could not find sequence of contig 'NC_007605'

I also tried the hg19 assembly and the matching database files:

./arriba -x /home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam \
    -g /home/ubuntu/arriba_v2.4.0/GENCODE38.gtf \
    -a /home/ubuntu/arriba_v2.4.0/hg19.fa \
    -b /home/ubuntu/arriba_v2.4.0/database/blacklist_hg19_hs37d5_GRCh37_v2.4.0.tsv \
    -k /home/ubuntu/arriba_v2.4.0/database/known_fusions_hg19_hs37d5_GRCh37_v2.4.0.tsv \
    -p /home/ubuntu/arriba_v2.4.0/database/protein_domains_hg19_hs37d5_GRCh37_v2.4.0.gff3 \
    -o /home/ubuntu/ATL005_RNAfusions.tsv \
    -O /home/ubuntu/ATL005_fusions.discarded.tsv

The same error message appeared:

[2023-08-30T07:11:42] Reading chimeric alignments from '/home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam' (total=ERROR: could not find sequence of contig 'NC_007605'

How can I solve this problem? Please help me. Thanks!!

jdjdj0202 commented 1 year ago

I solved this problem by removing every line that mentions NC_007605 from the BAM file:

./samtools view -h /home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam |
    grep -v "NC_007605" |
    samtools view -bS - > /home/ubuntu/RNA/ATL005.Contig_out.Aligned.sortedByCoord.out.bam
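A more targeted variant of the same workaround is sketched below. `drop_contigs` is a hypothetical helper, not part of Arriba or samtools. It works on the plain-text SAM stream produced by `samtools view -h`. Instead of removing any line that happens to contain the text, it drops only the matching @SQ header lines and the records whose RNAME, RNEXT, or SA:Z tag points to a listed contig. It also accepts several contig names at once. As with the `grep -v` approach, reads on the removed contigs are discarded, so fusions involving them cannot be reported.

```shell
# drop_contigs LIST_FILE < in.sam > out.sam
# Hypothetical helper (not part of Arriba or samtools): removes every contig
# named in LIST_FILE (one name per line) from a SAM text stream.
#  - header:  drops the matching @SQ lines, keeps all other header lines
#  - records: drops reads whose RNAME ($3) or RNEXT ($7) is a listed contig,
#             or whose SA:Z (supplementary alignment) tag points to one
drop_contigs() {
  awk -F'\t' -v list="$1" '
    BEGIN {
      # load the contig names to remove
      while ((getline name < list) > 0) if (name != "") drop[name] = 1
      close(list)
    }
    /^@SQ\t/ {
      # skip @SQ lines whose SN: value is a listed contig
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/ && (substr($i, 4) in drop)) next
      print; next
    }
    /^@/ { print; next }
    ($3 in drop) || ($7 in drop) { next }
    {
      # optional tags start at field 12; each SA:Z entry begins with the contig name
      for (i = 12; i <= NF; i++) {
        if ($i !~ /^SA:Z:/) continue
        n = split(substr($i, 6), parts, ";")
        for (j = 1; j <= n; j++) {
          split(parts[j], f, ",")
          if (f[1] in drop) next
        }
      }
      print
    }
  '
}

# Example (placeholder file names):
#   echo NC_007605 > drop.txt
#   samtools view -h in.bam | drop_contigs drop.txt | samtools view -b -o out.bam -
```

Matching on specific SAM fields rather than the whole line avoids accidentally removing reads whose other fields contain the search text. This matters most when filtering short contig names such as `MT`.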

Thanks.

suhrig commented 1 year ago

When you run Arriba, you should use the same assembly (FastA file) that was used to generate the BAM file. That way the contig names in the BAM file match those in the assembly, and the error is avoided.
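One way to check this consistency before running Arriba is to compare the contig names declared in the BAM header (the SN: values of the @SQ lines) with the first column of the assembly's FASTA index (.fai). The sketch below assumes samtools is used to extract the header and to build the index. `missing_contigs` is a hypothetical helper, and the file names in the example are placeholders. Any name it prints is a contig that the BAM references but the FastA file lacks. Such contigs may trigger this kind of error. For instance, NC_007605 (the Epstein-Barr virus genome) is included in the hs37d5 build of GRCh37, which suggests that a BAM containing it may have been aligned to hs37d5.

```shell
# missing_contigs HEADER_SAM FASTA_FAI
# Hypothetical helper: prints every contig named in the @SQ lines of a SAM/BAM
# header that is absent from the first column of a FASTA index (.fai).
missing_contigs() {
  bam_names=$(mktemp)
  fa_names=$(mktemp)
  # contig names declared in the alignment header (SN: field of @SQ lines)
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1" |
    LC_ALL=C sort -u > "$bam_names"
  # contig names present in the assembly (first column of the .fai index)
  cut -f1 "$2" | LC_ALL=C sort -u > "$fa_names"
  # lines only in the first list = contigs the BAM uses but the FastA lacks
  LC_ALL=C comm -23 "$bam_names" "$fa_names"
  rm -f "$bam_names" "$fa_names"
}

# Example (placeholder file names):
#   samtools view -H sample.bam > header.sam
#   samtools faidx assembly.fa        # creates assembly.fa.fai
#   missing_contigs header.sam assembly.fa.fai
```

Empty output means that every contig declared in the BAM header is present in the assembly.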