suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

(total=ERROR: no normal reads found #248

Open YasirKusay opened 2 months ago

YasirKusay commented 2 months ago

I am using version 2.4.0 of Arriba and I stumbled across this error:

...
[2024-08-24T15:53:20] Reading chimeric alignments from '~/bams/sample1_trimmed.star.hg38.Chimeric.bam' (total=138010)
[2024-08-24T15:53:21] Reading chimeric alignments from '~/bams/sample1_trimmed.star.hg38.bam' (total=ERROR: no normal reads found

The second file, the one that triggered the error, is the main alignment file (not the chimeric alignments file). Here is the script I ran:

arriba \
       -c "~/bams/sample1_trimmed.star.hg38.Chimeric.bam" \
       -x "~/bams/sample1_trimmed.star.hg38.bam" \
       -g "${ref_path}/gencode.v39.annotation.gtf" \
       -a "${ref_path}/GRCh38_masked_v2_decoy_gene.fasta" \
       -b "${arriba_db_path}/blacklist_mm10_GRCm38_v2.4.0.tsv.gz" \
       -k "${arriba_db_path}/known_fusions_hg38_GRCh38_v2.4.0.tsv.gz" \
       -p "${arriba_db_path}/protein_domains_hg38_GRCh38_v2.4.0.gff3" \
       -o "${output}/fusions/sample1.tsv" \
       -O "${output}/fusions_discarded/sample1.tsv"

I noticed that the error is thrown from this source file: source/read_chimeric_alignments.cpp. It throws the error when the input file is empty, but in this case the file was not empty.

suhrig commented 2 months ago

This usually indicates a usage mistake with STAR. Can you paste the contents of Log.final.out?

YasirKusay commented 2 months ago

Here are the contents:

                                 Started job on |   Aug 22 15:01:31
                             Started mapping on |   Aug 22 15:07:45
                                    Finished on |   Aug 22 15:12:28
       Mapping speed, Million of reads per hour |   36.23

                          Number of input reads |   2847817
                      Average input read length |   284
                                    UNIQUE READS:
                   Uniquely mapped reads number |   0
                        Uniquely mapped reads % |   0.00%
                          Average mapped length |   0.00
                       Number of splices: Total |   0
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   0
                       Number of splices: GC/AG |   0
                       Number of splices: AT/AC |   0
               Number of splices: Non-canonical |   0
                      Mismatch rate per base, % |   -nan%
                         Deletion rate per base |   0.00%
                        Deletion average length |   0.00
                        Insertion rate per base |   0.00%
                       Insertion average length |   0.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   0
             % of reads mapped to multiple loci |   0.00%
        Number of reads mapped to too many loci |   25964
             % of reads mapped to too many loci |   0.91%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   2738722
                 % of reads unmapped: too short |   96.17%
                Number of reads unmapped: other |   83131
                     % of reads unmapped: other |   2.92%
                                  CHIMERIC READS:
                       Number of chimeric reads |   138010
                            % of chimeric reads |   4.85%

suhrig commented 2 months ago

This is the problem:

% of reads unmapped: too short | 96.17%

STAR couldn't map any reads!

What command did you use?

Can you share a few reads from your FastQ file? For example:

zcat read1.fastq.gz | head -n 1000000 | tail
zcat read2.fastq.gz | head -n 1000000 | tail

YasirKusay commented 2 months ago

It's very weird, as everything here is 150 base pairs.
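
The read-length observation above can be checked across a whole file rather than a handful of reads. The following is a minimal sketch, not part of Arriba or STAR: `fastq_read_lengths` is a hypothetical helper name, and it assumes a gzip-compressed FASTQ file with plain four-line records. As far as the STAR manual describes it, "too short" refers to the aligned portion of a read being short relative to the read, not to the input read length, so uniform 150 bp reads do not by themselves rule out that category.

```shell
# Hypothetical helper: tally read lengths in a gzipped FASTQ file.
# Prints one "<length> <count>" pair per line, sorted by length.
# Assumes four-line FASTQ records (sequence on every 2nd line of 4).
fastq_read_lengths() {
    gzip -dc < "$1" |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (len in n) print len, n[len] }' |
        sort -n
}
```

It could be called as `fastq_read_lengths read1.fastq.gz`, using the same file names as in the commands above.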

YasirKusay commented 2 months ago

I did not notice this. It may be a STAR issue, and I will try to address it, but can you confirm that the error is caused by the low mapping rate?

suhrig commented 2 months ago

Yes, it's 100% because of the low mapping rate.
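
Since the error traces back to the mapping rate reported in Log.final.out, that file can be checked before running Arriba. The sketch below is an illustration only: `star_log_value` and `star_mapping_ok` are hypothetical helper names, the field labels are the ones shown in the log pasted above, and the default threshold of 50% is an arbitrary assumption rather than a recommendation from either tool.

```shell
# Hypothetical helper: print the value of one field from a STAR Log.final.out.
# Usage: star_log_value <Log.final.out> "<field label>"
star_log_value() {
    awk -F'|' -v key="$2" '
        {
            label = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim whitespace around the label
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim whitespace around the value
            if (label == key) { print value; exit }
        }' "$1"
}

# Hypothetical helper: succeed only if the uniquely mapped percentage is at
# least <min> (second argument, default 50, an arbitrary assumption).
star_mapping_ok() {
    awk -v p="$(star_log_value "$1" "Uniquely mapped reads %")" \
        -v min="${2:-50}" \
        'BEGIN { sub(/%$/, "", p); exit !(p + 0 >= min + 0) }'
}
```

A pipeline could then guard the Arriba call with something like `star_mapping_ok Log.final.out || echo "low mapping rate, check the STAR run" >&2`. For the log in this thread, the check would fail because the uniquely mapped percentage is 0.00%.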