Extremely high percentage of reads too short to map...

GoogleCodeExporter commented 9 years ago

I have been using STAR with Illumina 101bp paired-end reads. The first set of 
libraries I sequenced worked great going through the pipeline, but I have had a 
very strange problem with the most recent libraries.

I run STAR with the following command:

Star_Directory/STAR --genomeDir Star_Directory/STAR_2.3.0/Genome --readFilesIn $f $f2 --outSAMstrandField intronMotif --runThreadN 3

where f and f2 are the paired end reads:
1-Nq-C96_S94_L001_R1_001_val_1.fq 
1-Nq-C96_S94_L001_R2_001_val_2.fq

which have been trimmed by trim_galore with the call:
trim_galore -q 15 --phred33 --paired --length 50 -a CTGTCTCTTATACACATCT --stringency 3 $f $f2

where, in this call, f and f2 are the untrimmed fastq files:
1-Nq-C96_S94_L001_R2_001.fastq 
1-Nq-C96_S94_L001_R1_001.fastq 
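
One thing worth ruling out before blaming the aligner is the trimmed input itself. The following is a minimal Python sketch, not part of the original post, that checks whether the two mate files are still in the same order and tallies read lengths. The function names (fastq_records, base_name, check_pairs) are hypothetical, and it assumes plain four-line FASTQ records whose read name is the first whitespace-separated token of the header.

```python
from collections import Counter
from itertools import islice, zip_longest


def fastq_records(lines):
    """Yield (read_name, sequence) from an iterable of FASTQ lines.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if len(rec) < 4:
            return  # end of input (or a truncated final record)
        header, seq = rec[0].strip(), rec[1].strip()
        # Drop the leading '@' and anything after the first space.
        yield header[1:].split()[0], seq


def base_name(name):
    """Drop a trailing /1 or /2 mate suffix, if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairs(r1_lines, r2_lines):
    """Return (pairs seen, out-of-sync pairs, Counter of read lengths)."""
    n_pairs = 0
    mismatched = 0
    lengths = Counter()
    pairs = zip_longest(fastq_records(r1_lines), fastq_records(r2_lines))
    for a, b in pairs:
        n_pairs += 1
        # A missing mate or a different read name means the files disagree.
        if a is None or b is None or base_name(a[0]) != base_name(b[0]):
            mismatched += 1
        for rec in (a, b):
            if rec is not None:
                lengths[len(rec[1])] += 1
    return n_pairs, mismatched, lengths
```

To use it on the trimmed files named above, open both and pass the file objects, e.g. check_pairs(open("1-Nq-C96_S94_L001_R1_001_val_1.fq"), open("1-Nq-C96_S94_L001_R2_001_val_2.fq")). A non-zero out-of-sync count, or lengths below the 50 given to --length, would point at the trimming step rather than at STAR.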

For these runs, the Log.final.out file shows something like this:

                                  Started job on |  Sep 17 13:16:13
                             Started mapping on |   Sep 17 13:17:17
                                    Finished on |   Sep 17 13:17:47
       Mapping speed, Million of reads per hour |   21.76

                          Number of input reads |   181350
                      Average input read length |   179
                                    UNIQUE READS:
                   Uniquely mapped reads number |   1973
                        Uniquely mapped reads % |   1.09%
                          Average mapped length |   176.75
                       Number of splices: Total |   24
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   23
                       Number of splices: GC/AG |   1
                       Number of splices: AT/AC |   0
               Number of splices: Non-canonical |   0
                      Mismatch rate per base, % |   0.39%
                         Deletion rate per base |   0.04%
                        Deletion average length |   2.22
                        Insertion rate per base |   0.00%
                       Insertion average length |   1.50
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   948
             % of reads mapped to multiple loci |   0.52%
        Number of reads mapped to too many loci |   22
             % of reads mapped to too many loci |   0.01%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   98.37%
                     % of reads unmapped: other |   0.01%
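
When comparing many libraries it can help to pull these numbers out programmatically. The sketch below is an illustration in Python, not something from the original post: it parses "name | value" lines like the ones above into a dictionary and reports which read category dominates. The function names are hypothetical, and the list of category labels is copied from this log, so it is an assumption that other STAR versions use the same wording. As far as I understand it, "too short" in this summary refers to the aligned portion being too short relative to the read, not to the input reads being short.

```python
# Percentage categories that should together account for all input reads.
# Labels are taken from the log shown in this post (assumed, not exhaustive).
CATEGORY_KEYS = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]


def parse_star_log(text):
    """Parse 'name | value' lines from a STAR summary log into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no '|'
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Return a percentage field such as '98.37%' as a float."""
    return float(stats[key].rstrip("%"))


def total_percent(stats):
    """Sum the read categories; a value near 100 is a basic sanity check."""
    return sum(percent(stats, key) for key in CATEGORY_KEYS)


def dominant_category(stats):
    """Return the label of the category holding the most reads."""
    return max(CATEGORY_KEYS, key=lambda key: percent(stats, key))
```

Reading the file with open("Log.final.out").read() and passing the text to parse_star_log gives the dictionary. For the log above, the six categories add up to 100% and the dominant one is "% of reads unmapped: too short".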

However, looking at the FASTQ files, the reads look adequate for the most 
part.
I've attached abbreviated versions of the two paired-end FASTQ files.

I've also attached abbreviated versions of two paired-end FASTQ files that 
mapped with a unique mapping percentage of approximately 90% (called 
read1/2_goodMappers.fq).

I am new to RNA-seq analysis, so this may be a trivial issue, but I would 
appreciate any help.

I am using STAR 2.3.0 on Mac OS X.

Thanks so much.

Original issue reported on code.google.com by bdul...@gmail.com on 18 Sep 2014 at 4:17

GoogleCodeExporter commented 9 years ago

It turns out my reads were just bad and were not mapping to the genome...

Sorry for the trouble, back to making libraries!

Original comment by bdul...@gmail.com on 13 Oct 2014 at 12:34

marcrobertdemassy commented 8 years ago

Hi, I have exactly the same issue. Can you be more specific about what you found out in the end? Thank you very much, Marc
