sivarajankumar / fluxcapacitor

Automatically exported from code.google.com/p/fluxcapacitor

ArrayIndexOutOfBoundsException with PositionErrorProfile at the last position of the read #33

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
For which program(s) you want a new feature?
(Capacitor/Simulator/Both/Tetris)
Simulator

Which build of the program(s)?
(look it up in the .prop file in the ./bin folder of the installation)
20100427

What operating system you use?
(unix32, unix64, win32, win64, macPPC32, macX8632, macX8664, other;
see /proc/cpuinfo, Settings->Control Panel->System, Apple->About this Mac)
unix32, unix64

What steps will reproduce the problem?
1. Add a PositionErrorProfile at the last position in the read

Example (with the demo files):
#MODEL 75 10000000
#CROSSTALK A
 0.25  0.25 0.25 0 0.25
#CROSSTALK G
 0.25 0.25 0.25 0 0.25
#CROSSTALK C
 0.25 0.25 0.25 0 0.25
#CROSSTALK T
 0.25 0.25 0.25 0 0.25
# PositionErrorProfile 75 1 0.1

What is the expected output? What do you see instead?

FluxSimulator hangs on Segregating cDNA and produces this output:
       sequencing Exception in thread "Sequencing Processor 1"
java.lang.ArrayIndexOutOfBoundsException: 139
        at genome.sequencing.rnaseq.simulation.B.A.A(Unknown Source)
        at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
        at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
        at genome.sequencing.rnaseq.simulation.A$_A.A(Unknown Source)
        at genome.sequencing.rnaseq.simulation.A$_A.run(Unknown Source)

Original issue reported on code.google.com by mikael.s...@gmail.com on 5 May 2010 at 12:54

GoogleCodeExporter commented 8 years ago
Hi, 

I get the same problem on my Mac, as follows:
Exception in thread "Sequencing Processor 1" 
java.lang.ArrayIndexOutOfBoundsException: 110
    at genome.sequencing.rnaseq.simulation.B.C.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.B.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.run(Unknown Source)

Original comment by Guorong...@gmail.com on 14 May 2010 at 5:42

GoogleCodeExporter commented 8 years ago
Hi,

Sorry for the late reaction to this issue. The solution is that indices in the error file are 0-based; I corrected the format description in the wiki at
http://fluxcapacitor.wikidot.com/formats:err

Adjusting the sample error file Mikael provided to
# PositionErrorProfile 74 1 0.1

results in an output with the expected 10% error frequency at the last nucleotide of each read (lowercase letters mark nucleotides that differ from the genomic sequence, here introduced by simulated sequencing errors).

$ cat test.fasta | awk '$0!~/^>/{print substr($0,length($0),1)}' | sort | uniq -c

 288102 A
 252795 C
 252642 G
     27 N
 286593 T
  30198 a
  29998 c
  29714 g
  29965 t
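
As a quick cross-check of the counts above, the observed error frequency is the share of lowercase last bases among all last bases. The following is only a minimal sketch of that arithmetic; the class and method names are made up for illustration and are not part of the Flux Simulator.

```java
// Hypothetical helper for illustration only; not part of the Flux Simulator.
final class LastBaseErrorRate {

    /** Sums all entries of an array of counts. */
    static long sum(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    /** Fraction of last bases that were changed (lowercase) among all last bases. */
    static double errorFraction(long[] unchanged, long[] changed) {
        long changedTotal = sum(changed);
        return (double) changedTotal / (sum(unchanged) + changedTotal);
    }

    public static void main(String[] args) {
        // Counts copied from the `uniq -c` output: A, C, G, N, T (uppercase), then a, c, g, t.
        long[] unchanged = {288102, 252795, 252642, 27, 286593};
        long[] changed = {30198, 29998, 29714, 29965};
        // 119875 changed out of 1200034 reads, i.e. close to the configured 0.1.
        System.out.printf("observed error fraction: %.4f%n", errorFraction(unchanged, changed));
    }
}
```

With the numbers above, 119875 of 1200034 last bases are lowercase, which is close to the configured 0.1.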

It remains debatable whether 0-based or 1-based coordinates are more intuitive here, and I would welcome arguments for either. The issue will therefore stay open with the attribute "done" for further comments. For now, I have added a sanity check of positional error indices; future builds will reject error models whose position indices fall outside the read length. Thanks for bringing this to my attention.
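
To illustrate what such a sanity check could look like, here is a rough sketch that assumes 0-based indices, so a read of length 75 accepts positions 0 to 74. It is not the code that went into the build; the class and method names are hypothetical.

```java
// Hypothetical sketch of a bounds check for positional error indices.
// Assumes 0-based positions; not the actual Flux Simulator implementation.
final class PositionIndexCheck {

    /** True if a 0-based position lies inside a read of the given length. */
    static boolean isValid(int position, int readLength) {
        return position >= 0 && position < readLength;
    }

    /** Rejects an error model whose positions fall outside the read. */
    static void validate(int[] positions, int readLength) {
        for (int p : positions) {
            if (!isValid(p, readLength)) {
                throw new IllegalArgumentException(
                        "Position index " + p + " is outside a read of length " + readLength
                        + " (valid 0-based range: 0 to " + (readLength - 1) + ")");
            }
        }
    }

    public static void main(String[] args) {
        validate(new int[] {74}, 75); // accepted: last position of a 75 nt read
        try {
            validate(new int[] {75}, 75); // rejected: the index from the original report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early with a message that names the valid range would replace the ArrayIndexOutOfBoundsException seen during sequencing with an error at model load time.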

Original comment by gmicha@gmail.com on 14 May 2010 at 6:27

GoogleCodeExporter commented 8 years ago
Hi,

I used the error model file provided in your sample folder, and I always get the following exception when I try to generate more than 4 million short reads in the fastq file. The Simulator works fine when the total number of short reads in the fastq file is less than 4 million. The read length is 50.

===================================================
Exception in thread "Sequencing Processor 1" 
java.lang.ArrayIndexOutOfBoundsException: 106
    at genome.sequencing.rnaseq.simulation.B.C.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.B.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.run(Unknown Source)
Exception in thread "Sequencing Processor 1" java.lang.StackOverflowError
    at genome.model.I.A(Unknown Source)
    at genome.model.I.A(Unknown Source)
    at genome.model.I.A(Unknown Source)

Original comment by Guorong...@gmail.com on 21 May 2010 at 9:19

GoogleCodeExporter commented 8 years ago
The last report seems to be a duplicate of Issue 38:

http://code.google.com/p/fluxcapacitor/issues/detail?id=38

This issue remains closed; discussion continues there.

Original comment by gmicha@gmail.com on 26 May 2010 at 1:05