cannot generate more than 4 million short reads

GoogleCodeExporter commented 8 years ago

Hi,

I used the error model file from your sample folder, and I always get the following exception when I try to generate more than 4 million short reads in the FASTQ file. The Simulator works fine when the total number of short reads in the FASTQ file is below 4 million. The read length is 50.

===================================================
Exception in thread "Sequencing Processor 1" java.lang.ArrayIndexOutOfBoundsException: 106
    at genome.sequencing.rnaseq.simulation.B.C.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.B.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.run(Unknown Source)
Exception in thread "Sequencing Processor 1" java.lang.StackOverflowError
    at genome.model.I.A(Unknown Source)
    at genome.model.I.A(Unknown Source)
    at genome.model.I.A(Unknown Source)

Original issue reported on code.google.com by Guorong...@gmail.com on 21 May 2010 at 9:21

Attachments:

error_model_qualities.err

GoogleCodeExporter commented 8 years ago

(follow-up to Issue 33: http://code.google.com/p/fluxcapacitor/issues/detail?id=33)

Original comment by gmicha@gmail.com on 26 May 2010 at 1:07

Added labels: Simulator

GoogleCodeExporter commented 8 years ago

Using the provided error file, I cannot reproduce the error, either for more or for fewer than 4M reads of length 50 nt. Also, I do not see any reason why the read number should have an effect here.

Make sure that you run the newest program build, and if the problem persists, please send me the .PAR file.

Issue remains open until further clarification.

Original comment by gmicha@gmail.com on 26 May 2010 at 1:41

Changed state: Unconfirmed

GoogleCodeExporter commented 8 years ago

I am using the latest version and found a similar problem. When I generated 4000000 molecules and 6831065 paired-end reads of 75 bp, I got the following error message and the simulator stopped. But with 3000000 molecules and 6 million single-end reads, the simulator works fine.

Problems reading sequence chr13: pos -1, len 2 into [0,61]
    null
java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at genome.model.I.A(Unknown Source)
    at genome.io.A.E.D(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.run(Unknown Source)
Problems reading sequence chr13: pos -1, len 48 into [0,61]
    null
java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at genome.model.I.A(Unknown Source)
    at genome.io.A.E.D(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.A(Unknown Source)
    at genome.sequencing.rnaseq.simulation.A$_A.run(Unknown Source)
Exception in thread "Sequencing Processor 1" java.lang.StackOverflowError
    at genome.io.S.equals(Unknown Source)
    at genome.model.I.A(Unknown Source)
    at genome.model.I.A(Unknown Source)
    at genome.model.I.A(Unknown Source)

Original comment by Guorong...@gmail.com on 5 Jun 2010 at 7:13

Attachments:

sample.par

GoogleCodeExporter commented 8 years ago

Hi Guorong,

the error message may resemble your former problem, but it is something quite different: it seems that some of your reads fall outside the chromosome boundaries on chr13. Make sure that your genomic sequence is consistent with the annotation file you used. If the problem persists, please tell me the mouse genome version and send me a copy of the corresponding .PRO file.

Best, micha

Original comment by gmicha@gmail.com on 7 Jun 2010 at 7:36
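
One way to test whether an annotation is consistent with the genomic sequence is to compare every feature's coordinates with the chromosome lengths. The following is a minimal sketch and not part of the Flux Simulator: it assumes the annotation has already been parsed into (chromosome, start, end) tuples with 1-based inclusive coordinates, as in GTF, and that chromosome lengths come from a separate source such as a FASTA index. The function name and the example features are hypothetical; only the chr13 length is the one reported for mm9 in this thread.

```python
def find_out_of_bounds(features, chrom_lengths):
    """Return features that do not fit inside their chromosome.

    features: iterable of (chrom, start, end), 1-based inclusive (GTF style).
    chrom_lengths: dict mapping chromosome name to its length in bases.
    """
    bad = []
    for chrom, start, end in features:
        length = chrom_lengths.get(chrom)
        # An unknown chromosome, a start before base 1, an inverted interval,
        # or an end past the last base all count as inconsistencies.
        if length is None or start < 1 or end < start or end > length:
            bad.append((chrom, start, end))
    return bad


# Hypothetical features, for illustration only.
chrom_lengths = {"chr13": 120284312}
features = [
    ("chr13", 1000, 2000),            # inside the chromosome
    ("chr13", 120284000, 120284400),  # runs past the chromosome end
    ("chrUn", 10, 20),                # chromosome missing from the genome
]
out_of_bounds = find_out_of_bounds(features, chrom_lengths)
```

Any entry in `out_of_bounds` points at a mismatch between the annotation and the genome build.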

GoogleCodeExporter commented 8 years ago

Hi Micha,

I used the mouse NCBI37/mm9 genome, downloaded from the FTP server of the UCSC genome browser. After I removed chr13, everything works fine.

Best,
Guorong

Original comment by Guorong...@gmail.com on 17 Jun 2010 at 3:16

GoogleCodeExporter commented 8 years ago

Hi Guorong,

please post your .PRO file that reproduces the error.

micha

Original comment by gmicha@gmail.com on 22 Jun 2010 at 1:46

GoogleCodeExporter commented 8 years ago

I got the same "stack overflow" exception with a simple gene example. Please see the attachment for all the data files that trigger the exception, including a simplified chromosome index file. The exception occurs whenever FASTQ is set to true; if FASTQ=FALSE, everything works well. Also, the exception does not depend on which .error file is assigned.

BTW, one suggestion for future versions of the software: check that all required files are available before running each step, instead of throwing a FileNotFound exception during execution.

Original comment by li.david...@gmail.com on 5 Jul 2010 at 7:27

Attachments:

test.zip
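
The suggestion about checking files up front could look roughly like the sketch below. It is an illustration in Python, not the Simulator's code: `missing_inputs` and `run_pipeline` are hypothetical names, and the steps are assumed to be plain callables.

```python
import os


def missing_inputs(paths):
    """Return the paths that are not readable regular files."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]


def run_pipeline(paths, steps):
    """Run all steps, but only after every input file has been verified."""
    missing = missing_inputs(paths)
    if missing:
        # Fail before any step has started, naming every missing file at once.
        raise FileNotFoundError("missing input files: " + ", ".join(missing))
    for step in steps:
        step()
```

With this ordering, a missing file is reported before any step has produced partial output.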

GoogleCodeExporter commented 8 years ago

The error message is like this:
        sequencing ****Exception in thread "Sequencing Processor 1" java.lang.StackOverflowError
        at genome.model.I.A(Unknown Source)
        at genome.model.I.A(Unknown Source)
        at genome.model.I.A(Unknown Source)
        (followed by hundreds of identical lines)
The "stack overflow" exception is generally caused by a recursive function being called incorrectly. You may want to check genome.model.I.A() to see if something is wrong with it.

Original comment by li.david...@gmail.com on 5 Jul 2010 at 7:55

GoogleCodeExporter commented 8 years ago

Hi Li,

thank you for the great report and the data attachment, which let me reproduce the problem. In your case, the problem is caused by a gene lying exactly at the chromosome start, which is very unlikely in biology, as we would usually expect a promoter upstream and several nt of telomeric sequence before/after the first/last genes. However, it pointed out a border-case issue, where transcription start variability causes a (simulated) initiation before the beginning of a chromosome.

Thanks, 

micha

Original comment by gmicha@gmail.com on 5 Jul 2010 at 8:44

Changed state: Accepted
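
The border case can be illustrated by clipping a requested interval to the chromosome before reading the sequence. This is a sketch under assumptions and not the fix that went into the Simulator: it assumes 0-based positions and half-open intervals, and `clamp_interval` is a hypothetical helper. The `pos -1, len 2` input mirrors the error message quoted earlier in this thread.

```python
def clamp_interval(start, length, chrom_length):
    """Clip the 0-based interval [start, start + length) to [0, chrom_length).

    Returns (start, length) of the part that lies inside the chromosome;
    the returned length is 0 if nothing remains.
    """
    end = min(start + length, chrom_length)
    start = min(max(start, 0), chrom_length)
    return start, max(end - start, 0)


# A read requested at pos -1 with len 2 keeps only its one in-range base.
clipped = clamp_interval(-1, 2, 120284312)
```

Whether to clip such a read, discard it, or redraw the start position is a modelling decision; the sketch only shows the bounds arithmetic.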

GoogleCodeExporter commented 8 years ago

Corresponding fix in 

http://code.google.com/p/fluxcapacitor/downloads/detail?name=FluxSimulator-20100705.tar.gz

Original comment by gmicha@gmail.com on 5 Jul 2010 at 8:49

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Issue 43 has been merged into this issue.

Original comment by gmicha@gmail.com on 5 Jul 2010 at 8:54

GoogleCodeExporter commented 8 years ago

Hi Micha,

Many thanks for such a fast fix! The problem in comment 3 has the same cause: in the UCSC gene annotation, two exons (uc007rzl.1 and uc007rzm.1) end exactly at the boundary of chromosome 13 (length 120284312).

Original comment by li.david...@gmail.com on 5 Jul 2010 at 2:33
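
For builds without the fix, features that touch a chromosome boundary can be flagged ahead of a run. A minimal sketch, assuming 1-based inclusive coordinates; `touches_boundary` and the exon coordinates are hypothetical, and only the chr13 length is taken from the comment above.

```python
def touches_boundary(start, end, chrom_length):
    """True if a 1-based inclusive feature begins at base 1 or ends at the last base."""
    return start == 1 or end == chrom_length


CHR13_LENGTH = 120284312  # chr13 length quoted in the comment above

# Hypothetical exons; the real coordinates are not given in this thread.
exons = {
    "exon_a": (120283000, 120284312),  # ends on the last base
    "exon_b": (5000, 6000),            # well inside the chromosome
    "exon_c": (1, 300),                # starts on the first base
}
at_boundary = sorted(
    name for name, (start, end) in exons.items()
    if touches_boundary(start, end, CHR13_LENGTH)
)
```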

cannot generate more than 4 million short reads #38