sivarajankumar / fluxcapacitor

Automatically exported from code.google.com/p/fluxcapacitor

Sequencing fails, gtf files with 2-3 genes #43

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago

Hi Micha,
I'm trying to run Flux Simulator, but something must be wrong in my .par file. 
I've run the program with the same parameters but with a different number of 
genes in the gtf: 1300 genes and 2 genes. With 1300 genes in the .gtf file, 
the pipeline runs but never produces a .fastq file, only the .fasta file. With 
2-3 genes in the .gtf file, the software fails without reporting any warning; 
it simply stops at the sequencing step:

READ_LENGTH 75
READ_NUMBER 50000
initing
524884 lines submitted
zipping **
524884 lines zipped
sequencing ****

Here is my .par file

REF_FILE_NAME /Users/she/casa/Documents/programas/pruebas/fluxsimulator/2_genes.gtf
PRO_FILE_NAME /Users/she/casa/Documents/programas/pruebas/fluxsimulator/soph.pro
LIB_FILE_NAME /Users/she/casa/Documents/programas/pruebas/fluxsimulator/soph.lib
SEQ_FILE_NAME /Users/she/casa/Documents/programas/pruebas/fluxsimulator/soph.bed
GEN_DIR /Users/she/Documents/programas/pruebas/fluxsimulator/chr
TMP_DIR /Users/she/Documents/programas/pruebas/fluxsimulator/tmp
NB_MOLECULES 500000
EXPRESSION_K -0.6
EXPRESSION_X0 5.0E7
EXPRESSION_X1 9500.0
TSS_MEAN 25.0
POLYA_SHAPE 2.0
POLYA_SCALE 300.0
RT_MIN 500
RT_MAX 5500
RT_PRIMER RANDOM
FRAGMENTATION NO
FRAG_B4_RT NO
FRAG_MODE PHYSICAL
FRAG_LAMBDA 900.0
FRAG_SIGMA 0.05
FRAG_THRESHOLD 0.1
FILTERING NO
LOAD_CODING YES
LOAD_NONCODING YES
FILT_MIN 200
FILT_MAX 250
READ_NUMBER 50000
READ_LENGTH 75
PAIRED_END YES
FASTQ YES
QTHOLD 33.0
ERR_FILE_NAME /Users/she/Documents/programas/pruebas/fluxsimulator/FluxSimulator/demo/error_model.err
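
One quick sanity check on a .par file like the one above is to confirm that every input path it names exists. Note that the listing mixes two prefixes (`/Users/she/casa/Documents/...` and `/Users/she/Documents/...`), which may or may not be intentional. The following is a minimal sketch, not part of Flux Simulator: the helper names `parse_par` and `missing_inputs` are hypothetical, and the choice of which keys must point at existing paths is an assumption.

```python
import os

# Keys from the .par file whose values are assumed (not confirmed by the
# Flux Simulator documentation) to be paths that must exist before a run.
INPUT_PATH_KEYS = ("REF_FILE_NAME", "GEN_DIR", "TMP_DIR", "ERR_FILE_NAME")


def parse_par(text):
    """Parse whitespace-separated 'KEY value' lines into a dict.

    Lines that hold only a key (no value on the same line) are skipped.
    """
    params = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1]
    return params


def missing_inputs(params, keys=INPUT_PATH_KEYS):
    """Return (key, path) pairs whose path does not exist on disk."""
    return [
        (key, params[key])
        for key in keys
        if key in params and not os.path.exists(params[key])
    ]
```

Calling `missing_inputs(parse_par(open("my.par").read()))` (with a hypothetical file name `my.par`) lists any parameter whose path cannot be found, which rules out one class of silent failure before the simulator is started.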

My error model was taken from the demo dataset

#MODEL 75 10000000
#CROSSTALK A
0.25 0.25 0.25 0 0.25
#CROSSTALK G
0.25 0.25 0.25 0 0.25
#CROSSTALK C
0.25 0.25 0.25 0 0.25
#CROSSTALK T
0.25 0.25 0.25 0 0.25

   1. PositionErrorProfile 26 1 0.1

Is there any minimum number of genes to define in the .gtf file? What's wrong 
in my input?

Thanks very much for your help.

Regards,

Sheila

Original issue reported on code.google.com by gmicha@gmail.com on 22 Jun 2010 at 12:28

GoogleCodeExporter commented 8 years ago
Hi Sheila,

There are several separate points here. 

(1) You do not get qualities in the output (i.e., a "fastq" file) because there are 
no qualities in the error model. With an error model like the one in the 
"error_model_qualities.err" file from the demo package, the program should 
generate a fastq file.

(2) There are problems during the sequencing step. I tried to reproduce the 
problem on different platforms, but could not. Attached are the files I 
used as sample input and the output of the program run. If I interpret 
the part of the program output you pasted correctly, the problem may also stem from 
the generated .lib file. 

To make progress on this issue, it may be quickest if
- you check whether the example "2_genes.gtf" I used works for you
- you provide me with an example of the gtf input that failed for you
- you paste the complete program output, and maybe the 'head' and 'tail' of 
your intermediate .lib file
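
On a system without the `head` and `tail` commands, the same excerpt of the .lib file can be produced with a few lines of Python. This is a minimal sketch; the function name `head_and_tail` is hypothetical and the file path is whatever LIB_FILE_NAME points to.

```python
from collections import deque
from itertools import islice


def head_and_tail(path, n=10):
    """Return (first n lines, last n lines) of a text file.

    The file is streamed, so a large .lib file is never held in memory.
    The tail is taken from the lines that follow the head, so a file with
    fewer than 2*n lines yields a shorter (possibly empty) tail.
    """
    with open(path, "r", errors="replace") as handle:
        head = list(islice(handle, n))
        tail = list(deque(handle, maxlen=n))
    return head, tail
```

For example, `head, tail = head_and_tail("soph.lib")` followed by `print("".join(head))` and `print("".join(tail))` gives text that can be pasted into a reply.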

Best,

micha

Original comment by gmicha@gmail.com on 22 Jun 2010 at 12:54

GoogleCodeExporter commented 8 years ago
Although the information is contradictory, this issue may be related to the 
StackOverflow reported in Issue 38. Unless we hear otherwise, we assume 
that sequencing 2-3 genes now works, because the example posted by 
li.david.wei uses only a single gene.

Original comment by gmicha@gmail.com on 5 Jul 2010 at 8:54

GoogleCodeExporter commented 8 years ago
Issue 42 has been merged into this issue.

Original comment by gmicha@gmail.com on 5 Jul 2010 at 8:56

GoogleCodeExporter commented 8 years ago
Hi Micha,
Sorry for the late reply.

It might sound weird, but with these files the application runs on Linux x64 
but not on Mac OS X Snow Leopard. Does that make sense to you? 
I've been experimenting with the parameters, and when I request a large 
number of reads (e.g., 40 million paired-end reads), the program reports an error. 
Is there an upper limit on the number of reads that can be generated? The program was run on a 
CentOS 5.2 x64 server with 32 GB RAM and 8 cores. 
Here is my .par file:
REF_FILE_NAME /data/results/Sheila/FluxSimulator/sammeth/chr17.gtf
PRO_FILE_NAME /data/results/Sheila/FluxSimulator/sammeth/muchas.pro
LIB_FILE_NAME /data/results/Sheila/FluxSimulator/sammeth/muchas.lib
SEQ_FILE_NAME /data/results/Sheila/FluxSimulator/sammeth/muchas.bed
GEN_DIR /data/results/Sheila/FluxSimulator/chrs
NB_MOLECULES 500000000
EXPRESSION_K -0.6
EXPRESSION_X0 5.0E7
EXPRESSION_X1 9500.0
TSS_MEAN 25.0
POLYA_SHAPE 2.0
POLYA_SCALE 300.0
RT_MIN 100
RT_MAX 5500
RT_PRIMER RANDOM
FRAGMENTATION YES
FRAG_B4_RT YES
FRAG_MODE PHYSICAL
FRAG_LAMBDA 900.0
FRAG_SIGMA 0.05
FRAG_THRESHOLD 0.1
FILTERING YES
FILT_MIN 100
FILT_MAX 300
LOAD_CODING YES
LOAD_NONCODING YES
READ_NUMBER 40000000
READ_LENGTH 75
PAIRED_END YES
FASTQ YES
QTHOLD 33.0
ERR_FILE_NAME /data/results/Sheila/FluxSimulator/sammeth/err.err
TMP_DIR /data/results/Sheila/FluxSimulator/sammeth/tmp

Everything was working OK, although it was taking too long to generate results. 
In fact, the error message appeared a couple of days after I started the 
application. Please find attached the error file.

Thanks very much for your help and my apologies again for the late reply.

Regards,

Sheila

Original comment by szu...@gmail.com on 10 Jul 2010 at 4:00

GoogleCodeExporter commented 8 years ago
Hi Micha,
I forgot to tell you:
On the Linux server:
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

On my Mac OS X machine:
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)

I have already read Issue 38 and checked the first and last exon coordinates in 
my GTF file. Both exons are far from the start and end of 
chromosome 17.
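
The coordinate check described above can be scripted. The sketch below relies only on the standard GTF column layout (tab-separated; column 1 is the sequence name, column 3 the feature type, columns 4 and 5 the 1-based start and end). The function name `exon_extent` is hypothetical, and the sequence name `chr17` used in the example is an assumption based on the file name chr17.gtf.

```python
def exon_extent(gtf_lines, seqname):
    """Return (smallest start, largest end) over exon records on `seqname`.

    Returns None if no exon record for that sequence is found.
    Comment lines and lines with fewer than 9 GTF columns are ignored.
    """
    lowest = highest = None
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[0] != seqname or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        lowest = start if lowest is None else min(lowest, start)
        highest = end if highest is None else max(highest, end)
    return None if lowest is None else (lowest, highest)
```

With `with open("chr17.gtf") as fh: print(exon_extent(fh, "chr17"))`, the two numbers printed can be compared against 1 and the length of the chromosome sequence in GEN_DIR to see how close the outermost exons are to either end.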

Best regards,

Sheila

Original comment by szu...@gmail.com on 10 Jul 2010 at 4:11