sivarajankumar / fluxcapacitor

Automatically exported from code.google.com/p/fluxcapacitor

Problems during sequencing #69

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
An anonymous user added a new forum post "most transcripts and many genes
have no reads" in thread "most transcripts and many genes have no reads" at
http://fluxcapacitor.wikidot.com/forum/t-409308#post-1301733

Hi everyone,

I have a problem with read simulation. I have defined a .pro file where all
transcripts have very similar abundance.

1:95302304-95320982C    gene_123_iso_1  CDS 697 0.003577    114
1:95308664-95320982C    gene_123_iso_2  CDS 670 0.003138    100
1:203830743-203839678W  gene_1180_iso_1 CDS 942 0.003326    106
1:203832753-203839179W  gene_1180_iso_2 CDS 355 0.003138    100
1:203830731-203839209W  gene_1180_iso_3 CDS 967 0.003169    101
1:203830731-203839212W  gene_1180_iso_4 CDS 532 0.003514    112
1:203830737-203839205W  gene_1180_iso_5 CDS 478 0.002699    86
1:203830978-203834247W  gene_1180_iso_6 CDS 248 0.002730    87

After library preparation I get read numbers that make me perfectly happy:

1:95302304-95320982C    gene_123_iso_1  CDS 697 0.003577    114    0.00209560367060517    1747
1:95308664-95320982C    gene_123_iso_2  CDS 670 0.003138    100    0.0017033527259641336    1420
1:203830743-203839678W  gene_1180_iso_1 CDS 942 0.003326    106    0.002295927547531938    1914
1:203832753-203839179W  gene_1180_iso_2 CDS 355 0.003138    100    0.001194745996521322    996
1:203830731-203839209W  gene_1180_iso_3 CDS 967 0.003169    101    0.0022443471480837283    1871
1:203830731-203839212W  gene_1180_iso_4 CDS 532 0.003514    112    0.001631380075571283    1360
1:203830737-203839205W  gene_1180_iso_5 CDS 478 0.002699    86    0.0012043423499070354    1004
1:203830978-203834247W  gene_1180_iso_6 CDS 248 0.002730    87    8.348827445570684E-4    696

But after sequencing I get this:

1:95302304-95320982C    gene_123_iso_1  CDS 697 0.003577    114    0.00209560367060517    1747    0.011101079589527092    1746
1:95308664-95320982C    gene_123_iso_2  CDS 670 0.003138    100    0.0017033527259641336    1420    0.0    0
1:203830743-203839678W  gene_1180_iso_1 CDS 942 0.003326    106    0.002295927547531938    1914    0.0    0
1:203832753-203839179W  gene_1180_iso_2 CDS 355 0.003138    100    0.001194745996521322    996    0.0    0
1:203830731-203839209W  gene_1180_iso_3 CDS 967 0.003169    101    0.0022443471480837283    1871    0.0    0
1:203830731-203839212W  gene_1180_iso_4 CDS 532 0.003514    112    0.001631380075571283    1360    0.0    0
1:203830737-203839205W  gene_1180_iso_5 CDS 478 0.002699    86    0.0012043423499070354    1004    0.0    0
1:203830978-203834247W  gene_1180_iso_6 CDS 248 0.002730    87    8.348827445570684E-4    696    0.0    0

Most of the transcripts have no reads assigned.
What am I doing wrong?
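
To put a number on "most of the transcripts", a small script along these lines could tally the rows whose sequenced read count is zero. This is only a sketch: it assumes each transcript sits on a single whitespace-separated line of the .pro file and that the tenth column holds the read count after sequencing, as the excerpt above suggests. The function name is made up for illustration.

```python
def count_zero_read_transcripts(pro_lines):
    """Return (rows_with_sequencing_columns, rows_with_zero_reads).

    Assumes whitespace-separated columns with the sequenced read
    count in the 10th field; shorter rows are skipped.
    """
    total = 0
    zero = 0
    for line in pro_lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # no sequencing columns on this row yet
        total += 1
        if int(fields[9]) == 0:
            zero += 1
    return total, zero


# Two rows taken from the post-sequencing excerpt in this post.
sample = [
    "1:95302304-95320982C gene_123_iso_1 CDS 697 0.003577 114 "
    "0.00209560367060517 1747 0.011101079589527092 1746",
    "1:95308664-95320982C gene_123_iso_2 CDS 670 0.003138 100 "
    "0.0017033527259641336 1420 0.0 0",
]

total, zero = count_zero_read_transcripts(sample)
print(f"{zero} of {total} transcripts have no reads")
```

Running the same function over the full file (for example with `open("genes_expr_weak_bias1.pro")`) would show whether the empty transcripts cluster on particular chromosomes or genes.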

This is my parameter file:

REF_FILE_NAME   genes.gtf
PRO_FILE_NAME   genes_expr_weak_bias1.pro
LIB_FILE_NAME   genes_expr_weak_bias1.lib
SEQ_FILE_NAME   genes_expr_weak_bias1.bed
GEN_DIR hg19/
NB_MOLECULES    20000000
EXPRESSION_K    -0.6
EXPRESSION_X0   5.0E7
EXPRESSION_X1   9500.0
RT_MIN  10
RT_MAX  10000
FRAGMENTATION   YES
LOAD_CODING YES
LOAD_NONCODING  YES
FILTERING   NO
READ_NUMBER 10000000
READ_LENGTH 75
PAIRED_END  YES
TMP_DIR /tmp/global2/data_sim/
POLYA_SHAPE 2
POLYA_SCALE 300
ERR_FILE_NAME   genes_expr_weak_bias1.err
RT_PRIMER   RANDOM
FRAG_B4_RT  YES
FRAG_MODE   CHEMICAL
FRAG_LAMBDA 500.0
FASTQ   NO
QTHOLD  0.0
FRAG_SIGMA  5.000000e-02
FRAG_THRESHOLD  1.000000e-01

I am using an older version of the Flux Simulator (built 20101223), because
the newer versions (4 and 5) died during library generation with a
NullPointerException:

[LIBRARY] Configuration
               Rounds: 15
               Mean: 0.5
               Standard Deviation: 0.1

       Processing Fragments * FAILED
[ERROR] Error while fragmenting : null
java.lang.NullPointerException
       at fbi.genome.sequencing.rnaseq.simulation.fragmentation.Amplification.getGCcontent(Amplification.java:135)
       at fbi.genome.sequencing.rnaseq.simulation.fragmentation.Amplification.process(Amplification.java:100)
       at fbi.genome.sequencing.rnaseq.simulation.fragmentation.Fragmenter.process(Fragmenter.java:545)
       at fbi.genome.sequencing.rnaseq.simulation.fragmentation.Fragmenter.call(Fragmenter.java:245)
       at fbi.genome.sequencing.rnaseq.simulation.SimulationPipeline.call(SimulationPipeline.java:339)
       at fbi.genome.sequencing.rnaseq.simulation.SimulationPipeline.call(SimulationPipeline.java:32)
       at fbi.commons.flux.Flux.main(Flux.java:182)

Any hints on that would also be helpful.

Thanks in advance,
Jonas

Original issue reported on code.google.com by gmicha@gmail.com on 9 Nov 2011 at 8:34

GoogleCodeExporter commented 8 years ago
Hi,

It's hard to say what is going on with the old version, because we have
changed a lot of things since then. The more interesting problem therefore
seems to be the NullPointerException.

From what I can tell from the stack trace, it looks like a problem related to
the GTF file (REF_FILE_NAME) and the IDs specified in that file. Internally we
create a map to identify the entries, and somehow the lookup fails for one of
the IDs specified in your GTF.

It would be great if you could attach a version of your GTF so I can try to
reproduce and fix the problem. Once that is running, we can check the results
again and I can go into further detail.
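
In the meantime, a quick consistency check between the two files might narrow down which ID trips the lookup. The sketch below is not the simulator's own logic; it simply assumes that the second column of the .pro file is meant to match the `transcript_id` attribute in the ninth column of the GTF, and it lists the .pro IDs that never appear in the GTF. The function names and the sample GTF line are made up for illustration.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect every transcript_id found in tab-separated GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # comment or header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def pro_ids_missing_from_gtf(pro_lines, gtf_lines):
    """Return .pro transcript IDs (2nd column) that the GTF never mentions."""
    known = gtf_transcript_ids(gtf_lines)
    missing = []
    for line in pro_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] not in known:
            missing.append(fields[1])
    return missing


# Hypothetical GTF record; the real genes.gtf was not posted in this thread.
gtf_sample = [
    '1\tsim\texon\t95302304\t95302500\t.\t-\t.\t'
    'gene_id "gene_123"; transcript_id "gene_123_iso_1";',
]
pro_sample = [
    "1:95302304-95320982C gene_123_iso_1 CDS 697 0.003577 114",
    "1:95308664-95320982C gene_123_iso_2 CDS 670 0.003138 100",
]

print(pro_ids_missing_from_gtf(pro_sample, gtf_sample))
```

An empty result would suggest the IDs themselves line up, in which case the failing lookup has to come from somewhere else in the GTF.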

Original comment by thasso.g...@gmail.com on 11 Nov 2011 at 7:56