guanchangge / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner

MosaikAssembler: An attempt was made to get reads from an alignment archive that hasn't been opened yet. #3

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run MosaikAssembler against the whole human genome.

What is the expected output? What do you see instead?
Previously, MosaikAssembler would process each chromosome and write a separate gig output for each one. Now, with version 1.0, I get this error:

 MosaikAssembler -in 33105n.u.bin.aligned.sorted -ia hg18n.fa.bin -out Gig/33105n.u.bin.aligned.sorted.assembled -f gig
------------------------------------------------------------------------------
MosaikAssembler 1.0.1307                                            2009-10-14
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

===============================================================
alignment count   reference sequence
---------------------------------------------------------------
          52817   chr10
             32   chr10_random
          46263   chr11
             14   chr11_random
          38920   chr12
          29527   chr13
             46   chr13_random
         658857   chr14
          38291   chr15
             10   chr15_random
          50933   chr16
          37725   chr17
            750   chr17_random
          27843   chr18
              5   chr18_random
          31706   chr19
             21   chr19_random
          76064   chr1
             20   chr1_random
          31990   chr20
          22869   chr21
            145   chr21_random
          22975   chr22
              6   chr22_random
          76752   chr2
          40755   chr3
         836380   chr4
            304   chr4_random
          47825   chr5
              3   chr5_h2_hap1
          46724   chr6
           1483   chr6_random
             12   chr6_cox_hap1
              6   chr6_qbl_hap2
          60453   chr7
             51   chr7_random
          48045   chr8
             25   chr8_random
          38813   chr9
            158   chr9_random
            455   chrM
          14632   chrX
             26   chrX_random
           2091   chrY

Processing reference sequence chr10:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from chr10:
100%[================================]  52,817.0 alignments/s        in  1 s

- appending read index to read data... finished.

Processing reference sequence chr10_random:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from chr10_random:
ERROR: An attempt was made to get reads from an alignment archive that hasn't been opened yet.
 0% [                                ] 

If I specify just chr4, I get a different error:

 MosaikAssembler -in 33105n.bin.aligned.sorted -ia hg18n.fa.bin -out Gig/33105n.bin.aligned.sorted.assembled -f gig -roi chr4
------------------------------------------------------------------------------
MosaikAssembler 1.0.1307                                            2009-10-14
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

===============================================================
alignment count   reference sequence
---------------------------------------------------------------
         836380   chr4

Processing reference sequence chr4:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... ERROR: Expected more reads to assemble.

What version of the product are you using? On what operating system?
Version 1.0, 64-bit Linux (Red Hat)

Please provide any additional information below.

Original issue reported on code.google.com by jstevens...@gmail.com on 16 Oct 2009 at 9:00

GoogleCodeExporter commented 8 years ago
Thanks, Jeff. I'll see if I can reproduce this error during the weekend.

Original comment by snowneb...@gmail.com on 16 Oct 2009 at 9:18

GoogleCodeExporter commented 8 years ago
Great!
I can give you a little more info on what we did on our end. Initially we tried to use the fa.bin file for hg18 that we created with the previous version of Mosaik. Version 1.0 didn't accept it, so we recreated it with version 1.0. We did the same for our Illumina read file. From that point on, everything proceeded as usual up to the MosaikAssembler step.
The read file consists of 76 bp reads from a sequence capture protocol targeting a region on chr4 and a region on chr14. This technique also captures non-specific/non-unique sequence throughout the genome, so the alignment produces sparse alignments across the whole genome but good coverage of the targeted regions.

The second error ("- locating first read for this reference sequence... ERROR: Expected more reads to assemble") makes me wonder whether it is related to this issue.
Your previous version handled this beautifully.
Let me know if I can provide more info or files to test.

Cheers

Original comment by jstevens...@gmail.com on 16 Oct 2009 at 9:35

GoogleCodeExporter commented 8 years ago
I get exactly the same error message with this new version of MosaikAssembler. I am using a 64-bit CentOS Linux cluster.

Thanks!

Xiaoping

Original comment by xiaoping...@stjude.org on 18 Oct 2009 at 12:51

GoogleCodeExporter commented 8 years ago
Michael,
I sent you an SFTP link by email so you can download that aligned file.

Jeff 

Original comment by jstevens...@gmail.com on 19 Oct 2009 at 7:09

GoogleCodeExporter commented 8 years ago
I have the same problem:
Processing reference sequence 0:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from 0:
100%[================================]     446.0 alignments/s        in  1 s

- appending read index to read data... finished.

Processing reference sequence 5:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from 5:
 0% [                                ]
ERROR: An attempt was made to get reads from an alignment archive that hasn't been opened yet.

OS: 64-bit Linux

Original comment by csf...@yahoo.com.cn on 20 Oct 2009 at 12:37

GoogleCodeExporter commented 8 years ago
Thanks for the update. 

I received a file from Jeff yesterday, so I'll use that today to perform some 
debugging.

// Michael

Original comment by snowneb...@gmail.com on 20 Oct 2009 at 12:50

GoogleCodeExporter commented 8 years ago
I was able to successfully reproduce the error. Tomorrow I will debug the
MosaikAssembler code to see where the error is occurring.

// Michael

Original comment by snowneb...@gmail.com on 23 Oct 2009 at 3:49

GoogleCodeExporter commented 8 years ago
Michael,
Great, Thanks.
Cheers,
jeff

Original comment by jstevens...@gmail.com on 23 Oct 2009 at 4:18

GoogleCodeExporter commented 8 years ago
Hi guys,

The MosaikAssembler bugs have now been fixed. One bug was caused by the alignment archive being prematurely closed after processing the first reference sequence. The second bug was caused by the outdated indexing routines.
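To illustrate the first bug as described above, here is a minimal Python sketch under stated assumptions: `AlignmentArchive`, `get_reads` and `assemble` are hypothetical names for illustration only, not Mosaik's actual C++ classes or API. Closing the archive inside the per-reference loop makes every reference after the first fail with the reported message, which matches the logs where chr10 (or reference 0) is saved and the next reference fails at 0%.

```python
class AlignmentArchive:
    """Hypothetical in-memory stand-in for a sorted alignment archive."""

    def __init__(self, alignments):
        # alignments: list of (reference_name, read_name) pairs
        self._alignments = alignments
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def get_reads(self, reference):
        if not self.is_open:
            raise RuntimeError(
                "An attempt was made to get reads from an alignment archive "
                "that hasn't been opened yet.")
        return [read for ref, read in self._alignments if ref == reference]


def assemble(archive, references, close_inside_loop=False):
    """Collect reads per reference; close_inside_loop=True mimics the reported bug."""
    archive.open()
    assembled = {}
    for reference in references:
        assembled[reference] = archive.get_reads(reference)
        if close_inside_loop:
            archive.close()  # bug: archive closed after the first reference sequence
    if not close_inside_loop:
        archive.close()  # fix: close once, after every reference has been processed
    return assembled


alignments = [("chr10", "r1"), ("chr10", "r2"), ("chr10_random", "r3")]
print(assemble(AlignmentArchive(alignments), ["chr10", "chr10_random"]))
# prints {'chr10': ['r1', 'r2'], 'chr10_random': ['r3']}
```

Passing `close_inside_loop=True` instead raises the RuntimeError when `chr10_random` is reached, since the archive was already closed after `chr10`.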

An update of MOSAIK will be up on the site in a couple of days. Until then, you can get the fix through Subversion, or you can get the 64-bit Linux binary from the following link:

http://bioinformatics.bc.edu/~mikaels/Mosaik/Mosaik-1.0-Linux-x64.tar.bz2

NOTE: Since the indexing routines were updated, you will have to re-run MosaikSort to create a fresh index in your alignment archive.
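The note implies the alignment archive carries an index that MosaikAssembler relies on in its "locating first read for this reference sequence" step. A minimal sketch of that idea, assuming a simple in-memory index from reference name to the offset of its first alignment; `build_index` and `locate_first_read` are hypothetical, and Mosaik's real on-disk index format is not described in this thread. An index that is out of step with the archive (for example, one written by an older indexing routine) points at the wrong place, and the lookup fails with the "Expected more reads to assemble" message, which illustrates why a fresh MosaikSort run matters.

```python
def build_index(archive_refs):
    """Map each reference name to the offset of its first alignment.

    archive_refs lists the reference name of every alignment in archive order;
    alignments for the same reference are assumed to be contiguous (as after sorting).
    """
    index = {}
    for offset, reference in enumerate(archive_refs):
        index.setdefault(reference, offset)  # keep only the first offset seen
    return index


def locate_first_read(archive_refs, index, reference):
    """Return the offset of the first alignment for `reference`, validating the index."""
    offset = index.get(reference)
    # A missing or stale entry either points past the end of the archive
    # or at an alignment that belongs to a different reference.
    if offset is None or offset >= len(archive_refs) or archive_refs[offset] != reference:
        raise RuntimeError("Expected more reads to assemble.")
    return offset


archive_refs = ["chr10", "chr10", "chr4", "chr4", "chr4"]
fresh_index = build_index(archive_refs)
print(locate_first_read(archive_refs, fresh_index, "chr4"))  # prints 2

stale_index = {"chr10": 0, "chr4": 7}  # hypothetical index out of step with the archive
try:
    locate_first_read(archive_refs, stale_index, "chr4")
except RuntimeError as err:
    print(err)  # prints "Expected more reads to assemble."
```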

Let me know if the fix works for you.

[mikaels@humu MosaikAssemblerBug]$ /home/mikaels/source/Mosaik/bin/MosaikAssembler -in ERR001719_sorted.dat -ia h.sapiens_1000G_female.dat -out ERR001719
------------------------------------------------------------------------------
MosaikAssembler 1.0.1342                                            2009-10-24
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

===============================================================
alignment count   reference sequence
---------------------------------------------------------------
         732678   1
         779738   2
         647111   3
         598241   4
         579398   5
         549254   6
         475897   7
         470386   8
         361615   9
         426580   10
         426701   11
         425339   12
         311768   13
         285927   14
         257326   15
         242018   16
         231661   17
         255847   18
         140714   19
         203326   20
         110259   21
         100445   22
         440773   X
              4   NT_113887
             36   NT_113947
              2   NT_113908
             19   NT_113940
              7   NT_113963
              1   NT_113950
              5   NT_113907
             47   NT_113937
             73   NT_113941
             36   NT_113921
              2   NT_113960
             21   NT_113928
              1   NT_113966
             39   NT_113943
            266   NT_113914
            211   NT_113948
            285   NT_113886
              1   NT_113932
              4   NT_113929
              4   NT_113878
              4   NT_113900
             24   NT_113918
             66   NT_113942
            119   NT_113934
             66   NT_113954
             35   NT_113953
              2   NT_113874
              4   NT_113924
             44   NT_113933
              6   NT_113870
              6   NT_113939
             68   NT_113956
              4   NT_113951
            255   NT_113913
            431   NT_113958
             35   NT_113949
             90   NT_113889
              1   NT_113936
             28   NT_113957
            126   NT_113961
              4   NT_113925
            216   NT_113916
           1112   NT_113930
             33   NT_113955
             31   NT_113944
           1232   NT_113901
             87   NT_113905
              8   NT_113872
            191   NT_113952
              2   NT_113912
              2   NT_113935
              6   NT_113931
             98   NT_113923
            246   NT_113885
             42   NT_113888
              6   NT_113871
              2   NT_113910
            216   NT_113899
              9   NT_113965
            577   NT_113898
          12410   NC_007605
              4   ALU.ALUSC
              2   ALU.ALUSG
              2   ALU.ALUSP
              4   ALU.ALUSQ

Processing reference sequence 1:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from 1:
100%[================================] 244,226.0 alignments/s        in  3 s

- appending read data to header data... finished.

Processing reference sequence 2:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from 2:
100%[================================] 259,912.7 alignments/s        in  3 s

- appending read data to header data... finished.

Processing reference sequence 3:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from 3:
100%[================================] 215,631.8 alignments/s        in  3 s

- appending read data to header data... finished.

Processing reference sequence 4:
- inserting gaps into reference sequence... finished.
- creating ungapped to gapped conversion table... finished.
- writing assembly header... finished.
- locating first read for this reference sequence... finished.

- saving alignments from 4:
100%[================================] 239,200.7 alignments/s        in  2 s

- appending read data to header data... finished.

[remaining output snipped out]

MosaikAssembler CPU time: 241.750 s, wall time: 305.914 s

// Michael

Original comment by snowneb...@gmail.com on 24 Oct 2009 at 4:28

GoogleCodeExporter commented 8 years ago
Dear Michael:

Your new MosaikAssembler worked perfectly. 

Thanks very much for your great work and quick fixing!

Xiaoping

Original comment by xiaoping...@stjude.org on 24 Oct 2009 at 9:12

GoogleCodeExporter commented 8 years ago

Original comment by snowneb...@gmail.com on 24 Oct 2009 at 9:35

GoogleCodeExporter commented 8 years ago
Michael,
Thanks!!!!

Cheers,
Jeff

Original comment by jstevens...@gmail.com on 26 Oct 2009 at 3:58

GoogleCodeExporter commented 8 years ago
Issue 6 has been merged into this issue.

Original comment by snowneb...@gmail.com on 27 Oct 2009 at 8:52