Closed (GoogleCodeExporter closed this issue 8 years ago)
Hi Xiaoping,
I'll take a look at the fragments error tomorrow.
Cheers,
// Michael
Original comment by snowneb...@gmail.com
on 18 Oct 2009 at 5:38
Dear Michael:
The bug was found! Specifically, your MosaikDupSnoop generated a SQLite3 database with four tables:
OrphanFragments, PairedFragments, ReadGroups, SingleFragments
However, MosaikSort was looking for a table named "Fragments" (it should look for "PairedFragments") during duplicate filtering, which is why the error message was "no such table: Fragments". When I manually created a "Fragments" table from "PairedFragments" in the database, MosaikSort was able to remove duplicate fragments.
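The manual fix can be sketched with Python's sqlite3 module. Only the two table names ("Fragments" and "PairedFragments") come from this thread; the column layout below is a placeholder, since the actual DupSnoop schema is not shown here:

```python
import sqlite3

# Pre-fix MosaikSort queries a table named "Fragments", while MosaikDupSnoop
# writes "PairedFragments". The columns here are illustrative placeholders.
con = sqlite3.connect(":memory:")  # in practice, open the .db file DupSnoop produced
con.execute("CREATE TABLE PairedFragments (read_name TEXT, ref_begin INT, ref_end INT)")
con.execute("INSERT INTO PairedFragments VALUES ('frag1', 100, 350)")

# The workaround: mirror PairedFragments under the name MosaikSort expects.
con.execute("CREATE TABLE Fragments AS SELECT * FROM PairedFragments")

print(con.execute("SELECT * FROM Fragments").fetchall())
# -> [('frag1', 100, 350)]
```

The same effect can be had by opening the generated database in the sqlite3 shell and running the single CREATE TABLE ... AS SELECT statement.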
However, I would still like to remind you that MosaikDupSnoop has been extremely and strangely slow at generating the database for 7 lanes of Illumina data (about 50M paired-end reads). In fact, I had to abort MosaikDupSnoop after more than 48 hours.
It would therefore be very helpful if you could look into the speed of MosaikDupSnoop, since duplicate removal is extremely important for SNP and indel detection.
I greatly appreciate your work and help!
Xiaoping
Original comment by xiaoping...@stjude.org
on 23 Oct 2009 at 2:52
Thanks Xiaoping,
I had forgotten to make the necessary changes after one of my 1000 Genomes project tests. Thanks for reminding me - I'll make the changes today.
The SQLite3 database is very disk-I/O intensive, so the only recommendation I can make is to run it on fast, local hard disks rather than network storage.
MosaikDupSnoop was designed this way so that it could take an entire directory of Mosaik alignment archives into consideration. I could make a version that handles one specific (perhaps merged) file much more quickly. I'll look into it over the weekend.
Thanks!
// Michael
Original comment by snowneb...@gmail.com
on 23 Oct 2009 at 3:00
Dear Michael:
It's great to know that the slowness was caused by the network storage. I will definitely try running it on fast local disks today.
However, it is very hard to avoid network storage once more samples are being sequenced, so I would really appreciate a version that handles one specific merged file much more quickly.
Again, my sincere gratitude for your help!
Xiaoping
Original comment by xiaoping...@stjude.org
on 23 Oct 2009 at 3:21
Michael and Xiaoping,
When I try to use DupSnoop to inspect a specific alignment archive, it does not generate a "sequencing library". After reviewing these posts, I am obviously doing something wrong (no surprise). Either I don't understand what is meant by "sequencing library" in this context, or something else is going on; specifically, when DupSnoop states "Databases for the following libraries will be created:", no databases are created.
I have pasted the command line I used for DupSnoop:
MosaikDupSnoop -in 33105n.bin.aligned -od fragData/
------------------------------------------------------------------------------
MosaikDupSnoop 1.0.1307 2009-10-14
Michael Stromberg Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------
- resolving the following types of read pairs: [unique orphans] [unique vs
unique]
[unique vs multiple]
Scanning the following alignment archives:
- 33105n.bin.aligned
Databases for the following libraries will be created:
-
Creating databases... finished.
Parsing 33105n.bin.aligned:
- recording unique read lengths:
100%[=====================================] 53,401.1 reads/s in 01:29
Consolidating fragments from library:
- no paired-end fragments found. skipping library.
Thanks,
Jeff
Original comment by jstevens...@gmail.com
on 23 Oct 2009 at 4:34
Hi, Jeff:
Your problem was caused by how you used MosaikBuild. Specifically, you have to specify the library name (-ln) when you build the binary reads file with MosaikBuild. Without a library name, DupSnoop cannot create a database file name in the output folder (e.g. fragData).
Cheers!
Xiaoping
Original comment by xiaoping...@stjude.org
on 23 Oct 2009 at 5:19
Xiaoping,
Thanks!
I'll try that.
Jeff
Original comment by jstevens...@gmail.com
on 23 Oct 2009 at 5:27
Dear Michael:
I ran MosaikDupSnoop on a 64-bit Windows PC with 32GB of memory, a 500GB local hard disk, and 4 processors. I was able to successfully run DupSnoop on 7 lanes of transcriptome Illumina data (the alignment file is about 9GB, single-end reads, with lots of PCR artifacts from RNA-seq) within three hours, which is perfectly fine. The resulting SQLite3 database file is about 5GB.
However, when I ran DupSnoop on 14 lanes of genome Illumina data (the alignment file is about 20GB, single-end reads, alignment mode=unique) on the same 64-bit PC, DupSnoop was not able to finish. Specifically, it was stuck at 0.000 reads/s after it had generated a 14GB SQLite3 database file and analyzed 90% of the alignment file.
Thanks very much!
Xiaoping
Original comment by xiaoping...@stjude.org
on 28 Oct 2009 at 2:34
Hi Xiaoping,
I'll try to look into the performance issue of DupSnoop at a later date.
Since the original bug with the table names has been fixed, I'll close this bug report.
Cheers,
// Michael
Original comment by snowneb...@gmail.com
on 16 Jan 2010 at 2:09
Original issue reported on code.google.com by
xiaoping...@stjude.org
on 18 Oct 2009 at 5:22