Closed pb-cdunn closed 8 years ago
Chris, I've silenced these in my local copy. I'm not uploading for a while however as I am in the middle of getting ready for the release of some new tools. In the next couple of months I anticipate releasing:
(a) Tools to soft mask tandem and ubiquitous interspersed repeats for the daligner (results in 10X speed improvement with little down side for current assemblers which can't resolve big (>= 15Kbp) repeats anyway).
(b) The scrubbing tools (at last ! )
(c) A QT-based viewer for looking at overlap piles, intrinsic quality values, and masks (i.e. the viewer I use to produce illustrations for my talks).
Cheers, Gene
On 3/23/16, 5:22 PM, Christopher Dunn wrote:
With |-Wall -Wextra|:
DB.c:882:15: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result] fscanf(istub,DB_NFILE,&nfiles); ^ DB.c:886:19: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result] { fscanf(istub,DB_FDATA,&last,fname,prolog); ^ DB.c:893:19: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result] { fscanf(istub,DB_FDATA,&last,fname,prolog); — You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/issues/14
No rush. Thanks for the updates. I'm eager to see A and B.
I hope you don't add a required dependency for C. Optional compiling/linking with QT is ok, but please don't force us. We'll probably have a different viewer for overlap piles anyway. (Overlap piles is one nice idea that we learned from your talk.)
For C, one will need to have the QT library installed. But this dependency will only exist for the viewer and not any other of the other dazzler modules.
Why not use it? Mine displays masks, intrinsic QV's, scrubbing marks, etc. Seems like a lot of redevelopment work just to have it in Java (which sucks performance-wise anyway).
Cheers,
Gene
On 3/28/16, 8:10 PM, Christopher Dunn wrote:
No rush. Thanks for the updates. I'm eager to see A and B.
I hope you don't add a required dependency for C. Optional compiling/linking with QT is ok, but please don't force us. We'll probably have a different viewer for overlap piles anyway. (Overlap piles is one nice idea that we learned from your talk.)
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/issues/14#issuecomment-202513225
The reason not to use yours is the dependency on QT. For a stand-alone app, yes, QT is pretty sweet and good for performance, but there are other considerations. We might just want to dump a static picture. And these are management decisions, not mine. I'm sure I'll use yours for ad hoc analysis.
If it's a separate repo, or if we can build everything in the repo excluding the QT-dependent parts, then I'd happy. One thing I love about your code is that it's so easy to build.
OK, so you are going to embed a "pile viewer" into your SMRT-analysis
suite, yes? But how
is that possible unless you also have all the overlaps/LAs computed?
Have you integrated
daligner into the SMRT-suite? BTW, who does make the management
decisions in regards
to the development of your software suite? (Write me direct at
"gene.myers@gmail.com" if the
answers there in are sensitive).
My viewer is a stand alone app that reads Dazzler DB's and tracks as
input. I plan to put
all my code in the repository with a Makefile that requires you have the
appropriate QT libraries
installed (just like y'all need HD5 support installed). While I could
supply just the QT-independent
code, honestly in a GUI which is mostly concerned with display there is
very little of that despite
my best attempt to separate graphics and interaction from modeling :-)
Sorry.
Cheers, Gene
On 3/30/16, 6:04 PM, Christopher Dunn wrote:
The reason not to use yours is the dependency on QT. For a stand-alone app, yes, QT is pretty sweet and good for performance, but there are other considerations. We might just want to dump a static picture. And these are management decisions, not mine. I'm sure I'll use yours for ad hoc analysis.
If it's a separate repo, or if we can build everything in the repo /excluding/ the QT-dependent parts, then I'd happy. One thing I love about your code is that it's so easy to build.
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/issues/14#issuecomment-203504599
If it's a separate Makefile, no problem. If it's a separate top target in the Makefile, and we can simply make a different target to build dazzdb without this stand-alone QT app, that's fine too. If your Makefile forces us to build the QT app, ... well, we already use our own Makefile in our forks of DALIGNER and DAZZ_DB, so we'll be ok.
But I urge you not to force other users to link with QT. A pure ISO C code-base like yours is trivial to build. E.g. in DALIGNER Issue-35,a user has asked for your help in creating releases in Debian. I say that's not needed because Debain users can build DALIGNER themselves so easily. But if have dependencies, then versions matter, so people will want OS-specific installers.
And that's just one example. When you add dependencies, you can probably expect to get a lot of questions. It's much better to make that component optional somehow. Most questions we receive are related to building, rather than to functionality. That's for blasr, falcon, and other tools. We're shipping a giant suite of pre-built software with the new machine, which solves that problem (and creates new ones). But someone who just wants to use FALCON -- or just DALIGNER -- would not want to install a multi-GB tarball. If you want to see how much trouble we have with dependencies, look at MJ's new repo, designed to build the PacBio world for GitHub users:
To answer your questions:
Yes, exactly. I like nice clean, simple makes and I am striving for
that in the dazzler suite. Its part of the reason that I respond to all you notes about compiler warnings :-) I hate the warnings (and dumb compilers even more :-) )
So if daligner is deep in the bowels of Falcon and your SMRT suite,
it must mean you are building dazzler .db's and then probably decoding them and the .las files for down stream stages. It's really a shame that there's not more usage of dazzler .db's: for example, using Quiver for the consensus phase of assembly is currently a shambles in my view. Specifically, the assembler knows the position of every read in the assembly and hence its mapping to the coarse consensus it produces. Why then call BLASR to remap reads to this consensus when the location is already known? Why go back to a shitload of individual .bax or .bam files to get the quality streams when they can be randomly accessible in a compressed form from a single Dazzler .db? I'd actually do the re-engineering myself, but I can't as Quiver source is not available. As another example, I am about to release a new daligner variant specifically tuned to "mapping" that will obsolete BLASR as your mapping tool (unless you want to be 30-60X slower than necessary). To incorporate that into Quiver you will have to operate off of a Dazzler .db and not the .bax/.bam's (albeit since I release source you can re-engineer the interface). Sorry, a bit of rant, but I wanted to point out this sub-optimal situation, its been bugging me for a while.
On 3/31/16, 6:07 PM, Christopher Dunn wrote:
If it's a separate Makefile, no problem. If it's a separate top target in the Makefile, and we can simply make a different target to build dazzdb /without/ this stand-alone QT app, that's fine too. If your Makefile forces us to build the QT app, ... well, we already use our own Makefile in our forks of DALIGNER and DAZZ_DB, so we'll be ok.
But I urge you not to force other users to link with QT. A pure ISO C code-base like yours is trivial to build. E.g. in DALIGNER Issue-35,a user has asked for your help in creating releases in Debian. I say that's not needed because Debain users can build DALIGNER themselves so easily. But if have dependencies, then versions matter, so people will want OS-specific installers.
And that's just one example. When you add dependencies, you can probably expect to get a lot of questions. It's much better to make that component optional somehow. Most questions we receive are related to building, rather than to functionality. That's for blasr, falcon, and other tools. We're shipping a giant suite of pre-built software with the new machine, which solves that problem (and creates new ones). But someone who just wants to use FALCON -- or just DALIGNER -- would not want to install a multi-GB tarball. If you want to see how much trouble we have with dependencies, look at MJ's new repo, designed to build the PacBio world for GitHub users:
To answer your questions:
- The GUI is just a web-browser. It drives and interacts with many different workflows.
- Daligner is run several layers under the GUI, within falcon and ICE, but we could run it more directly too if necessary.
- There are several managers as well as marketing folks who would be involved. And now that a version of the GUI exists, replacing it with a stand-alone app (which in theory could link with your QT code) would be a large project, unlikely to happen soon.
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/issues/14#issuecomment-204002024
Thanks for the detailed response, and I agree completely. We do use those files in a few other places within FALCON. E.g. we parse DBshow to select a cutoff length for our seed-reads, and we use the .las files for seed-read correction, both in dazcon and in fc_consensus. We have discussed replacing blasr someday and would welcome your new daligner variant. We plan to work much more closer with Quiver (or some similar tool) for diploid distinction (or "phasing"), so amending that could be in the cards fairly soon.
Of course, we need to sell the new machine to stay in business, and the latest Quiver -- called Arrow -- takes advantage of extra data and metadata from the primary processing of Sequel data. The exact details are beyond my expertise at the moment. If your team replaced blasr/Quiver for RS2 data, I don't think anyone would mind.
At the moment, my main goal is to tweak FALCON so that it produces assemblies as long and contiguous as the Celera WGS suite. It's definitely much faster, but it sometimes falls a little short on some genomes. If you'd like to discuss that further, please e-mail me privately, and I'll include Jim or Jason.
Quiver source code turns out to be available, just under a different name:
Wow, thanks, that's cool and makes me feel a lot better !
Perhaps this suggests that y'all make it a little easier to find the source?
Cheers, G
On 4/6/16, 8:05 PM, Christopher Dunn wrote:
Quiver source code turns out to be available, just under a different name:
- https://github.com/PacificBiosciences/GenomicConsensus
- https://github.com/PacificBiosciences/ConsensusCore
- https://github.com/PacificBiosciences/ConsensusCore2 (for the Sequel-based Arrow)
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/issues/14#issuecomment-206494145
With
-Wall -Wextra
: