Ah ok. I saw at the beginning of the output it is using SANGER FASTQ.
Original comment by 000.Cala...@googlemail.com
on 25 May 2012 at 9:14
Hi Markus,
Re: SMALT. That's a good question.
We are working to increase the speed of the alignment steps without reducing
the sensitivity to indels and finding new junctions, which are the strengths of
breseq. We've experimented with using SMALT in the past. However, when I last
checked, it had some quirks, such as aligning indels differently depending on
which strand the read came from, that would have made it necessary to
post-process the alignments. So, it didn't work as an easy drop-in.
Our strategy is probably going to be to use BWA to align perfect matches with
high mapping scores first and then align the remaining reads with SSAHA2.
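
As a rough illustration of that two-pass idea (a sketch only, not breseq's
actual code): after the first aligner has run, its SAM output can be split into
reads to keep and reads to hand to the second aligner. The function names, the
MAPQ cutoff of 20, and the definition of "perfect match" (an all-match CIGAR
plus an NM:i:0 tag) are assumptions made for this example.

```python
import re

# A CIGAR made of a single match block, e.g. "36M" or "36=".
PERFECT_CIGAR = re.compile(r"^\d+[M=]$")


def is_confident_perfect_match(fields, min_mapq):
    """True if a SAM record is mapped, ungapped, mismatch-free and has MAPQ >= min_mapq."""
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 is the SAM "read unmapped" bit
        return False
    mapq = int(fields[4])
    cigar = fields[5]
    if mapq < min_mapq or not PERFECT_CIGAR.match(cigar):
        return False
    # NM is the optional edit-distance tag; requiring it is an assumption here.
    return "NM:i:0" in fields[11:]


def split_first_pass(sam_lines, min_mapq=20):
    """Return (kept SAM lines, names of reads to realign with the second aligner)."""
    kept, remaining = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if is_confident_perfect_match(fields, min_mapq):
            kept.append(line)
        else:
            remaining.append(fields[0])
    return kept, remaining
```

The names in `remaining` would then be used to pull the matching reads out of
the FASTQ file for the slower, more sensitive alignment pass.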
--Jeff
Original comment by jeffrey....@gmail.com
on 26 May 2012 at 4:27
Re: FASTQ
breseq should be able to recognize the format of your input FASTQ correctly, no
matter how it starts. It then converts each FASTQ in the 01_sequence_conversion
directory to SANGER format, including renaming the reads (because sometimes
paired end reads have the same name, which confuses things, or other name
formatting can cause problems for SSAHA2 alignment).
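
For anyone curious what such a conversion involves, here is a minimal sketch
under stated assumptions; it is not breseq's implementation. It guesses the
quality offset from the lowest quality character, rewrites qualities as
Phred+33 (SANGER), and gives every read a new sequential name. The
`read_<n>` naming scheme and all function names are made up for the example,
Solexa-scaled scores are not handled, and the offset guess can be wrong for a
SANGER file that contains only high qualities. The parser accepts a + line
with or without the repeated name.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        plus = next(it)  # "+" alone or "+" followed by the repeated name
        qual = next(it).rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near %r" % header)
        yield header[1:].rstrip("\n"), seq, qual


def guess_offset(records):
    """Guess the quality offset (33 or 64) from the lowest quality character."""
    lowest = min(ord(c) for _, _, qual in records for c in qual)
    if lowest < 59:
        return 33  # characters this low only occur with Phred+33
    if lowest >= 64:
        return 64  # consistent with Phred+64
    raise ValueError("qualities look Solexa-scaled; not handled in this sketch")


def to_sanger(records, prefix="read"):
    """Return FASTQ text with Phred+33 qualities and sequentially renamed reads."""
    records = list(records)
    offset = guess_offset(records)
    out = []
    for i, (_, seq, qual) in enumerate(records, start=1):
        sanger_qual = "".join(chr(ord(c) - offset + 33) for c in qual)
        out.append("@%s_%d\n%s\n+\n%s\n" % (prefix, i, seq, sanger_qual))
    return "".join(out)
```

Renaming by position means two reads that share a name in the input still end
up with distinct names in the output.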
Re: @SEQ_ID and +SEQ_ID lines
The test data packaged with breseq that we use for consistency tests doesn't
repeat the name on the + line. See tests/data/*. So, something else must be
causing your problem. Let us know and we'll look into it.
Original comment by jeffrey....@gmail.com
on 26 May 2012 at 4:32
Forgot to mention that if there are certain parts of the pipeline that you'd
like us to expose so that you can use your own aligned reads in BAM format,
then that's certainly possible. We could probably do any part except the
junction prediction part, which requires a subsequent alignment step with
SSAHA2.
Original comment by jeffrey....@gmail.com
on 26 May 2012 at 5:31
Hi Jeff,
thanks for your very informative reply.
Re: @SEQ_ID and +SEQ_ID lines
We are looking into that FASTQ header problem... maybe the header wasn't the
problem, although it was the first obvious difference between the two files.
Maybe it wasn't a FASTQ problem but a memory/disk space problem...
Re: SMALT
I see...
Re: Hybrid mapping
I would like to point you to STAMPY
(http://genome.cshlp.org/content/21/6/936), which is already designed with a
hybrid mapping approach in mind, i.e. you can pass BWA as a premapper. STAMPY
itself is better at mapping diverged sequences. It might be useful for you as
an alternative to SSAHA2.
We are waiting for our additional sequence data and will report back if we run
into problems.
Cheers,
Markus
Original comment by 000.Cala...@googlemail.com
on 29 May 2012 at 10:05
We have added the ability to use SAM files aligned by other programs to the
next version, 0.19.
Original comment by jeffrey....@gmail.com
on 22 Jun 2012 at 1:46
Great work!
Markus
Original comment by 000.Cala...@googlemail.com
on 26 Jun 2012 at 10:36
If you'd like to try it out before we release a new "official" version, you can
download it from here:
http://barricklab.org/release/breseq/breseq-0.18b.tar.gz
To use SAM files as input, pass the flag --aligned-sam. breseq will then assume
that you are supplying SAM files of aligned reads in place of the FASTQ files
that would normally be provided.
Example:
breseq --aligned-sam -r reference aligned_reads.1.sam aligned_reads.2.sam ...
Original comment by jeffrey....@gmail.com
on 26 Jun 2012 at 1:03
Original issue reported on code.google.com by
000.Cala...@googlemail.com
on 25 May 2012 at 7:37