sc932 / ALE

Assembly Likelihood Estimator

Mapping reads #13

Closed HTaekOppa closed 6 years ago

HTaekOppa commented 6 years ago

Hi,

Thank you for ALE. Before testing it on my data, I would like to ask a few questions. According to the manual, ALE requires SAM or BAM files generated by a mapping program (BWA, Bowtie2, etc.). Let’s say I have two different FASTA assemblies, one generated from PacBio reads only and one hybrid (PacBio + Illumina).

  1. Pure PacBio FASTA: Can I map both Illumina (PE/MP) and PacBio reads (as FASTQ) to it? If yes, can I generate a BAM file for each dataset separately (e.g. readSorted.PacBio.bam and readSorted.Illumina.bam) and run the command like this:

./ALE [-options] readSorted.PacBio.bam readSorted.Illumina.bam PacBio_assembly.fasta ALEoutput.ale

Or do I have to pick only one readSorted.bam file for this?

  2. Hybrid FASTA: This assembly comes from Illumina (PE) and PacBio reads. Same question about mapping: can I map both Illumina (PE/MP) and PacBio reads (as FASTQ) to it, or do I have to pick only one readSorted.bam file?

Looking forward to your advice on this matter!

Cheers,

Taek

robegan21 commented 6 years ago

Sorry for the late reply. Please read the paper about how best to use ALE and interpret the scores. ALE can evaluate different assemblies of the same data by mapping a set of reads to two different FASTA files. ALE can also evaluate different mappers by mapping a single set of reads to a single FASTA file with two different mappers. Reads can be short (Illumina) or long (PacBio/Nanopore). ALE should be given a separate BAM file for each set of reads that comes from a different source, because certain statistics such as insert size are evaluated for each BAM file separately.
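
As a rough illustration of keeping one BAM file per read source, the sketch below only prints the mapping commands rather than running them, so it can be inspected without any tools installed. Everything in it is an assumption made for illustration and not taken from the ALE manual: the read file names, the choice of `bwa mem` for Illumina pairs and `minimap2 -ax map-pb` for PacBio reads, and the name-sorting step with `samtools sort -n` (suggested only by the `readSorted` file names used in the question).

```shell
#!/bin/sh
# Sketch: print (do not run) the commands that map each read set separately,
# producing one BAM per read source. All file names are hypothetical.

ASSEMBLY="PacBio_assembly.fasta"

# Paired-end Illumina reads mapped with BWA-MEM, then sorted by read name.
# $1 = forward reads, $2 = reverse reads, $3 = output BAM
illumina_cmd() {
    printf 'bwa mem %s %s %s | samtools sort -n -o %s -\n' \
        "$ASSEMBLY" "$1" "$2" "$3"
}

# PacBio long reads mapped with minimap2 (assumed mapper), sorted by read name.
# $1 = reads, $2 = output BAM
pacbio_cmd() {
    printf 'minimap2 -ax map-pb %s %s | samtools sort -n -o %s -\n' \
        "$ASSEMBLY" "$1" "$2"
}

# BWA needs an index of the assembly before mapping.
printf 'bwa index %s\n' "$ASSEMBLY"
illumina_cmd illumina_R1.fastq illumina_R2.fastq readSorted.Illumina.bam
pacbio_cmd pacbio.fastq readSorted.PacBio.bam
```

Piping the printed lines to `sh` would execute them, provided the tools are installed. Each resulting BAM file then corresponds to a single read source, so per-library statistics such as insert size are not mixed across sources.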