philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore reads (standard and ultra-long) to a reference genome, with a focus on reads that span structural variations.
MIT License

ngmlr running problem #59

Open Nickerson-Cool opened 5 years ago

Nickerson-Cool commented 5 years ago

Hi philres, I have been running a project recently. The reference sequence is MSU, and the reads are PacBio data that have already been assembled into a genome. My problem is that the job has been running for 20 days (threads=6) and the result file is already very large (3.3G). Should I use the raw sequencing reads instead? The result file looks like this:

@HD VN:1.0 SO:unsorted @SQ SN:Chr1 LN:44361539 @SQ SN:Chr2 LN:37764328 @SQ SN:Chr3 LN:39691490 @SQ SN:Chr4 LN:35849732 @SQ SN:Chr5 LN:31237231 @SQ SN:Chr6 LN:32465040 @SQ SN:Chr7 LN:30277827 @SQ SN:Chr8 LN:29952003 @SQ SN:Chr9 LN:24760661 @SQ SN:Chr10 LN:25582588 @SQ SN:Chr11 LN:31778392 @SQ SN:Chr12 LN:26601357 @SQ SN:chrM LN:527116 @SQ SN:chrC LN:134546 @PG ID:ngmlr PN:nextgenmap-lr VN:0.2.7 CL:ngmlr -t 6 -r MSU.fasta -q assembly.fasta -o msu_R1.sam Chr1 4 0 0 * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAACCCTAAACCCTAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACAGCTG.............

My command is as follows: ngmlr -t 6 -r MSU.fasta -q assembly.fasta -o msu_R1.sam

Am I running it correctly?

fritzsedlazeck commented 5 years ago

Hi, so you are currently aligning the assembly to the reference? What are you looking for? If it is variation, the raw reads are typically better. We have not optimized NGMLR for genome-to-genome alignments. Other methods such as MUMmer or minimap2 might work better for that.

May I ask how large the genome is? Thanks Fritz
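As a quick sanity check on the output shown above: in SAM, the second column of each record is the FLAG, and bit 0x4 means the segment is unmapped. The pasted record has FLAG 4, so that sequence was written out without an alignment. A minimal sketch for counting mapped versus unmapped records follows. The function name count_sam_mapping is made up for illustration, and it assumes a tab-separated SAM file such as the msu_R1.sam from the command above.

```shell
# Count mapped vs. unmapped records in a SAM file.
# count_sam_mapping is a hypothetical helper name, not part of ngmlr.
count_sam_mapping() {
    awk -F'\t' '
        /^@/ { next }    # skip header lines (@HD, @SQ, @PG, ...)
        {
            # Column 2 is the FLAG; bit 0x4 set means "segment unmapped".
            if (int($2 / 4) % 2 == 1) unmapped++; else mapped++
        }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}

# Usage (file name taken from the command in this thread):
# count_sam_mapping msu_R1.sam
```

Secondary and supplementary records are counted as separate lines here, so the totals are record counts rather than read counts.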

Nickerson-Cool commented 5 years ago

Hi philres, thank you for your prompt reply. My assembled genome (chromosome level) is nearly 400 Mb, and the reference (350 Mb) is also chromosome level. I want to align the PacBio assembly to the reference and then look for variation. If I continue to use NGMLR, should I use the raw reads?

fritzsedlazeck commented 5 years ago

Yeah, for whole-genome alignment I usually use MUMmer, or you can use minimap2. The latter can output SAM format too.
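For the minimap2 route, a command along these lines might be a reasonable starting point. This is only a sketch: the input file names are the ones from the command earlier in this thread, the output name msu_asm.sam is a placeholder, and the asm5 preset is an assumption that the assembly and the reference are closely related (asm10 or asm20 may fit better for more divergent genomes). The snippet only prints the command unless RUN=1 is set.

```shell
# Assembly-to-reference alignment with minimap2, writing SAM.
REF=MSU.fasta          # reference, as in the ngmlr command above
ASM=assembly.fasta     # assembled genome, as in the ngmlr command above
OUT=msu_asm.sam        # placeholder output name

# -a       write SAM instead of the default PAF
# -x asm5  preset for assembly-to-reference alignment (assumed low divergence)
# -t 6     number of threads
# -o       output file
CMD="minimap2 -a -x asm5 -t 6 -o $OUT $REF $ASM"

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = 1 ]; then $CMD; else echo "$CMD"; fi
```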

You will have greater sensitivity and full genome coverage by using the raw PacBio reads. Cheers, Fritz
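If you stay with NGMLR, the call would look much like the original one, just with the raw reads as the query. A hedged sketch: raw_pacbio_reads.fastq and msu_raw_reads.sam are hypothetical file names, and -x pacbio simply spells out the PacBio preset. As written, the snippet prints the command and only runs it when RUN=1 is set.

```shell
# Map raw PacBio reads (not the assembly) to the reference with NGMLR.
REF=MSU.fasta                  # reference, as in the original command
READS=raw_pacbio_reads.fastq   # hypothetical name for the raw read file
OUT=msu_raw_reads.sam          # hypothetical output name

# -t 6       number of threads
# -x pacbio  parameter preset for PacBio reads
# -r / -q    reference and query (reads)
# -o         output SAM file
CMD="ngmlr -t 6 -x pacbio -r $REF -q $READS -o $OUT"

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = 1 ]; then $CMD; else echo "$CMD"; fi
```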