mahulchak / svmu

A program to call variants from genome alignment
GNU General Public License v3.0
75 stars 18 forks source link

Loss of chr information in cm.lastz.txt and sv.lastz.txt #16

Open clementmo opened 4 years ago

clementmo commented 4 years ago

Hello mahulchak, After I run svmu with a genome with two reference genomes, the out put of cm.lastz.txt and sv.lastz.txt shows no complete chromosomes for one reference genome .

The code for lastz and svmu are the same, but different in nucmer, as shown in the following.

NUCMER for refer1

nucmer --maxmatch --noextend --prefix=Sample_ref1 ../reference_genome1.fasta ../sample.genomic.fa

NUCMER for refer2

nucmer --prefix=Sample_ref2 ../reference_genome2.fasta ../sample.genomic.fa

lastz

/path/to/lastz//lastz-distrib-1.04.03/src/lastz_32 ../reference_genome1.fasta[multiple] ../sample.genomic.fa --notransition --step=20 --nogapped --progress=1 --format=maf > lastzSample_ref1.txt

svmu

/path/to/svmu-master/svmu Sample_ref1 ../reference_genome1.fasta ../sample.genomic.fa 5 l lastzSample_ref1.txt svmuSample_ref1

The output for reference1 2.4K Aug 7 17:43 cm.lastzAD3_AD1.txt.txt 2.6G Aug 7 18:11 cnv_all.lastzAD3_AD1.txt.txt 2.4G Aug 7 18:11 cords.lastzAD3_AD1.txt.txt 0 Aug 7 18:11 small.lastzAD3_AD1.txt.txt 3.2K Aug 7 18:11 sv.lastzAD3_AD1.txt.txt

The output for reference2 3.7M Jul 19 02:59 cm.lastzAD3_AD2.txt.txt 631M Jul 19 03:02 cnv_all.lastzAD3_AD2.txt.txt 648M Jul 19 03:02 cords.lastzAD3_AD2.txt.txt 0 Jul 19 03:02 small.lastzAD3_AD2.txt.txt 11M Jul 19 03:37 sv.lastzAD3_AD2.txt.txt

The big difference is the file size for reference1, files cm.lastzAD3_AD1.txt.txt and sv.lastzAD3_AD1.txt.txt are small and there is no complete information for all the chromosomes. And cnv_all and cords are large. Actually, reference1 and reference2 genome size are near and sequence have a little high similarity. How could the result will be such different changing NUCMER parameters? How can I define accurate SVs for the sample with appropriate parameters?

Looking forward to hearing from you. Thanks a lot! Best wishes, Clement

mahulchak commented 4 years ago

Hi Clement,

I think the ref1 run did not complete properly. You can run both without -maxmatch. I have also pushed an updated today. Not sure if that'll be of any help to you but feel free to try the updated version. The new version works better when the maxmatch option is NOT used.

clementmo commented 4 years ago

Hi Mahulchak, Thank you for your reply. The above mentioned reference genomes are larger than 2Gb. According to your updated information, the "-maxmatch " might be only suitable for small genomes, right? So I abandoned this parameters in later use. Thanks again!

mahulchak commented 4 years ago

yes. --maxmatch and --noextend may not work well for large genomes.