hewm2008 / NGenomeSyn

Any Way to Show Multi genomic Synteny
MIT License

Link file not generated when using Paf2Link in GetTwoGenomeSyn.pl #6

Closed PerDoloremAdAstra closed 11 months ago

PerDoloremAdAstra commented 11 months ago

Hi, thank you so much for creating this software.

I ran into an issue while running GetTwoGenomeSyn.pl. The ".paf" file was generated successfully, but the ".link" file was empty, and the log file contained the following error messages:

[M::mm_idx_gen::10.919*1.45] collected minimizers
[M::mm_idx_gen::12.171*1.83] sorted minimizers
[M::main::12.171*1.83] loaded/built the index for 6 target sequence(s)
[M::mm_mapopt_update::13.108*1.77] mid_occ = 108
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 6
[M::mm_idx_stat::13.904*1.72] distinct minimizers: 55226085 (95.11% are singletons); average occurrences: 1.254; average spacing: 9.967
[M::worker_pipeline::23.260*2.05] mapped 3 sequences
[M::worker_pipeline::29.155*2.12] mapped 4 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: /home/miniconda3/pkgs/minimap2-2.17-h5bf99c6_4/bin/minimap2 -x asm5 -t 6 refA2refB.B.fa refA2refB.A.fa
[M::main] Real time: 29.273 sec; CPU: 61.832 sec; Peak RSS: 7.204 GB
Warining: SVG module in Perl is missing, trying to loading the built-in [SVG.pm]...
Use of uninitialized value $_ in scalar chomp at /home/software/NGenomeSyn/bin/NGenomeSyn line 1589.
Use of uninitialized value $_ in split at /home/software/NGenomeSyn/bin/NGenomeSyn line 1589.
Loading SVG module done
Error:  InPut Genome Link 1-2 Info refA2refB.link file  Format wrong,pleas check it
    ALL done, see the xxx.png . you can optimized drawing by [NGenomeSyn] software
         optimized: [Filter] and [Merge] small syn blocks to big syn block

The command that I used is:

perl /home/software/NGenomeSyn/bin/GetTwoGenomeSyn.pl \
-InGenomeA refA.fna.gz \
-InGenomeB refB.fna.gz \
-MappingBin  minimap2 \
-BinDir /home/miniconda3/pkgs/minimap2-2.17-h5bf99c6_4/bin \
-MinLenA 5000000 \
-MinLenB 5000000 \
-NumThreads 6 \
-OutPrefix refA2refB

The intermediate script (refA2refB.mapping.sh) that was generated is as follows:

/home/miniconda3/pkgs/minimap2-2.17-h5bf99c6_4/bin/minimap2   -x asm5  -t 6 refA2refB.B.fa  refA2refB.A.fa > refA2refB.paf 
perl  /home/software/NGenomeSyn/bin/GetTwoGenomeSyn.pl  Paf2Link   refA2refB.paf    5000   refA2refB.link  
/home/software/NGenomeSyn/bin/NGenomeSyn  -InConf   refA2refB.conf   -OutPut    refA2refB.svg 

It seems that something goes wrong in the second step, when converting ".paf" to ".link", but I can't figure out how to fix it.

Any help or suggestion would be appreciated!

Thank you!

hewm2008 commented 11 months ago

A: If the species are distantly related, generally align gene protein sequences (blastp or diamond) and then use mcscan to find collinear blocks.

B: If the species are closely related, analyze the genome sequences directly (generally, for different genomes within the same species, use minimap2 or mummer to find collinear regions).

Your two species are too far apart for collinear blocks to be found from DNA. I recommend using proteins to find collinear blocks first.

C: Try changing the minimum block length from 5000 to 50, so that collinear blocks shorter than 5 kb are no longer filtered out. That is, change

perl /home/software/NGenomeSyn/bin/GetTwoGenomeSyn.pl Paf2Link refA2refB.paf 5000 refA2refB.link

to

perl /home/software/NGenomeSyn/bin/GetTwoGenomeSyn.pl Paf2Link refA2refB.paf 50 refA2refB.link
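Before picking a new cutoff, it may help to see how many alignments in the ".paf" file would survive each candidate length filter. The following is a minimal Python sketch, not part of NGenomeSyn. It assumes the filter is comparable to the alignment block length in column 11 of the standard PAF format; the exact criterion Paf2Link applies is not shown in this thread. The function name `count_blocks_by_min_length` and the example records are made up for illustration.

```python
def count_blocks_by_min_length(paf_lines, thresholds=(50, 500, 5000)):
    """Count PAF records whose alignment block length meets each threshold.

    Assumption: the length of interest is PAF column 11 (alignment block
    length). Paf2Link may use a different definition of block length.
    """
    counts = {t: 0 for t in thresholds}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        block_len = int(fields[10])  # column 11, 0-based index 10
        for t in thresholds:
            if block_len >= t:
                counts[t] += 1
    return counts


# Made-up PAF records for illustration only (not taken from the issue).
# Columns: qname qlen qstart qend strand tname tlen tstart tend matches blocklen mapq
EXAMPLE_PAF = [
    "chrA1\t100000\t100\t400\t+\tchrB1\t120000\t200\t500\t280\t300\t60\n",
    "chrA1\t100000\t1000\t1800\t-\tchrB2\t90000\t50\t850\t700\t800\t60\n",
    "chrA2\t80000\t0\t6000\t+\tchrB1\t120000\t3000\t9000\t5500\t6000\t60\n",
]

if __name__ == "__main__":
    print(count_blocks_by_min_length(EXAMPLE_PAF))
```

To run it on real data, pass an open file handle instead of the example list, for instance `count_blocks_by_min_length(open("refA2refB.paf"))`. If the count at 5000 is zero but the count at 50 is not, an empty ".link" file at the default cutoff would be expected under the assumption above.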

PerDoloremAdAstra commented 11 months ago

Thank you very much for the answer! It turned out to be short blocks (< 500) between the two genomes indeed.