shingocat / lrscaf

TGS scaffolding

error of The Edges could not be empty #3

Closed tiramisutes closed 5 years ago

tiramisutes commented 6 years ago

I passed SAM (pbalign) and m4 (BLASR) alignment files to lrscaf, via both the command line and the XML configuration file, and both give me the following stderr output:

2018-08-01 08:40:06  [ main:246434 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:176)] - [ INFO ]  Valid Aligned Records: 0
2018-08-01 08:40:06  [ main:246435 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:177)] - [ INFO ]  Reading Aligned Records, erase time: 13657 ms
2018-08-01 08:40:06  [ main:246436 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:123)] - [ INFO ]  Repeat count: 0
2018-08-01 08:40:06  [ main:246437 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:125)] - [ INFO ]  Finding repeat, erase time: 1 ms
2018-08-01 08:40:06  [ main:246438 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:93)] - [ INFO ]  Valid Links Acount: 0
2018-08-01 08:40:06  [ main:246438 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:94)] - [ INFO ]  Building Link, erase time : 0 ms
2018-08-01 08:40:06  [ main:246472 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - [ ERROR ]  PathBuilder : The Edges could not be empty!
2018-08-01 08:40:06  [ main:246473 ] - [agis.ps.Main.main(Main.java:59)] - [ INFO ]  Ending...
2018-08-01 08:40:06  [ main:246473 ] - [agis.ps.Main.main(Main.java:61)] - [ INFO ]  Scaffolding erase time: 246 s.

The links.info and triadlinks.info files are also empty.

Any help is much appreciated. Thanks.

shingocat commented 6 years ago

tiramisutes,

LRScaf only supports the BLASR and Minimap TGS mappers so far.
LRScaf also accepts only one alignment file per run. Could you show your command-line parameter settings or the content of your XML configuration file? Thanks!

tiramisutes commented 6 years ago

This is my command line.

java -jar LRScaf-1.1.3.jar -c contigs.fa -a mapped_against_celera.m4 -t m4 -o output

And the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<scaffold>
<!--The input file for scaffolding, including contigs and aligned files (i.e. m5, m4 or mm file) -->
    <input>
        <contig>/public/home/contigs.fa</contig>
        <m4>/public/home/mapped_against_celera.m4</m4>
    </input>
    <!-- The output folder for scaffolding -->
    <output>/public/home/output/</output>
    <!-- The parameters for scaffolding-->
    <paras>
         <!--More details are showed in README.md-->
         <min_contig_length>500</min_contig_length>
         <identity>0.8</identity>
         <min_overlap_length>400</min_overlap_length>
         <min_overlap_ratio>0.8</min_overlap_ratio>
         <max_overhang_length>1000</max_overhang_length>
         <max_overhang_ratio>0.1</max_overhang_ratio>
         <max_end_length>1000</max_end_length>
         <max_end_ratio>0.1</max_end_ratio>
         <min_supported_links>2</min_supported_links>
         <tips_length>1500</tips_length>
         <ratio>0.2</ratio>
         <repeat_mask>true</repeat_mask>
         <iqr_time>3</iqr_time>
         <mmcm>8</mmcm> <!--only for Minimap aligned outcome.-->
    </paras>
</scaffold>

But pbalign uses BLASR for the alignment and samtools for sorting.

shingocat commented 6 years ago

tiramisutes,

The parameter settings are right. Could you provide a few lines of your m4 alignment file?

Yes, pbalign uses BLASR as its mapper. However, LRScaf parses some optional fields of the SAM format produced by BLASR. I am not sure whether pbalign changes them, since I have not tested the pbalign SAM format.

tiramisutes commented 6 years ago

Here are a few lines of the m4 file used in my command:

m160724_185407_42278_c100905242550000001823198804291693_s1_X0/24/2147_2757 scaffold32731cov92|ref0458769|ref0560546|ref0196544|ref0025777 -731 73.9414 0 361 610 610 1 10519 10808 29179 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/24/2809_4502 scaffold72534cov92|ref0374539|ref0061562|ref0296692|ref0690163 -2158 85.9272 0 528 1093 1693 0 5198 5768 11909 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/30/23160_27047 scaffold5244cov146|ref0445811|ref0083241|ref0464038|ref0316990 -14645 86.9467 0 8 3887 3887 1 4608 8207 67931 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/26/14319_21312 scaffold635147cov113|ref0252244|ref0334919|ref0162473|ref0013913 -11757 81.905 0 0 3609 6993 0 34619 38134 47435 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/48/0_4017 scaffold34740cov135|ref0076209|ref0091986|ref0610944|ref0021485 -1969 77.8393 0 490 1169 4017 1 10487 11133 13116 0
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/17/31453_38406 scaffold365684cov135|ref0211335|ref0255721|ref0678418|ref0418293 -1276 80.7143 0 3585 3961 6953 1 348 745 815 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/8/2774_9655 scaffold40912cov156|ref0396370|ref0450535|ref0604322|ref0405472 -6684 82.3359 0 69 2113 6881 0 100248 101998 117231 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/19/14319_20246 scaffold11420cov72|ref0631538|ref0037478|ref0533511|ref0064302 -6611 82.8993 0 0 1952 5927 1 939 2793 7857 0
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/48/4073_7925 scaffold81042cov104|ref0673925|ref0470646|ref0045294|ref0207180 -2813 70.8029 0 10 1352 3852 1 7539 8574 8582 254
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/23/16576_23617 scaffold25535cov99|ref0233567|ref0012226|ref0367927|ref0430472 -7682 83.7674 0 2625 4842 7041 1 578 2693 2695 0
m160724_185407_42278_c100905242550000001823198804291693_s1_X0/48/39184_40626 scaffold26361cov71|ref0012983|ref0403136|ref0572297|ref0509465 -1941 88.5602 0 947 1442 1442 1 2185 2660 4611 254

shingocat commented 6 years ago

tiramisutes,

From the few lines of your alignment file, I see that each long read maps to only one draft assembly sequence (1 to 1). For LRScaf, a valid aligned record means that a long read maps to (links) different contig or scaffold sequences. After LRScaf read and checked your alignment file, it did not find any alignment information that could link the draft assembly, which is why you got the output "Valid Aligned Records: 0". Please check whether all the alignment records in your m4 file are 1 to 1. The following records are some m4 alignments from one of my genome projects. You can see that the PB50 long read maps to (links) different contigs.

PB50 1357_length_30835_cvg_59.9_tip_0|ref0000391|ref0000474|ref0000621|ref0000254|ref0000232|ref0000367|ref0000577|ref0000662|ref0000040|ref0000668|ref0000568|ref0000635|ref0000290|ref0000518|ref0000218|ref0000480|ref0000434|ref0000250|ref0000204|ref0000044|ref0000608|ref0000069|ref0000121|ref0000105|ref0000695|ref0000653|ref0000207|ref0000372 -15363 86.5243 0 2999 7128 7136 1 32 3797 30835 254
PB50 1015_length_728_cvg_63.0_tip_0|ref0000579|ref0000302|ref0000611|ref0000013|ref0000033|ref0000629|ref0000693|ref0000519|ref0000612|ref0000377|ref0000216|ref0000325|ref0000427|ref0000655|ref0000706|ref0000257|ref0000412|ref0000453|ref0000047|ref0000161|ref0000696|ref0000139|ref0000628|ref0000087|ref0000639|ref0000236|ref0000593|ref0000094 -2452 77.1872 0 1540 2415 7136 1 0 728 728 254
PB50 993_length_518_cvg_71.0_tip_0|ref0000438|ref0000165|ref0000590|ref0000021|ref0000688|ref0000334|ref0000519|ref0000352|ref0000113|ref0000596|ref0000600|ref0000295|ref0000477|ref0000291|ref0000122|ref0000590|ref0000001|ref0000140|ref0000432|ref0000461|ref0000418|ref0000249|ref0000454|ref0000272|ref0000497|ref0000023|ref0000463|ref0000457 -2227 90.9926 0 981 1505 7136 0 0 518 518 254
PB50 933_length_283_cvg_64.0_tip_0|ref0000456|ref0000269|ref0000544|ref0000088|ref0000323|ref0000013|ref0000367|ref0000018|ref0000282|ref0000614|ref0000287|ref0000591|ref0000658|ref0000673|ref0000449|ref0000405|ref0000495|ref0000085|ref0000321|ref0000182|ref0000726|ref0000614|ref0000651|ref0000043|ref0000549|ref0000355|ref0000269|ref0000370 -1099 90.146 0 365 635 7136 1 19 271 283 254
PB31 1337_length_26484_cvg_61.4_tip_1|ref0000009|ref0000337|ref0000220|ref0000574|ref0000114|ref0000622|ref0000625|ref0000456|ref0000641|ref0000189|ref0000593|ref0000606|ref0000583|ref0000139|ref0000074|ref0000075|ref0000303|ref0000225|ref0000598|ref0000114|ref0000171|ref0000289|ref0000479|ref0000664|ref0000567|ref0000053|ref0000586|ref0000307 -18257 89.9433 0 1975 6377 6385 1 0 4358 26484 254
PB31 1271_length_16383_cvg_62.0_tip_1|ref0000598|ref0000383|ref0000542|ref0000313|ref0000720|ref0000044|ref0000382|ref0000673|ref0000038|ref0000467|ref0000601|ref0000280|ref0000245|ref0000510|ref0000540|ref0000684|ref0000710|ref0000190|ref0000431|ref0000039|ref0000211|ref0000354|ref0000707|ref0000461|ref0000382|ref0000358|ref0000260|ref0000136 -7342 87.304 0 105 1983 6385 1 14525 16383 16383 254
PB59 1455_length_164235_cvg_60.7_tip_0|ref0000024|ref0000402|ref0000493|ref0000119|ref0000616|ref0000654|ref0000196|ref0000718|ref0000078|ref0000309|ref0000725|ref0000414|ref0000714|ref0000303|ref0000724|ref0000476|ref0000235|ref0000060|ref0000184|ref0000702|ref0000674|ref0000060|ref0000323|ref0000437|ref0000045|ref0000640|ref0000379|ref0000138 -19941 86.005 0 110 5606 5606 1 154754 159634 164235 254
PB72 1273_length_16774_cvg_61.0_tip_1|ref0000256|ref0000423|ref0000206|ref0000266|ref0000407|ref0000167|ref0000449|ref0000716|ref0000660|ref0000431|ref0000342|ref0000141|ref0000097|ref0000252|ref0000613|ref0000089|ref0000648|ref0000247|ref0000127|ref0000135|ref0000343|ref0000495|ref0000023|ref0000459|ref0000002|ref0000685|ref0000453|ref0000121 -3202 82.2759 0 254 1231 1322 1 6029 6865 16774 254
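A quick way to check the 1 to 1 situation on a whole m4 file is to count, for each read, how many different contigs it hits. The script below is only a rough sketch and is not part of LRScaf. It assumes whitespace-separated m4 records with the read name in column 1 and the contig name in column 2. The function names and the sample records (readA, readB, ctg1, ...) are made up for illustration. Hitting two or more contigs is only a necessary condition: LRScaf applies further filters (identity, overlap length, and so on) that this sketch ignores.

```python
from collections import defaultdict


def contigs_per_read(m4_lines):
    """Map each read name to the set of distinct contigs it aligns to."""
    targets = defaultdict(set)
    for line in m4_lines:
        fields = line.split()
        if len(fields) < 13:  # skip blank or truncated lines
            continue
        # Column 1 is the read (query) name, column 2 the contig (target) name.
        targets[fields[0]].add(fields[1])
    return targets


def linking_reads(m4_lines):
    """Return only the reads that hit at least two different contigs."""
    per_read = contigs_per_read(m4_lines)
    return {read: ctgs for read, ctgs in per_read.items() if len(ctgs) >= 2}


# Hypothetical records: readA hits two contigs, readB hits only one.
sample = [
    "readA ctg1 -15363 86.5 0 2999 7128 7136 1 32 3797 30835 254",
    "readA ctg2 -2452 77.2 0 1540 2415 7136 1 0 728 728 254",
    "readB ctg3 -731 73.9 0 361 610 610 1 10519 10808 29179 254",
]
links = linking_reads(sample)
total = len(contigs_per_read(sample))
print(f"{len(links)} of {total} reads hit two or more contigs")
```

To run it on a real file, pass an open file handle instead of the sample list, e.g. `linking_reads(open("PacBio.m4"))`. If the count of linking reads is zero, the alignment file cannot produce any links.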

tiramisutes commented 6 years ago

It seems that all the alignment records are 1 to 1 in my m4 alignment file. Below is my blasr command; is there some mistake in it?

blasr ./PacBioBam/AS_Set.xml genome.fa --out ./LRSCAF/PacBio.m4  -m 4  --bestn 10 --minMatch 12  --maxMatch 30  --nproc 20  --minSubreadLength 50 --minAlnLength 50  --minPctSimilarity 70 --minPctAccuracy 70 --hitPolicy randombest  --randomSeed 1

Thanks.

shingocat commented 6 years ago

tiramisutes,

You could try changing the parameter "--hitPolicy randombest" to "--hitPolicy all". LRScaf needs all of the alignment information. If you output only the randombest hit, LRScaf cannot build the link information between contigs. Alternatively, you could set only the mandatory parameters, i.e.:

blasr ./PacBioBam/AS_Set.xml genome.fa --out ./LRSCAF/PacBio.m4 -m 4

sanjitsbatra commented 6 years ago

Hi! I also faced the same error but in my case I had a paf file created by minimap2, run with "-x map-ont" flag. The run's output says "Valid Aligned Records: 0". Any suggestions on why that might be happening?

YiweiNiu commented 6 years ago

Hi! I also faced the same error and I had a paf file created by minimap2-2.12. Here are my commands and error messages:

minimap2-2.12/minimap2 -x map-pb -t 20 -I 500G contigs.fasta PB.fa > run1.paf
java -jar LRScaf-1.1.4.jar -c contigs.fasta -a run1.paf -t mm -o run1

# error messages of lrscaf
2018-08-20 09:37:36 [ INFO ]  Launching...
2018-08-20 09:37:36 [ INFO ]  The output folder was exist!
2018-08-20 09:37:36 [ INFO ]  It will delete all file under this folder!
2018-08-20 09:37:36 [ INFO ]  Building output folder, erase time: 2 ms
2018-08-20 09:38:06 [ INFO ]  Reading contigs, erase times: 30507 ms
2018-08-20 09:41:37 [ INFO ]  Valid Aligned Records: 14839
2018-08-20 09:41:37 [ INFO ]  Reading Aligned Records, erase time: 210749 ms
2018-08-20 09:41:37 [ INFO ]  Finding Repeats:
2018-08-20 09:41:37 [ INFO ]  MIN: 1.0
2018-08-20 09:41:37 [ INFO ]  First Quartile: 1.0
2018-08-20 09:41:37 [ INFO ]  Median cov = 3.0
2018-08-20 09:41:37 [ INFO ]  Third Quartile: 7.0
2018-08-20 09:41:37 [ INFO ]  MAX: 153368.0
2018-08-20 09:41:37 [ INFO ]  Interquartile Range: 6.0
2018-08-20 09:41:37 [ INFO ]  1.5's IQR , Outlier Threshold: 16.0
2018-08-20 09:41:37 [ INFO ]  Repeat count: 491
2018-08-20 09:41:37 [ INFO ]  Finding repeat, erase time: 28 ms
2018-08-20 09:41:37 [ INFO ]  Building Links in 00%
2018-08-20 09:41:37 [ INFO ]  Building Links in 10%
2018-08-20 09:41:37 [ INFO ]  Building Links in 20%
2018-08-20 09:41:37 [ INFO ]  Building Links in 30%
2018-08-20 09:41:37 [ INFO ]  Building Links in 40%
2018-08-20 09:41:37 [ INFO ]  Building Links in 50%
2018-08-20 09:41:37 [ INFO ]  Building Links in 60%
2018-08-20 09:41:37 [ INFO ]  Building Links in 70%
2018-08-20 09:41:37 [ INFO ]  Building Links in 80%
2018-08-20 09:41:37 [ INFO ]  Building Links in 90%
2018-08-20 09:41:37 [ INFO ]  Building Links in 100%
2018-08-20 09:41:37 [ INFO ]  Valid Links Acount: 0
2018-08-20 09:41:37 [ INFO ]  Building Link, erase time : 134 ms
2018-08-20 09:41:37 [ ERROR ]  PathBuilder : The Edges could not be empty!
2018-08-20 09:41:37 [ INFO ]  Ending...
2018-08-20 09:41:37 [ INFO ]  Scaffolding erase time: 241 s.

shingocat commented 6 years ago

YiweiNiu,

If you use minimap, you should set the identity value. We have updated LRScaf (version 1.1.5) to automatically set the identity to 0.1 for minimap. If you are using LRScaf version 1.1.4, you can add the identity parameter yourself; the run command looks like:

java -jar LRScaf-1.1.4.jar -c contigs.fasta -a run1.paf -t mm -i 0.1 -o run1
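To see how much the identity cutoff matters for a given PAF file, one can count how many records would pass at different thresholds. The script below is a rough sketch, not LRScaf code. It assumes identity can be approximated as PAF column 10 (number of matches) divided by column 11 (alignment block length); whether LRScaf computes identity exactly this way is an assumption. The function names and sample records are hypothetical. Without base-level alignment, the match count reported by minimap2 is approximate and usually well below the block length, so a cutoff such as 0.8 would probably reject most records.

```python
def paf_identity(line):
    """Return matches / block length for one PAF record, or None if unusable."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:  # PAF has 12 mandatory tab-separated columns
        return None
    matches = int(fields[9])     # column 10: number of matching bases
    block_len = int(fields[10])  # column 11: alignment block length
    return matches / block_len if block_len else None


def count_passing(paf_lines, threshold):
    """Return (records at or above threshold, total usable records)."""
    total = passing = 0
    for line in paf_lines:
        identity = paf_identity(line)
        if identity is None:
            continue
        total += 1
        if identity >= threshold:
            passing += 1
    return passing, total


# Hypothetical PAF records with identities of about 0.29 and 0.89.
sample = [
    "read1\t7136\t2999\t7128\t+\tctg1\t30835\t32\t3797\t1200\t4200\t60",
    "read2\t6385\t105\t1983\t-\tctg2\t16383\t14525\t16383\t1700\t1900\t60",
]
for cutoff in (0.8, 0.1):
    passing, total = count_passing(sample, cutoff)
    print(f"identity >= {cutoff}: {passing} of {total} records")
```

On a real file, pass an open handle, e.g. `count_passing(open("run1.paf"), 0.8)`, and compare the result with the 0.1 cutoff.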

YiweiNiu commented 6 years ago

Thank you! It works now.